From mlemoine at mentel.com Wed Aug 15 13:29:40 2012 From: mlemoine at mentel.com (Mathieu Lemoine) Date: Wed, 15 Aug 2012 09:29:40 -0400 Subject: [hunchentoot-devel] Patch from the past: Ref. [c.2009-07] Hunchentoot effective DOS-attack Message-ID: Hi, I am currently hitting the problem described in the following thread: http://lists.common-lisp.net/pipermail/tbnl-devel/2009-July/004768.html I'm using SBCL 1.0.58 with hunchentoot 1.2.3 and usocket 0.5.5 (both from quicklisp). I understand the long-term fix will be some hard work. However, in this thread, Hans is referencing a few diff hunks : - http://bknr.net/trac/changeset/4430?format=diff&new=4430 - http://bknr.net/trac/changeset/4434?format=diff&new=4434 - http://bknr.net/trac/changeset/4435?format=diff&new=4435 Since the old trac is not working anymore, could it be possible to either point me toward the equivalent commits in the github repository or the content of these patches. By the way, Peter, if you're still around, did you need to apply only 4435 or both this and 4434 in order to solve the second issue (unbound variable on *acceptor*)? If anybody has information with respect to these issues, including the corresponding bug report in usocket, so I can do some follow up with them, it would be greatly appreciated. Thanks in advance for your help and time. Mathieu. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hans.huebner at gmail.com Wed Aug 15 15:38:28 2012 From: hans.huebner at gmail.com (=?ISO-8859-1?Q?Hans_H=FCbner?=) Date: Wed, 15 Aug 2012 17:38:28 +0200 Subject: [hunchentoot-devel] Patch from the past: Ref. [c.2009-07] Hunchentoot effective DOS-attack In-Reply-To: References: Message-ID: Hi Mathieu, I have mined the Subversion repository for the commits that you referenced, and I found them to represent this patch: Index: taskmaster.lisp =================================================================== --- taskmaster.lisp (revision 4427) +++ taskmaster.lisp (revision 4435) @@ -127,9 +127,21 @@ #-:lispworks (defmethod handle-incoming-connection ((taskmaster one-thread-per-connection-taskmaster) socket) - (bt:make-thread (lambda () - (process-connection (taskmaster-acceptor taskmaster) socket)) - :name (format nil "Hunchentoot worker \(client: ~A)" (client-as-string socket)))) + ;; We are handling all conditions here as we want to make sure that + ;; the acceptor process never crashes while trying to create a + ;; worker thread. One such problem exists in + ;; GET-PEER-ADDRESS-AND-PORT which can signal socket conditions on + ;; some platforms in certain situations. + (handler-case + (bt:make-thread (lambda () + (process-connection (taskmaster-acceptor taskmaster) socket)) + :name (format nil "Hunchentoot worker \(client: ~A)" (client-as-string socket))) + + (error (cond) + ;; Need to bind *ACCEPTOR* so that LOG-MESSAGE can do its work. + (let ((*acceptor* (taskmaster-acceptor taskmaster))) + (log-message *lisp-errors-log-level* + "Error while creating worker thread for new incoming connection: ~A" cond))))) ;; LispWorks implementation Now, the patch actually does not apply, but it was on trunk and has been evolved into the current version: https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L355 If I understand you correctly, you say that the code does not work. Can you somehow provide a way to reproduce the problem? Thanks, Hans On Wed, Aug 15, 2012 at 3:29 PM, Mathieu Lemoine wrote: > Hi, > > I am currently hitting the problem described in the following thread: > http://lists.common-lisp.net/pipermail/tbnl-devel/2009-July/004768.html > > I'm using SBCL 1.0.58 with hunchentoot 1.2.3 and usocket 0.5.5 (both from > quicklisp). > > I understand the long-term fix will be some hard work. However, in this > thread, Hans is referencing a few diff hunks : > > http://bknr.net/trac/changeset/4430?format=diff&new=4430 > http://bknr.net/trac/changeset/4434?format=diff&new=4434 > http://bknr.net/trac/changeset/4435?format=diff&new=4435 > > Since the old trac is not working anymore, could it be possible to either > point me toward the equivalent commits in the github repository or the > content of these patches. > > By the way, Peter, if you're still around, did you need to apply only 4435 > or both this and 4434 in order to solve the second issue (unbound variable > on *acceptor*)? > > If anybody has information with respect to these issues, including the > corresponding bug report in usocket, so I can do some follow up with them, > it would be greatly appreciated. > > Thanks in advance for your help and time. > > Mathieu. > > _______________________________________________ > tbnl-devel site list > tbnl-devel at common-lisp.net > http://common-lisp.net/mailman/listinfo/tbnl-devel From mlemoine at mentel.com Wed Aug 15 16:02:37 2012 From: mlemoine at mentel.com (Mathieu Lemoine) Date: Wed, 15 Aug 2012 12:02:37 -0400 Subject: [hunchentoot-devel] Patch from the past: Ref. [c.2009-07] Hunchentoot effective DOS-attack In-Reply-To: References: Message-ID: Hi Hans, I though this was the problem because of the error message I find in my *standard-error* log: Error while creating worker thread for new incoming connection: Socket error in "getpeername": ENOTCONN (Transport endpoint is not connected) However, I noticed reading your patches, this is the error message introduced by the patch. Since the error message is present many times, I think your code is working, otherwise the acceptor would crash instantly. Currently, the symptoms I saw are these: - Timeout when trying to reach port 80 (on which Hunchentoot is listening) - No error message in either *standard-error* nor *standard-output* logs - Only one thread (hunchentoot listener?) running in addition to the main process thread (as shown by htop) This problem occurs in production environment after a few hours of operations. I restarted it once again with a swank server this time. Next time the problem occurs, I will try to dig in it to gather more informations on the state of my software. I still don't know what is causing the problem or how to reproduce it, but I will try to check it up and keep you posted. Thanks for your time and help. Mathieu. 2012/8/15 Hans H?bner > Hi Mathieu, > > I have mined the Subversion repository for the commits that you > referenced, and I found them to represent this patch: > > Index: taskmaster.lisp > =================================================================== > --- taskmaster.lisp (revision 4427) > +++ taskmaster.lisp (revision 4435) > @@ -127,9 +127,21 @@ > > #-:lispworks > (defmethod handle-incoming-connection ((taskmaster > one-thread-per-connection-taskmaster) socket) > - (bt:make-thread (lambda () > - (process-connection (taskmaster-acceptor > taskmaster) socket)) > - :name (format nil "Hunchentoot worker \(client: > ~A)" (client-as-string socket)))) > + ;; We are handling all conditions here as we want to make sure that > + ;; the acceptor process never crashes while trying to create a > + ;; worker thread. One such problem exists in > + ;; GET-PEER-ADDRESS-AND-PORT which can signal socket conditions on > + ;; some platforms in certain situations. > + (handler-case > + (bt:make-thread (lambda () > + (process-connection (taskmaster-acceptor > taskmaster) socket)) > + :name (format nil "Hunchentoot worker \(client: > ~A)" (client-as-string socket))) > + > + (error (cond) > + ;; Need to bind *ACCEPTOR* so that LOG-MESSAGE can do its work. > + (let ((*acceptor* (taskmaster-acceptor taskmaster))) > + (log-message *lisp-errors-log-level* > + "Error while creating worker thread for new > incoming connection: ~A" cond))))) > > ;; LispWorks implementation > > Now, the patch actually does not apply, but it was on trunk and has > been evolved into the current version: > > https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L355 > > If I understand you correctly, you say that the code does not work. > Can you somehow provide a way to reproduce the problem? > > Thanks, > Hans > > On Wed, Aug 15, 2012 at 3:29 PM, Mathieu Lemoine > wrote: > > Hi, > > > > I am currently hitting the problem described in the following thread: > > http://lists.common-lisp.net/pipermail/tbnl-devel/2009-July/004768.html > > > > I'm using SBCL 1.0.58 with hunchentoot 1.2.3 and usocket 0.5.5 (both from > > quicklisp). > > > > I understand the long-term fix will be some hard work. However, in this > > thread, Hans is referencing a few diff hunks : > > > > http://bknr.net/trac/changeset/4430?format=diff&new=4430 > > http://bknr.net/trac/changeset/4434?format=diff&new=4434 > > http://bknr.net/trac/changeset/4435?format=diff&new=4435 > > > > Since the old trac is not working anymore, could it be possible to either > > point me toward the equivalent commits in the github repository or the > > content of these patches. > > > > By the way, Peter, if you're still around, did you need to apply only > 4435 > > or both this and 4434 in order to solve the second issue (unbound > variable > > on *acceptor*)? > > > > If anybody has information with respect to these issues, including the > > corresponding bug report in usocket, so I can do some follow up with > them, > > it would be greatly appreciated. > > > > Thanks in advance for your help and time. > > > > Mathieu. > > > > _______________________________________________ > > tbnl-devel site list > > tbnl-devel at common-lisp.net > > http://common-lisp.net/mailman/listinfo/tbnl-devel > > _______________________________________________ > tbnl-devel site list > tbnl-devel at common-lisp.net > http://common-lisp.net/mailman/listinfo/tbnl-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hans.huebner at gmail.com Wed Aug 15 16:15:48 2012 From: hans.huebner at gmail.com (=?ISO-8859-1?Q?Hans_H=FCbner?=) Date: Wed, 15 Aug 2012 18:15:48 +0200 Subject: [hunchentoot-devel] Patch from the past: Ref. [c.2009-07] Hunchentoot effective DOS-attack In-Reply-To: References: Message-ID: Hi Mathieu, you may want write a log entry in the DO clause of the loop in this function: https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L250 It could be that somehow, you're running out of worker threads, but the code fails to correctly decrement the free worker count. You can also inspect request-count slot of the taskmaster when you see the hanging server again. Keep us posted about what you find. -Hans On Wed, Aug 15, 2012 at 6:02 PM, Mathieu Lemoine wrote: > Hi Hans, > > I though this was the problem because of the error message I find in my > *standard-error* log: > Error while creating worker thread for new incoming connection: Socket error > in "getpeername": ENOTCONN (Transport endpoint is not connected) > However, I noticed reading your patches, this is the error message > introduced by the patch. > Since the error message is present many times, I think your code is working, > otherwise the acceptor would crash instantly. > > Currently, the symptoms I saw are these: > - Timeout when trying to reach port 80 (on which Hunchentoot is listening) > - No error message in either *standard-error* nor *standard-output* logs > - Only one thread (hunchentoot listener?) running in addition to the main > process thread (as shown by htop) > > This problem occurs in production environment after a few hours of > operations. I restarted it once again with a swank server this time. > Next time the problem occurs, I will try to dig in it to gather more > informations on the state of my software. > > I still don't know what is causing the problem or how to reproduce it, but I > will try to check it up and keep you posted. > > Thanks for your time and help. > Mathieu. > > > 2012/8/15 Hans H?bner >> >> Hi Mathieu, >> >> I have mined the Subversion repository for the commits that you >> referenced, and I found them to represent this patch: >> >> Index: taskmaster.lisp >> =================================================================== >> --- taskmaster.lisp (revision 4427) >> +++ taskmaster.lisp (revision 4435) >> @@ -127,9 +127,21 @@ >> >> #-:lispworks >> (defmethod handle-incoming-connection ((taskmaster >> one-thread-per-connection-taskmaster) socket) >> - (bt:make-thread (lambda () >> - (process-connection (taskmaster-acceptor >> taskmaster) socket)) >> - :name (format nil "Hunchentoot worker \(client: >> ~A)" (client-as-string socket)))) >> + ;; We are handling all conditions here as we want to make sure that >> + ;; the acceptor process never crashes while trying to create a >> + ;; worker thread. One such problem exists in >> + ;; GET-PEER-ADDRESS-AND-PORT which can signal socket conditions on >> + ;; some platforms in certain situations. >> + (handler-case >> + (bt:make-thread (lambda () >> + (process-connection (taskmaster-acceptor >> taskmaster) socket)) >> + :name (format nil "Hunchentoot worker \(client: >> ~A)" (client-as-string socket))) >> + >> + (error (cond) >> + ;; Need to bind *ACCEPTOR* so that LOG-MESSAGE can do its work. >> + (let ((*acceptor* (taskmaster-acceptor taskmaster))) >> + (log-message *lisp-errors-log-level* >> + "Error while creating worker thread for new >> incoming connection: ~A" cond))))) >> >> ;; LispWorks implementation >> >> Now, the patch actually does not apply, but it was on trunk and has >> been evolved into the current version: >> >> https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L355 >> >> If I understand you correctly, you say that the code does not work. >> Can you somehow provide a way to reproduce the problem? >> >> Thanks, >> Hans >> >> On Wed, Aug 15, 2012 at 3:29 PM, Mathieu Lemoine >> wrote: >> > Hi, >> > >> > I am currently hitting the problem described in the following thread: >> > http://lists.common-lisp.net/pipermail/tbnl-devel/2009-July/004768.html >> > >> > I'm using SBCL 1.0.58 with hunchentoot 1.2.3 and usocket 0.5.5 (both >> > from >> > quicklisp). >> > >> > I understand the long-term fix will be some hard work. However, in this >> > thread, Hans is referencing a few diff hunks : >> > >> > http://bknr.net/trac/changeset/4430?format=diff&new=4430 >> > http://bknr.net/trac/changeset/4434?format=diff&new=4434 >> > http://bknr.net/trac/changeset/4435?format=diff&new=4435 >> > >> > Since the old trac is not working anymore, could it be possible to >> > either >> > point me toward the equivalent commits in the github repository or the >> > content of these patches. >> > >> > By the way, Peter, if you're still around, did you need to apply only >> > 4435 >> > or both this and 4434 in order to solve the second issue (unbound >> > variable >> > on *acceptor*)? >> > >> > If anybody has information with respect to these issues, including the >> > corresponding bug report in usocket, so I can do some follow up with >> > them, >> > it would be greatly appreciated. >> > >> > Thanks in advance for your help and time. >> > >> > Mathieu. >> > >> > _______________________________________________ >> > tbnl-devel site list >> > tbnl-devel at common-lisp.net >> > http://common-lisp.net/mailman/listinfo/tbnl-devel >> >> _______________________________________________ >> tbnl-devel site list >> tbnl-devel at common-lisp.net >> http://common-lisp.net/mailman/listinfo/tbnl-devel > > > > _______________________________________________ > tbnl-devel site list > tbnl-devel at common-lisp.net > http://common-lisp.net/mailman/listinfo/tbnl-devel From mlemoine at mentel.com Thu Aug 16 13:42:20 2012 From: mlemoine at mentel.com (Mathieu Lemoine) Date: Thu, 16 Aug 2012 09:42:20 -0400 Subject: [hunchentoot-devel] Patch from the past: Ref. [c.2009-07] Hunchentoot effective DOS-attack In-Reply-To: References: Message-ID: Hi Hans, Yes, that is the problem. Here is the result of my swank session (*cheshire* being the variable holding my acceptor): CL-USER> (hunchentoot::taskmaster-max-thread-count > (hunchentoot::acceptor-taskmaster *cheshire*)) > 100 > CL-USER> (hunchentoot::taskmaster-request-count > (hunchentoot::acceptor-taskmaster *cheshire*)) > 100 > CL-USER> (hunchentoot::taskmaster-max-accept-count > (hunchentoot::acceptor-taskmaster *cheshire*)) > 120 > CL-USER> (setf (hunchentoot::taskmaster-request-count > (hunchentoot::acceptor-taskmaster *cheshire*)) 0) > 0 > CL-USER> (hunchentoot::note-free-connection > (hunchentoot::acceptor-taskmaster *cheshire*)) > 0 > After that, the server resumed serving requests without any further problems. What is weird, however, is that I never have the "Can't handle a new request, too many request threads already" error message in either of my logs. This may be a long shot, but by reading handle-incoming-connection ( https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L288) and create-request-handler-thread ( https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L355), I think that if an error arises while creating a thread (such as the "getpeername": ENOTCONN [...] message), the counter is not decremented because the thread is never started (thus doesn't reach the dynamic scope of the unwind-protect). I sent a pull request with the patch for this idea and will keep you posted if that solved the bug or not. I should be fixed by tomorrow since the bug occurs after roughly 9 hours of operations. Thanks for your help. 2012/8/15 Hans H?bner > Hi Mathieu, > > you may want write a log entry in the DO clause of the loop in this > function: > > https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L250 > > It could be that somehow, you're running out of worker threads, but > the code fails to correctly decrement the free worker count. You can > also inspect request-count slot of the taskmaster when you see the > hanging server again. > > Keep us posted about what you find. > -Hans > > On Wed, Aug 15, 2012 at 6:02 PM, Mathieu Lemoine > wrote: > > Hi Hans, > > > > I though this was the problem because of the error message I find in my > > *standard-error* log: > > Error while creating worker thread for new incoming connection: Socket > error > > in "getpeername": ENOTCONN (Transport endpoint is not connected) > > However, I noticed reading your patches, this is the error message > > introduced by the patch. > > Since the error message is present many times, I think your code is > working, > > otherwise the acceptor would crash instantly. > > > > Currently, the symptoms I saw are these: > > - Timeout when trying to reach port 80 (on which Hunchentoot is > listening) > > - No error message in either *standard-error* nor *standard-output* logs > > - Only one thread (hunchentoot listener?) running in addition to the main > > process thread (as shown by htop) > > > > This problem occurs in production environment after a few hours of > > operations. I restarted it once again with a swank server this time. > > Next time the problem occurs, I will try to dig in it to gather more > > informations on the state of my software. > > > > I still don't know what is causing the problem or how to reproduce it, > but I > > will try to check it up and keep you posted. > > > > Thanks for your time and help. > > Mathieu. > > > > > > 2012/8/15 Hans H?bner > >> > >> Hi Mathieu, > >> > >> I have mined the Subversion repository for the commits that you > >> referenced, and I found them to represent this patch: > >> > >> Index: taskmaster.lisp > >> =================================================================== > >> --- taskmaster.lisp (revision 4427) > >> +++ taskmaster.lisp (revision 4435) > >> @@ -127,9 +127,21 @@ > >> > >> #-:lispworks > >> (defmethod handle-incoming-connection ((taskmaster > >> one-thread-per-connection-taskmaster) socket) > >> - (bt:make-thread (lambda () > >> - (process-connection (taskmaster-acceptor > >> taskmaster) socket)) > >> - :name (format nil "Hunchentoot worker \(client: > >> ~A)" (client-as-string socket)))) > >> + ;; We are handling all conditions here as we want to make sure that > >> + ;; the acceptor process never crashes while trying to create a > >> + ;; worker thread. One such problem exists in > >> + ;; GET-PEER-ADDRESS-AND-PORT which can signal socket conditions on > >> + ;; some platforms in certain situations. > >> + (handler-case > >> + (bt:make-thread (lambda () > >> + (process-connection (taskmaster-acceptor > >> taskmaster) socket)) > >> + :name (format nil "Hunchentoot worker \(client: > >> ~A)" (client-as-string socket))) > >> + > >> + (error (cond) > >> + ;; Need to bind *ACCEPTOR* so that LOG-MESSAGE can do its work. > >> + (let ((*acceptor* (taskmaster-acceptor taskmaster))) > >> + (log-message *lisp-errors-log-level* > >> + "Error while creating worker thread for new > >> incoming connection: ~A" cond))))) > >> > >> ;; LispWorks implementation > >> > >> Now, the patch actually does not apply, but it was on trunk and has > >> been evolved into the current version: > >> > >> https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L355 > >> > >> If I understand you correctly, you say that the code does not work. > >> Can you somehow provide a way to reproduce the problem? > >> > >> Thanks, > >> Hans > >> > >> On Wed, Aug 15, 2012 at 3:29 PM, Mathieu Lemoine > >> wrote: > >> > Hi, > >> > > >> > I am currently hitting the problem described in the following thread: > >> > > http://lists.common-lisp.net/pipermail/tbnl-devel/2009-July/004768.html > >> > > >> > I'm using SBCL 1.0.58 with hunchentoot 1.2.3 and usocket 0.5.5 (both > >> > from > >> > quicklisp). > >> > > >> > I understand the long-term fix will be some hard work. However, in > this > >> > thread, Hans is referencing a few diff hunks : > >> > > >> > http://bknr.net/trac/changeset/4430?format=diff&new=4430 > >> > http://bknr.net/trac/changeset/4434?format=diff&new=4434 > >> > http://bknr.net/trac/changeset/4435?format=diff&new=4435 > >> > > >> > Since the old trac is not working anymore, could it be possible to > >> > either > >> > point me toward the equivalent commits in the github repository or the > >> > content of these patches. > >> > > >> > By the way, Peter, if you're still around, did you need to apply only > >> > 4435 > >> > or both this and 4434 in order to solve the second issue (unbound > >> > variable > >> > on *acceptor*)? > >> > > >> > If anybody has information with respect to these issues, including the > >> > corresponding bug report in usocket, so I can do some follow up with > >> > them, > >> > it would be greatly appreciated. > >> > > >> > Thanks in advance for your help and time. > >> > > >> > Mathieu. > >> > > >> > _______________________________________________ > >> > tbnl-devel site list > >> > tbnl-devel at common-lisp.net > >> > http://common-lisp.net/mailman/listinfo/tbnl-devel > >> > >> _______________________________________________ > >> tbnl-devel site list > >> tbnl-devel at common-lisp.net > >> http://common-lisp.net/mailman/listinfo/tbnl-devel > > > > > > > > _______________________________________________ > > tbnl-devel site list > > tbnl-devel at common-lisp.net > > http://common-lisp.net/mailman/listinfo/tbnl-devel > > _______________________________________________ > tbnl-devel site list > tbnl-devel at common-lisp.net > http://common-lisp.net/mailman/listinfo/tbnl-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hans.huebner at gmail.com Fri Aug 17 12:54:10 2012 From: hans.huebner at gmail.com (=?ISO-8859-1?Q?Hans_H=FCbner?=) Date: Fri, 17 Aug 2012 14:54:10 +0200 Subject: [hunchentoot-devel] Hunchentoot release 1.2.4 Message-ID: Hi, I have released Hunchentoot 1.2.4 yesterday. It contains an important bug fix for multi threaded servers supplied by Mathieu Lemoine as well as a bunch of smaller fixes and documentation enhancements. Complete CHANGELOG: https://raw.github.com/edicl/hunchentoot/7508f9a3e4ed3af8e4a4f62a39ee0b12d7cc7146/CHANGELOG Download: https://github.com/downloads/edicl/hunchentoot/hunchentoot-1.2.4.tar.gz -Hans From mlemoine at mentel.com Mon Aug 20 14:36:15 2012 From: mlemoine at mentel.com (Mathieu Lemoine) Date: Mon, 20 Aug 2012 10:36:15 -0400 Subject: [hunchentoot-devel] Patch from the past: Ref. [c.2009-07] Hunchentoot effective DOS-attack In-Reply-To: References: Message-ID: Hi Hans, Just to let you know, the server did not have any further problem after this fix. So it looks like everything is running smoothly. I'm going to wait a few more days to make sure everything's stable and make a small announce on the ML regarding my software in case it's interesting to someone. Thanks for your help solving this issue. Mathieu. 2012/8/16 Mathieu Lemoine > Hi Hans, > > Yes, that is the problem. > > Here is the result of my swank session (*cheshire* being the variable > holding my acceptor): > > CL-USER> (hunchentoot::taskmaster-max-thread-count >> (hunchentoot::acceptor-taskmaster *cheshire*)) >> 100 >> CL-USER> (hunchentoot::taskmaster-request-count >> (hunchentoot::acceptor-taskmaster *cheshire*)) >> 100 >> CL-USER> (hunchentoot::taskmaster-max-accept-count >> (hunchentoot::acceptor-taskmaster *cheshire*)) >> 120 >> CL-USER> (setf (hunchentoot::taskmaster-request-count >> (hunchentoot::acceptor-taskmaster *cheshire*)) 0) >> 0 >> CL-USER> (hunchentoot::note-free-connection >> (hunchentoot::acceptor-taskmaster *cheshire*)) >> 0 >> > > After that, the server resumed serving requests without any further > problems. > > What is weird, however, is that I never have the "Can't handle a new > request, too many request threads already" error message in either of my > logs. > > This may be a long shot, but by reading handle-incoming-connection ( > https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L288) > and create-request-handler-thread ( > https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L355), I > think that if an error arises while creating a thread (such as the > "getpeername": ENOTCONN [...] message), the counter is not decremented > because the thread is never started (thus doesn't reach the dynamic scope > of the unwind-protect). > > I sent a pull request with the patch for this idea and will keep you > posted if that solved the bug or not. I should be fixed by tomorrow since > the bug occurs after roughly 9 hours of operations. > > Thanks for your help. > > > 2012/8/15 Hans H?bner > >> Hi Mathieu, >> >> you may want write a log entry in the DO clause of the loop in this >> function: >> >> https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L250 >> >> It could be that somehow, you're running out of worker threads, but >> the code fails to correctly decrement the free worker count. You can >> also inspect request-count slot of the taskmaster when you see the >> hanging server again. >> >> Keep us posted about what you find. >> -Hans >> >> On Wed, Aug 15, 2012 at 6:02 PM, Mathieu Lemoine >> wrote: >> > Hi Hans, >> > >> > I though this was the problem because of the error message I find in my >> > *standard-error* log: >> > Error while creating worker thread for new incoming connection: Socket >> error >> > in "getpeername": ENOTCONN (Transport endpoint is not connected) >> > However, I noticed reading your patches, this is the error message >> > introduced by the patch. >> > Since the error message is present many times, I think your code is >> working, >> > otherwise the acceptor would crash instantly. >> > >> > Currently, the symptoms I saw are these: >> > - Timeout when trying to reach port 80 (on which Hunchentoot is >> listening) >> > - No error message in either *standard-error* nor *standard-output* logs >> > - Only one thread (hunchentoot listener?) running in addition to the >> main >> > process thread (as shown by htop) >> > >> > This problem occurs in production environment after a few hours of >> > operations. I restarted it once again with a swank server this time. >> > Next time the problem occurs, I will try to dig in it to gather more >> > informations on the state of my software. >> > >> > I still don't know what is causing the problem or how to reproduce it, >> but I >> > will try to check it up and keep you posted. >> > >> > Thanks for your time and help. >> > Mathieu. >> > >> > >> > 2012/8/15 Hans H?bner >> >> >> >> Hi Mathieu, >> >> >> >> I have mined the Subversion repository for the commits that you >> >> referenced, and I found them to represent this patch: >> >> >> >> Index: taskmaster.lisp >> >> =================================================================== >> >> --- taskmaster.lisp (revision 4427) >> >> +++ taskmaster.lisp (revision 4435) >> >> @@ -127,9 +127,21 @@ >> >> >> >> #-:lispworks >> >> (defmethod handle-incoming-connection ((taskmaster >> >> one-thread-per-connection-taskmaster) socket) >> >> - (bt:make-thread (lambda () >> >> - (process-connection (taskmaster-acceptor >> >> taskmaster) socket)) >> >> - :name (format nil "Hunchentoot worker \(client: >> >> ~A)" (client-as-string socket)))) >> >> + ;; We are handling all conditions here as we want to make sure that >> >> + ;; the acceptor process never crashes while trying to create a >> >> + ;; worker thread. One such problem exists in >> >> + ;; GET-PEER-ADDRESS-AND-PORT which can signal socket conditions on >> >> + ;; some platforms in certain situations. >> >> + (handler-case >> >> + (bt:make-thread (lambda () >> >> + (process-connection (taskmaster-acceptor >> >> taskmaster) socket)) >> >> + :name (format nil "Hunchentoot worker \(client: >> >> ~A)" (client-as-string socket))) >> >> + >> >> + (error (cond) >> >> + ;; Need to bind *ACCEPTOR* so that LOG-MESSAGE can do its work. >> >> + (let ((*acceptor* (taskmaster-acceptor taskmaster))) >> >> + (log-message *lisp-errors-log-level* >> >> + "Error while creating worker thread for new >> >> incoming connection: ~A" cond))))) >> >> >> >> ;; LispWorks implementation >> >> >> >> Now, the patch actually does not apply, but it was on trunk and has >> >> been evolved into the current version: >> >> >> >> https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L355 >> >> >> >> If I understand you correctly, you say that the code does not work. >> >> Can you somehow provide a way to reproduce the problem? >> >> >> >> Thanks, >> >> Hans >> >> >> >> On Wed, Aug 15, 2012 at 3:29 PM, Mathieu Lemoine >> >> wrote: >> >> > Hi, >> >> > >> >> > I am currently hitting the problem described in the following thread: >> >> > >> http://lists.common-lisp.net/pipermail/tbnl-devel/2009-July/004768.html >> >> > >> >> > I'm using SBCL 1.0.58 with hunchentoot 1.2.3 and usocket 0.5.5 (both >> >> > from >> >> > quicklisp). >> >> > >> >> > I understand the long-term fix will be some hard work. However, in >> this >> >> > thread, Hans is referencing a few diff hunks : >> >> > >> >> > http://bknr.net/trac/changeset/4430?format=diff&new=4430 >> >> > http://bknr.net/trac/changeset/4434?format=diff&new=4434 >> >> > http://bknr.net/trac/changeset/4435?format=diff&new=4435 >> >> > >> >> > Since the old trac is not working anymore, could it be possible to >> >> > either >> >> > point me toward the equivalent commits in the github repository or >> the >> >> > content of these patches. >> >> > >> >> > By the way, Peter, if you're still around, did you need to apply only >> >> > 4435 >> >> > or both this and 4434 in order to solve the second issue (unbound >> >> > variable >> >> > on *acceptor*)? >> >> > >> >> > If anybody has information with respect to these issues, including >> the >> >> > corresponding bug report in usocket, so I can do some follow up with >> >> > them, >> >> > it would be greatly appreciated. >> >> > >> >> > Thanks in advance for your help and time. >> >> > >> >> > Mathieu. >> >> > >> >> > _______________________________________________ >> >> > tbnl-devel site list >> >> > tbnl-devel at common-lisp.net >> >> > http://common-lisp.net/mailman/listinfo/tbnl-devel >> >> >> >> _______________________________________________ >> >> tbnl-devel site list >> >> tbnl-devel at common-lisp.net >> >> http://common-lisp.net/mailman/listinfo/tbnl-devel >> > >> > >> > >> > _______________________________________________ >> > tbnl-devel site list >> > tbnl-devel at common-lisp.net >> > http://common-lisp.net/mailman/listinfo/tbnl-devel >> >> _______________________________________________ >> tbnl-devel site list >> tbnl-devel at common-lisp.net >> http://common-lisp.net/mailman/listinfo/tbnl-devel >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hans.huebner at gmail.com Fri Aug 24 15:46:38 2012 From: hans.huebner at gmail.com (=?ISO-8859-1?Q?Hans_H=FCbner?=) Date: Fri, 24 Aug 2012 17:46:38 +0200 Subject: [hunchentoot-devel] Multithread overload fixes by Mathieu Lemoine Message-ID: Hi, Mathieu Lemoine kindly fixed several issues that made Hunchentoot fail ungracefully under overload conditions in multi threaded mode. I have pulled these changes into the master branch on github and would like to make a new release soon (quicklisp day is upcoming). https://github.com/edicl/hunchentoot/commit/8a790177997ec2c419355eeb7ae5f0023ef9036a https://github.com/edicl/hunchentoot/commit/cae0b8ed074290d53dfa801874012be7fc923421 According to my tests, Hunchentoot is now much more load resilient, but more testing would be great. So if you have some time, please clone the git repository, start a simple server like (defparameter *test* (make-instance 'easy-acceptor :port 8080)) (define-easy-handler (root :uri "/") () (setf (content-type*) "text/plain") (sleep 1) (format nil "OK~%")) (start *test*) and use your http load generation tool of choice to hammer the server. Any feedback would be greatly appreciated. Thanks to Mathieu, and thanks to anyone who helps testing! Hans