[hunchentoot-devel] Patch from the past: Ref. [c.2009-07] Hunchentoot effective DOS-attack

Thu Aug 16 13:42:20 UTC 2012

Hi Hans,

Yes, that is the problem.

Here is the result of my swank session (*cheshire* being the variable
holding my acceptor):

CL-USER> (hunchentoot::taskmaster-max-thread-count
> (hunchentoot::acceptor-taskmaster *cheshire*))
> 100
> CL-USER> (hunchentoot::taskmaster-request-count
> (hunchentoot::acceptor-taskmaster *cheshire*))
> 100
> CL-USER> (hunchentoot::taskmaster-max-accept-count
> (hunchentoot::acceptor-taskmaster *cheshire*))
> 120
> CL-USER> (setf (hunchentoot::taskmaster-request-count
> (hunchentoot::acceptor-taskmaster *cheshire*)) 0)
> 0
> CL-USER> (hunchentoot::note-free-connection
> (hunchentoot::acceptor-taskmaster *cheshire*))
> 0
>

After that, the server resumed serving requests without any further
problems.

What is weird, however, is that I never have the "Can't handle a new
request, too many request threads already" error message in either of my
logs.

This may be a long shot, but by reading handle-incoming-connection (
https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L288) and
create-request-handler-thread (
https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L355), I
think that if an error arises while creating a thread (such as the
"getpeername": ENOTCONN [...] message), the counter is not decremented
because the thread is never started (thus doesn't reach the dynamic scope
of the unwind-protect).

I sent a pull request with the patch for this idea and will keep you posted
if that solved the bug or not. I should be fixed by tomorrow since the bug
occurs after roughly 9 hours of operations.

Thanks for your help.

2012/8/15 Hans Hübner <hans.huebner at gmail.com>

> Hi Mathieu,
>
> you may want write a log entry in the DO clause of the loop in this
> function:
>
> https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L250
>
> It could be that somehow, you're running out of worker threads, but
> the code fails to correctly decrement the free worker count.  You can
> also inspect request-count slot of the taskmaster when you see the
> hanging server again.
>
> Keep us posted about what you find.
> -Hans
>
> On Wed, Aug 15, 2012 at 6:02 PM, Mathieu Lemoine <mlemoine at mentel.com>
> wrote:
> > Hi Hans,
> >
> > I though this was the problem because of the error message I find in my
> > *standard-error* log:
> > Error while creating worker thread for new incoming connection: Socket
> error
> > in "getpeername": ENOTCONN (Transport endpoint is not connected)
> > However, I noticed reading your patches, this is the error message
> > introduced by the patch.
> > Since the error message is present many times, I think your code is
> working,
> > otherwise the acceptor would crash instantly.
> >
> > Currently, the symptoms I saw are these:
> > - Timeout when trying to reach port 80 (on which Hunchentoot is
> listening)
> > - No error message in either *standard-error* nor *standard-output* logs
> > - Only one thread (hunchentoot listener?) running in addition to the main
> > process thread (as shown by htop)
> >
> > This problem occurs in production environment after a few hours of
> > operations. I restarted it once again with a swank server this time.
> > Next time the problem occurs, I will try to dig in it to gather more
> > informations on the state of my software.
> >
> > I still don't know what is causing the problem or how to reproduce it,
> but I
> > will try to check it up and keep you posted.
> >
> > Thanks for your time and help.
> > Mathieu.
> >
> >
> > 2012/8/15 Hans Hübner <hans.huebner at gmail.com>
> >>
> >> Hi Mathieu,
> >>
> >> I have mined the Subversion repository for the commits that you
> >> referenced, and I found them to represent this patch:
> >>
> >> Index: taskmaster.lisp
> >> ===================================================================
> >> --- taskmaster.lisp     (revision 4427)
> >> +++ taskmaster.lisp     (revision 4435)
> >> @@ -127,9 +127,21 @@
> >>
> >>  #-:lispworks
> >>  (defmethod handle-incoming-connection ((taskmaster
> >> one-thread-per-connection-taskmaster) socket)
> >> -  (bt:make-thread (lambda ()
> >> -                    (process-connection (taskmaster-acceptor
> >> taskmaster) socket))
> >> -                  :name (format nil "Hunchentoot worker \(client:
> >> ~A)" (client-as-string socket))))
> >> +  ;; We are handling all conditions here as we want to make sure that
> >> +  ;; the acceptor process never crashes while trying to create a
> >> +  ;; worker thread.  One such problem exists in
> >> +  ;; GET-PEER-ADDRESS-AND-PORT which can signal socket conditions on
> >> +  ;; some platforms in certain situations.
> >> +  (handler-case
> >> +      (bt:make-thread (lambda ()
> >> +                        (process-connection (taskmaster-acceptor
> >> taskmaster) socket))
> >> +                      :name (format nil "Hunchentoot worker \(client:
> >> ~A)" (client-as-string socket)))
> >> +
> >> +    (error (cond)
> >> +      ;; Need to bind *ACCEPTOR* so that LOG-MESSAGE can do its work.
> >> +      (let ((*acceptor* (taskmaster-acceptor taskmaster)))
> >> +        (log-message *lisp-errors-log-level*
> >> +                     "Error while creating worker thread for new
> >> incoming connection: ~A" cond)))))
> >>
> >>  ;; LispWorks implementation
> >>
> >> Now, the patch actually does not apply, but it was on trunk and has
> >> been evolved into the current version:
> >>
> >> https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L355
> >>
> >> If I understand you correctly, you say that the code does not work.
> >> Can you somehow provide a way to reproduce the problem?
> >>
> >> Thanks,
> >> Hans
> >>
> >> On Wed, Aug 15, 2012 at 3:29 PM, Mathieu Lemoine <mlemoine at mentel.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I am currently hitting the problem described in the following thread:
> >> >
> http://lists.common-lisp.net/pipermail/tbnl-devel/2009-July/004768.html
> >> >
> >> > I'm using SBCL 1.0.58 with hunchentoot 1.2.3 and usocket 0.5.5 (both
> >> > from
> >> > quicklisp).
> >> >
> >> > I understand the long-term fix will be some hard work. However, in
> this
> >> > thread, Hans is referencing a few diff hunks :
> >> >
> >> > http://bknr.net/trac/changeset/4430?format=diff&new=4430
> >> > http://bknr.net/trac/changeset/4434?format=diff&new=4434
> >> > http://bknr.net/trac/changeset/4435?format=diff&new=4435
> >> >
> >> > Since the old trac is not working anymore, could it be possible to
> >> > either
> >> > point me toward the equivalent commits in the github repository or the
> >> > content of these patches.
> >> >
> >> > By the way, Peter, if you're still around, did you need to apply only
> >> > 4435
> >> > or both this and 4434 in order to solve the second issue (unbound
> >> > variable
> >> > on *acceptor*)?
> >> >
> >> > If anybody has information with respect to these issues, including the
> >> > corresponding bug report in usocket, so I can do some follow up with
> >> > them,
> >> > it would be greatly appreciated.
> >> >
> >> > Thanks in advance for your help and time.
> >> >
> >> > Mathieu.
> >> >
> >> > _______________________________________________
> >> > tbnl-devel site list
> >> > tbnl-devel at common-lisp.net
> >> > http://common-lisp.net/mailman/listinfo/tbnl-devel
> >>
> >> _______________________________________________
> >> tbnl-devel site list
> >> tbnl-devel at common-lisp.net
> >> http://common-lisp.net/mailman/listinfo/tbnl-devel
> >
> >
> >
> > _______________________________________________
> > tbnl-devel site list
> > tbnl-devel at common-lisp.net
> > http://common-lisp.net/mailman/listinfo/tbnl-devel
>
> _______________________________________________
> tbnl-devel site list
> tbnl-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/tbnl-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.common-lisp.net/pipermail/tbnl-devel/attachments/20120816/eddcd650/attachment.html>