[tbnl-devel] "Leaked children"

Edi Weitz edi at agharta.de
Tue Jul 20 10:40:26 UTC 2004


Hi!

Again, a little patch. This time it's not about leaked sockets but
rather about "leaked children," so to speak. It was Stefan Scholl who
actually prompted me to investigate this further, and here's what I
found out:

This piece of code

  if (ReadLength < ContentLength || r->connection->aborted)
    {
      char buffer[HUGE_STRING_LEN];
      ContentLength -= ReadLength;
      do
        {
          ReadLength = ForceGets(buffer, (BUFF *) BuffSocket, 
                                 HUGE_STRING_LEN > ContentLength ? ContentLength : HUGE_STRING_LEN);
          ContentLength -= ReadLength;
        }
      while (ReadLength > 0 && ContentLength > 0);
    }

doesn't work if ContentLength is large enough. What's happening is
that the process of emptying BuffSocket (the loop above) always hangs
when exactly the last 8192 (HUGE_STRING_LEN) bytes are waiting to be
removed.

I tried with ap_bread instead of ForceGets but got the same
result. (Note that ap_bgets, which is used by ForceGets, does CR/LF
handling, which is not needed here.)

The result is that this child becomes unusable but is still there: it
is never killed by the Apache root process. An easy way to reproduce
this (with TBNL) is to do

  (asdf:oos 'asdf:load-op :tbnl-test)
  (tbnl:start-test)

with a proper Apache configuration (see TBNL docs) and then call
ApacheBench with large values like so:

  ab -n 2000 -c 200 http://localhost/tbnl/test/image.jpg

(The important point is that image.jpg is big enough - about 20kB in
this case.)

After doing this you'll see a large number of Apache processes with
ps(1) and the same number of Lisp processes from within your Lisp
image. Call ApacheBench often enough (two or three times) and Apache
will completely stop responding because it has reached its 'MaxClients'
limit of 150 (unless you've changed it, of course) but none of the 150
clients is usable.

This happens because ApacheBench will abort all pending connections as
soon as it's finished with its tests. The pattern can obviously be
used as a DoS attack on mod_lisp.

Now, what to do? One option would probably be to set a timeout before
emptying the buffer (not tested). But I think the better (and faster)
solution is to get rid of the buffer and the socket as well. The next
time the client is used we'll have to open up a new socket to Lisp but
this won't take 300 seconds (the default Apache value for
'Timeout'). So...

  if (ReadLength < ContentLength || r->connection->aborted)
    {
      ap_log_error("mod_lisp", 0, APLOG_WARNING|APLOG_NOERRNO, r->server,
                   "Could not send complete body to client, closing socket to Lisp");
      ap_bclose(BuffSocket);
      KeepSocket = 0;
      LispSocket = 0;
    }
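
For comparison, the untested timeout alternative mentioned above could
be sketched roughly as follows. This is a standalone POSIX illustration
and not mod_lisp code: drain_with_timeout and DRAIN_CHUNK are
hypothetical names, and it reads from a plain file descriptor with
poll(2) instead of going through Apache's BUFF API, so it would still
need adapting before it could replace the loop.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

#define DRAIN_CHUNK 8192  /* assumed chunk size, mirrors the 8192 above */

/*
 * Hypothetical helper, not part of mod_lisp: discard up to `remaining`
 * bytes from `fd`, waiting at most `timeout_ms` for each chunk.
 * Returns the number of bytes still unread. 0 means the descriptor was
 * fully drained and could be reused; anything else means the caller
 * should close it.
 */
static long drain_with_timeout(int fd, long remaining, int timeout_ms)
{
    char buffer[DRAIN_CHUNK];

    assert(remaining >= 0);
    while (remaining > 0) {
        struct pollfd pfd;
        size_t want;
        ssize_t got;
        int ready;

        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;

        ready = poll(&pfd, 1, timeout_ms);
        if (ready < 0 && errno == EINTR)
            continue;               /* interrupted by a signal, retry */
        if (ready <= 0)
            break;                  /* timeout or poll error: give up */

        /* Never ask for more than is still outstanding. */
        want = remaining < DRAIN_CHUNK ? (size_t) remaining : DRAIN_CHUNK;
        got = read(fd, buffer, want);
        if (got < 0 && errno == EINTR)
            continue;               /* interrupted by a signal, retry */
        if (got <= 0)
            break;                  /* EOF or read error */
        remaining -= got;
    }
    return remaining;
}
```

Even with a short timeout, a stalled peer still costs one timeout
period per child, whereas closing the socket costs nothing up front.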

The appended patch also adds a couple of braces to appease gcc and it
removes some code that is never used at all. Hope that's OK.

Cheers,
Edi.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: mod_lisp.patch
Type: text/x-patch
Size: 7734 bytes
Desc: not available
URL: <https://mailman.common-lisp.net/pipermail/tbnl-devel/attachments/20040720/95e09597/attachment.bin>
