From Alain.Picard at memetrics.com Thu Mar 31 02:09:21 2005
From: Alain.Picard at memetrics.com (Alain.Picard at memetrics.com)
Date: Thu, 31 Mar 2005 12:09:21 +1000
Subject: [mod-lisp-devel] Possible bug
Message-ID: <16971.23505.494732.122137@memetrics.com>

Dear all,

I'm using modlisp v 2.42 (someone tell me if I'm supposed
to use something better/newer) on Apache 1.3.33.

I came in one day to find my apache helpless to render any
requests, indicating in its log file:

  [Wed Mar 30 18:11:00 2005] [error] System: Too many open files (errno: 24)
  [Wed Mar 30 18:11:08 2005] [error] [client 203.100.236.222] (24)Too many open files: Could not open password file: /www/conf/passwords

My set up is as follows; I have 4 virtual hosts, each of which
has 1 Lispserver directive, e.g.

    DocumentRoot /www/clients/instance1
    ServerName xos.memetrics.com
    LispServer 192.168.1.116 3000 "instance1"
    SetHandler lisp-handler

and so on for the other 3, with each one having a unique host/port
combination, each vhost talking to a different Lisp back end.

Now, two of my back ends are currently non-existent, i.e. nobody
is listening on ports 192.168.1.118:3002 (say). There is a process
which pings, once per second, a URL which apache tries to forward
to the (non-existent) lisp back end. Each such ping seems to
result in a file descriptor leak; to wit:

  [root at asp1]/www/conf# lsof |grep http | wc ; date
      875    8426   81263
  Wed Mar 30 19:51:03 CST 2005

  ... some time later

  [root at asp1]/www/conf# lsof |grep http | wc ; date
      920    8876   85448
  Wed Mar 30 19:54:51 CST 2005

This does not occur if the Lisp back ends are alive and accepting
requests.

I've looked through the modlisp code, but no obvious leaks are
apparent to me, of course, I've no clue what ap_psocket really does,
and it's not documented. :-(

Any help is appreciated...
--Alain Picard

From Alain.Picard at memetrics.com Thu Mar 31 02:27:29 2005
From: Alain.Picard at memetrics.com (Alain.Picard at memetrics.com)
Date: Thu, 31 Mar 2005 12:27:29 +1000
Subject: [mod-lisp-devel] Possible bug
In-Reply-To: <16971.23505.494732.122137@memetrics.com>
References: <16971.23505.494732.122137@memetrics.com>
Message-ID: <16971.24593.200297.711853@memetrics.com>

Dear all,

Sorry to follow up on myself again, but I discovered that,
in "int OpenLispSocket(excfg *cfg)",

if after the successful socket open, I change the code
from

  /* Check if we connected */
  if (ret == -1)
    return -1;

to

  /* Check if we connected */
  if (ret == -1)
    {
      ap_pclosesocket(SocketPool, sock);
      return -1;
    }

The broken behaviour disappears.

I do not know if there are other places in the code which may
also be leaking descriptors in other call paths.

Cheers,
                                --ap

From marc.battyani at fractalconcept.com Thu Mar 31 22:02:52 2005
From: marc.battyani at fractalconcept.com (Marc Battyani)
Date: Fri, 1 Apr 2005 00:02:52 +0200
Subject: [mod-lisp-devel] Possible bug
References: <16971.23505.494732.122137@memetrics.com>
	<16971.24593.200297.711853@memetrics.com>
Message-ID: <1b3301c5363d$63e4ef60$0a02a8c0@marcxp>

> Sorry to follow up on myself again, but I discovered that,
> in "int OpenLispSocket(excfg *cfg)",
>
> if after the successful socket open, I change the code
> from
>
>   /* Check if we connected */
>   if (ret == -1)
>     return -1;
>
> to
>
>   /* Check if we connected */
>   if (ret == -1)
>     {
>       ap_pclosesocket(SocketPool, sock);
>       return -1;
>     }
>
> The broken behaviour disappears.
>
> I do not know if there are other places in the code which may
> also be leaking descriptors in other call paths.

Good catch, thanks!
Those languages without unwind-protect are a pain ;-)

There is at least the equivalent code in mod_lisp2. I will fix it and
commit both versions.

Generally, though, the Apache processes are killed after serving a given
number of replies. This is probably why this was not caught before.

Cheers,

Marc
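
For reference, the pattern behind the fix discussed in this thread can be
sketched outside Apache. The following is only an illustration under stated
assumptions: it uses plain POSIX socket(), connect() and close() in place of
Apache's pool calls (ap_psocket / ap_pclosesocket), and the function name
open_backend_socket and its parameters are hypothetical, not taken from the
mod_lisp source.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for OpenLispSocket(): returns a connected
   descriptor, or -1 on failure. Plain POSIX, not the Apache pool API. */
int open_backend_socket(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int sock, ret;

    assert(ip != NULL);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;      /* nothing opened yet, so nothing to release */

    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1)
        return -1;

    ret = connect(sock, (struct sockaddr *)&addr, sizeof addr);

    /* Check if we connected */
    if (ret == -1) {
        /* Release the descriptor before the early return; without
           this, every failed attempt leaks one descriptor. */
        close(sock);
        return -1;
    }
    return sock;
}
```

With nobody listening on the target port, a call such as
open_backend_socket("192.168.1.118", 3002) should return -1 and leave the
process's descriptor count unchanged, which can be checked with lsof as in
the first message.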