[hunchentoot-devel] googlebot revisitation rate excessive?

Hans Hübner hans at huebner.org
Mon Jun 30 04:49:47 UTC 2008


On Mon, Jun 30, 2008 at 2:15 AM, Jeff Cunningham <jeffrey at cunningham.net> wrote:
> [2008-06-29 16:25:23 [INFO]] No session for session identifier
> '491:2BB8EA11136C90E6BA7D7F466951E370' (User-Agent: 'Mozilla/5.0
> (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', IP:
> '127.0.0.1')
> [2008-06-29 16:25:23 [WARNING]] Warning while processing connection:
> Unexpected character , after <meta
> [2008-06-29 16:25:43 [ERROR]] Error while processing connection: I/O timeout
> reading #<SB-SYS:FD-STREAM for "a socket" {195190D9}>.
> [2008-06-29 16:27:27 [INFO]] No session for session identifier
> '481:C9244EC27C31213FFE797F9E2ABE1535' (User-Agent: 'Mozilla/5.0
> (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', IP:
> '127.0.0.1')
> [2008-06-29 16:27:27 [WARNING]] Warning while processing connection:
> Unexpected character , after <meta
>
>
> I've looked high and low for bad syntax involving generation of a meta tag,
> but it isn't there. I think it is an artifact of the timeout or something.
> Anyway, I'm wondering if the googlebot doesn't like the response my server
> is giving it, doesn't respond, the server waits, times out, then finally the
> googlebot gets back to it and by that time the session identifier is bad.
> Any thoughts?

I'd set HUNCHENTOOT:*HEADER-STREAM* to *STANDARD-OUTPUT* and
*BREAK-ON-SIGNALS* to 'WARNING, then wait for the Googlebot request to
come in.  The headers printed to the console may give you a clue about
what the request looks like, and perhaps a way to trigger such a
failing request yourself, e.g. with Drakma or wget.  You may also be
able to get a clue from the backtrace in the debugger.
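A minimal sketch of that setup, assuming Hunchentoot is already loaded in the running server image and Drakma in whatever image you test from. The two variables are the ones named above; the URL in the Drakma call is a hypothetical placeholder, and the real path (and any session parameter) should be copied from the headers the server prints.

```lisp
;;; In the server image (Hunchentoot loaded):

;; Write incoming and outgoing HTTP headers to the REPL's output stream.
(setf hunchentoot:*header-stream* *standard-output*)

;; Enter the debugger as soon as any WARNING is signalled, before a
;; handler can log it and carry on, so the backtrace shows where the
;; "Unexpected character" warning comes from.
(setf *break-on-signals* 'warning)

;;; In a client image (Drakma loaded), replay a request that looks like
;;; Googlebot's.  "http://localhost:8080/some/page" is a hypothetical URL.
(drakma:http-request "http://localhost:8080/some/page"
                     :user-agent "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")

;;; When done, restore the defaults.
(setf hunchentoot:*header-stream* nil
      *break-on-signals* nil)
```

Setting *BREAK-ON-SIGNALS* to a type this broad will also stop on unrelated warnings, so it is best left on only while waiting for the one request.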

I find it curious that Google retries every five minutes.  Did you
verify that the request is coming from a Google IP address?  It may
also be a prankster's script gone wild, in which case I'd block the IP
address.
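One way to check the source is the reverse-then-forward DNS lookup that Google describes for verifying Googlebot: the address should reverse-resolve to a name under googlebot.com or google.com, and that name should resolve back to the same address.  Note that the log above shows IP '127.0.0.1', so the server is presumably behind a front-end proxy and the real client address has to come from the proxy's logs.  The sketch below is SBCL-specific (it uses the SB-BSD-SOCKETS contrib), and the two function names are hypothetical helpers, not part of Hunchentoot.

```lisp
;;; SBCL-only sketch: verify a claimed Googlebot address via DNS.
(require :sb-bsd-sockets)

(defun googlebot-host-name-p (name)
  "True if NAME ends in .googlebot.com or .google.com (case-insensitive)."
  (flet ((ends-with-p (suffix)
           (let ((start (- (length name) (length suffix))))
             (and (>= start 0)
                  (string-equal suffix name :start2 start)))))
    (or (ends-with-p ".googlebot.com")
        (ends-with-p ".google.com"))))

(defun verify-googlebot-ip (address)
  "ADDRESS is a vector of four octets, e.g. #(66 249 66 1).  Returns the
host name if the reverse lookup yields a Google crawler name that resolves
back to ADDRESS, otherwise NIL (including when any lookup fails)."
  (handler-case
      (let* ((host-ent (sb-bsd-sockets:get-host-by-address address))
             (name (sb-bsd-sockets:host-ent-name host-ent)))
        (when (and (googlebot-host-name-p name)
                   ;; Forward lookup must map the name back to ADDRESS.
                   (member address
                           (sb-bsd-sockets:host-ent-addresses
                            (sb-bsd-sockets:get-host-by-name name))
                           :test #'equalp))
          name))
    (error () nil)))
```

Checking the suffix with a leading dot rejects look-alike names such as "fakegooglebot.com", and the forward lookup guards against someone who controls the reverse DNS for their own address block.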

Or ignore the issue.  The Internet _is_ silly, after all.

-Hans



More information about the Tbnl-devel mailing list