[closure-devel] Array access out of bounds in Closure HTML's sgml parser

Keith Browne tuxedo at deepsky.com
Thu Nov 10 20:13:07 UTC 2011


We're using Closure HTML and Drakma to extract information from Web pages. 
We've run across an intermittent fault with one page in particular from 
YouTube.  We had a little difficulty reproducing the bug at first, but we 
discovered that YouTube was sending us different contents each time.  We 
ran our code in a loop and captured several hundred deliveries of the Web 
page in question until we got another instance that failed.
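
For anyone who wants to reproduce this kind of intermittent failure, here
is a minimal sketch of a capture loop along the lines described above.  It
is an illustration, not the exact code we ran.  The function name and its
FETCH, PARSE and MAX-TRIES parameters are made up for this example.

```lisp
;; Sketch only: FETCH and PARSE are caller-supplied functions (hypothetical
;; parameter names), so nothing here depends on a particular HTTP client.
(defun capture-failing-input (fetch parse &key (max-tries 500))
  "Call FETCH up to MAX-TRIES times and PARSE each result.
Return the first input on which PARSE signals an error, together with
the condition and the attempt number; return NIL if every attempt
parses cleanly."
  (loop for attempt from 1 to max-tries
        do (let ((input (funcall fetch)))
             (handler-case (funcall parse input)
               (error (condition)
                 ;; Keep the exact input, since the next fetch may differ.
                 (return (values input condition attempt)))))
        finally (return nil)))
```

With drakma and closure-html loaded, and assuming *PAGE-URL* (a placeholder
variable) holds the address of the page being fetched, it could be called
like this:

(capture-failing-input
   (lambda () (drakma:http-request *page-url*))
   (lambda (html) (chtml:parse html (chtml:make-lhtml-builder))))

The returned string can then be written to a file so the failure can be
replayed without the network.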

I've put a copy of the HTML that triggers the bug at

http://www.deepsky.com/~tuxedo/youtube-sgml-breaker.html

You can see the problem by loading closure-html and drakma and evaluating 
this form:

(chtml:parse
   (drakma:http-request
      "http://www.deepsky.com/~tuxedo/youtube-sgml-breaker.html")
   (chtml:make-lhtml-builder))

On SBCL 1.0.53, I'm getting this error:

Index 8192 out of bounds for (SIMPLE-ARRAY CHARACTER (8192)), should be 
nonnegative and <8192.
   [Condition of type SB-INT:INVALID-ARRAY-INDEX-ERROR]

The error is raised in SGML::READ-LITERAL.  I only vaguely understand 
what's going on in that function.  The error occurs while it is parsing 
the big block of flashvar-related markup on line 244 of the HTML file, 
and if I delete or add a character earlier in that line, the error goes 
away.  I infer that something goes wrong in the character decoding at the 
point where the 8192-character buffer needs to grow, but I can't figure 
out just what it is.
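
Since the failure depends on how much text comes before the literal, one
way to map it out is to pad the input by varying amounts and record which
paddings fail.  The helper below is a rough sketch of that idea, not part
of Closure HTML.  Its name and parameters are invented for illustration,
and PARSE is any function that signals an error on bad input.

```lisp
;; Sketch only: probes how sensitive a parse failure is to input alignment.
(defun failing-offsets (text position parse &key (max-padding 16))
  "Insert 0..MAX-PADDING spaces into TEXT at POSITION and PARSE each variant.
Return the list of padding lengths for which PARSE signals an error."
  (loop for padding from 0 to max-padding
        for variant = (concatenate 'string
                                   (subseq text 0 position)
                                   (make-string padding
                                                :initial-element #\Space)
                                   (subseq text position))
        ;; T when PARSE signals an error on this variant, NIL otherwise.
        when (handler-case (progn (funcall parse variant) nil)
               (error () t))
          collect padding))
```

If the failing paddings recur at a regular interval, that would be
consistent with a buffer-boundary problem, though it would not prove it.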

Keith Browne




