FW: [cxml-devel] characters.lisp improvements

Fri Jun 23 20:11:56 UTC 2006

Allrighty fine, I see how it is. ;-)

These patches deprecate characters.lisp in favor of xml-name-rune-p.lisp.
I pulled over #'VALID-NAME-P and #'VALID-NMTOKEN-P, the new functions in
characters.lisp, and they now use the predicates provided by
xml-name-rune-p.lisp.
http://www.unwashedmeme.com/cxml/asd-remove-chars.diff  (diff to cxml.asd)
http://www.unwashedmeme.com/cxml/characters-merge.diff  (against
xml/xml-name-rune-p.lisp)
And characters.lisp can be deleted now.

Before making any of the changes I made sure that the #'RUNE-NAME-CHAR-P and
#'NAME-RUNE-P functions returned #'EQ results for every character code
between 0 and +max+.
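
A minimal sketch of that kind of equivalence check. The two predicates here
are toy stand-ins written for illustration (they are not the cxml functions),
and normalizing both results to T/NIL before comparing is an assumption of
the sketch:

```lisp
;; Hypothetical stand-ins for the two implementations being compared.
(defun demo-pred-a (code)
  (or (<= 65 code 90) (<= 97 code 122)))

(defun demo-pred-b (code)
  (some (lambda (range) (<= (car range) code (cdr range)))
        '((65 . 90) (97 . 122))))

(defun first-disagreement (pred-a pred-b limit)
  "Return the first code in [0, LIMIT] where the predicates differ, else NIL."
  (loop for code from 0 to limit
        unless (eq (and (funcall pred-a code) t)   ; normalize to T/NIL
                   (and (funcall pred-b code) t))
          return code))

;; NIL here means the two predicates agree on the whole range.
(first-disagreement #'demo-pred-a #'demo-pred-b #xD800)
```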

I also changed the compile-time behavior of xml-name-rune-p.lisp, but it
should be equivalent at run time (an inlined bit-vector lookup).

At compile time the code now looks a little more like the code in
characters.lisp. The special character ranges are now vectors kept separate
from the code (the binary search) that examines them, rather than ORing
everything together.

At compile time it then evaluates the predicates over every possible value
and saves the results in a bit-vector (this is how it worked before), and
garbage collection should be able to reclaim the character-range vectors.
With this change, compilation on my machine dropped from ~8.9s to ~0.25s.
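
Roughly, the shape is as follows. This is only a sketch: the range table,
the DEMO- names and the #x100 limit are made up for illustration, and the
table is built at load time here, whereas the real file splices the
bit-vector into the inlined function at compile time:

```lisp
;; Hypothetical tiny range table: sorted, non-overlapping (LOW . HIGH) pairs.
(defparameter *demo-ranges*
  (vector '(#x41 . #x5A) '(#x61 . #x7A) '(#xC0 . #xD6)))

(defun code-in-ranges-p (code ranges)
  "Binary search a sorted vector of inclusive (LOW . HIGH) ranges."
  (let ((lo 0)
        (hi (1- (length ranges))))
    (loop while (<= lo hi)
          do (let* ((mid (floor (+ lo hi) 2))
                    (range (aref ranges mid)))
               (cond ((< code (car range)) (setf hi (1- mid)))
                     ((> code (cdr range)) (setf lo (1+ mid)))
                     (t (return t)))))))

(defun demo-predicate-to-bv (predicate limit)
  "Evaluate PREDICATE on every code below LIMIT and record the answers as bits."
  (let ((bv (make-array limit :element-type 'bit :initial-element 0)))
    (dotimes (i limit bv)
      (when (funcall predicate i)
        (setf (sbit bv i) 1)))))

(defparameter *demo-bv*
  (demo-predicate-to-bv (lambda (code) (code-in-ranges-p code *demo-ranges*))
                        #x100))

(defun demo-name-code-p (code)
  "Bounds check first, then a single bit lookup."
  (and (< -1 code (length *demo-bv*))
       (= 1 (sbit *demo-bv* code))))
```

Once the bit-vector exists, nothing at run time refers to the range table
any more, which is why it can be collected.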

Changes that might be significant:

PREDICATE-TO-BV 
-                 (dotimes (i #x10000 r)
+                 (dotimes (i +max+ r)
Where +max+ = #xD800. The vector only goes up to +max+, so there is no point
trying any characters above that, right? The predicates that use the
bit-vector fail for anything above +max+ too.

NAME-RUNE-P and NAME-START-RUNE-P (the public inlined functions)
-             (DEFINLINE NAME-RUNE-P (RUNE)
-               (SETF RUNE (RUNE-CODE RUNE))
-               (AND (<= 0 RUNE ,+max+)
-                    (LOCALLY (DECLARE (OPTIMIZE (SAFETY 0) (SPEED 3)))
-                             (= 1 (SBIT ',(predicate-to-bv #'name-rune-p)
-                                        (THE FIXNUM RUNE))))))
+	    (DEFINLINE NAME-RUNE-P (RUNE)
+	      (SETF RUNE (RUNE-CODE RUNE))
+	      (LOCALLY (DECLARE (OPTIMIZE (SAFETY 0) (SPEED 3))
+				(type fixnum rune))
+		  (AND (<= 0 RUNE ,+max+)
+		       (= 1 (SBIT ',(predicate-to-bv #'name-rune-p)
+				RUNE)))))
I moved the LOCALLY declarations up by a line to include the <= check, and
declared the type of the rune a little earlier (so that <= might be able to
take advantage of it). Does anyone know of problems this will cause?

TEST RESULTS:
Xmlconf:run-all-tests 
0/1829 tests failed; 333 tests were skipped
Domtest:run-all-tests
0/763 tests failed; 43 tests were skipped

Timing the xmlconf tests showed no significant change in speed or memory.
Profiling showed a small drop in memory usage, but not by much. In the
xmlconf tests these functions are called a fair amount, but they don't
contribute much to the runtime. These weren't terribly exact tests, but
part of the point here was to clean up and remove some of the duplicated
code.
http://www.unwashedmeme.com/cxml/characters-merge-profile.results 

Heh, at least compilation is faster. Whad'ya think?

Nathan

(My apologies if you got this twice, it looks like it didn't send the first
time.)

>-----Original Message-----
>From: David Lichteblau [mailto:david at lichteblau.com]
>Sent: Tuesday, June 13, 2006 2:02 PM
>To: Nathan Bird
>Cc: cxml-devel at common-lisp.net
>Subject: Re: [cxml-devel] characters.lisp improvements
>
>
>However, I have to admit that my characters.lisp duplicates work already
>done by Gilbert.  Gilbert's functions are in xml-name-rune-p.lisp and
>are using inline functions accessing bitvectors.  Embarrassingly, I
>didn't notice the latter file before writing characters.lisp and then
>just stuck a comment into the file instead of fixing it right away.
>My apologies for that -- and sorry for writing that comment in german.  :-(

That's why we have Google Translate... to get bad translations of offhand
comments :-)