[s-xml-devel] Changes
Sven Van Caekenberghe
scaekenberghe at common-lisp.net
Tue Jan 31 11:57:46 UTC 2006
Hi,
This month, David Tolpin contributed a set of interesting changes to
S-XML. They have been integrated into CVS head and will be included
in a released version later on (if nobody objects).
Thanks a lot David! You obviously took a good look at the source code
and know a lot about XML.
From the Changelog:
2006-01-19 Sven Van Caekenberghe <svc at mac.com>

  * added a set of patches contributed by David Tolpin
    dvd at davidashen.net : we're now using char of type Character
    and #\Null instead of null, read/unread instead of peek/read and
    some more declarations for more efficiency - added hooks for
    customizing parsing attribute names and values

Copied and pasted from email conversations with David:
> attached are my patches to s-xml, against the current CVS versions.
> Most changes are type fixes and optimizations: char is declared as
> character and uses #\Null as the exceptional value instead of nil
> (XML cannot contain the #\Null character). This allows char to be
> declared as character, and also fixes type errors on faulty XML
> files: in a few places the original code contains
>
> (char= (read-char stream nil nil) #\SomeChar)
>
> which will signal an error if end-of-file is actually reached,
> since read-char then returns nil and char= requires characters.
> There is also a change in parse-identifier (and probably in other
> similar functions) that replaces peek->read with read->unread
> sequences. An XML identifier is usually much longer than a single
> character, so peek+read, which makes two calls per character,
> requires about twice as many function calls as read+unread, which
> makes one call per character plus a single unread at the end.
>
> One other change, and I will understand if you reject it, is
> defining callbacks (with fallback to the current behavior)
> *attribute-name-parser* and *attribute-value-parser*. They allow
> attributes to be parsed in-stream, without reconsing the attribute
> list. This has been important for me: I use S-XML to read
> multi-megabyte files and need to spend at most a second on it.
>
> It helps decrease memory consumption, too. The current call in my
> code is:
>
> (let ((s-xml:*ignore-namespaces* t)
>       (s-xml:*attribute-name-parser* #'attn-by-name)
>       (s-xml:*attribute-value-parser*
>        #'(lambda (name string)
>            (declare (type attn name))
>            (funcall (attn-parse name) string))))
>   (s-xml:start-parse-xml
>    input
>    (make-instance 's-xml:xml-parser-state
>                   :seed seed
>                   :new-element-hook #'new-element-hook
>                   :finish-element-hook #'finish-element-hook)))
>
> that is, attribute names and values are parsed before being added
> to the attribute list.
>
> I've also changed processing of the attribute list when namespaces
> are turned on so that it is patched in place and not reconsed.
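
To illustrate the two stream-reading points above, here is a minimal
sketch. It is not the actual S-XML code: read-identifier and
identifier-char-p are made-up names, and the identifier test is
deliberately simplified rather than the real XML Name rule. It shows
#\Null used as the end-of-file value, so that char can be declared as
character, and one read-char per character with a single unread-char
at the end.

```lisp
;; Illustrative sketch only; these helpers are hypothetical, not S-XML's code.

(defun identifier-char-p (char)
  "Simplified test for identifier characters (an assumption, not the XML Name rule)."
  (declare (type character char))
  (or (alphanumericp char) (find char "_-.:")))

(defun read-identifier (stream)
  "Read an identifier from STREAM using one READ-CHAR call per character.
#\\Null is the end-of-file value, so CHAR can be declared CHARACTER."
  (with-output-to-string (out)
    (loop
      (let ((char (read-char stream nil #\Null)))
        (declare (type character char))
        (cond ((char= char #\Null)          ; end of file: nothing to unread
               (return))
              ((identifier-char-p char)     ; still inside the identifier
               (write-char char out))
              (t                            ; read one char too far: push it back
               (unread-char char stream)
               (return)))))))
```

Called on a string stream over "foo-bar=1", read-identifier returns
"foo-bar" and leaves #\= as the next character on the stream. With
nil as the end-of-file value, the char= test would instead receive a
non-character at end of file.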
And some clarifications later on:
>> - aren't you misusing *ignore-namespaces* as a toggle for your
>> attribute-[name|value]-parse functionality ?
>
> they are called in different places with and without namespaces.
> Without namespaces, name/value calls can be applied immediately
> when each attribute is read. With namespaces, they must be delayed
> until all attributes are resolved.
>
>> - couldn't we move some of the tests surrounding the attribute-
>> [name|value]-parser funcalls to the (default) implementations ?
>
> I've looked again and don't think so, otherwise non-default
> implementations won't be transparent.
>
>> - isn't
>>   (defun parse-attribute-value (name string)
>>     "Default parser for the attribute value"
>>     (declare (ignore name)
>>              (special *ignore-namespaces*))
>>     (if *ignore-namespaces*
>>         (copy-seq string)
>>         string))
>>   wrong ?
>
> Without namespaces, parse-attribute-value is called on every
> attribute. This means that the default implementation must copy the
> value, but a non-default one does not have to; instead, it can
> convert the value into an integer or a symbol. This saves about
> 10 megabytes of consed memory on a 3 MB source.
>
> With namespaces, the value is already copied before the default
> implementation is called, and there is no point in copying it
> again; that would waste the same 10 megabytes on the same 3 MB file.
>
>> I mean, I think the string should always be copied or never, no ?
>> How does this depend on namespaces being used ?
>
> Because with namespaces, attribute values are always copied in
> parse-*-attributes. Without namespaces, the copying can be avoided.
>
>> - isn't the attribute-[name|value]-parser called twice for each
>> attribute ? I am confused with my own code ! It has been a while
>> since I looked at it.
>
> It is called either when each attribute is parsed (when
> *ignore-namespaces* is t) or when the element is composed (when
> *ignore-namespaces* is nil). This is purely an efficiency issue: I
> wanted to preserve the performance I got from S-XML before the
> introduction of namespace handling.
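
As a rough illustration of why a default value parser has to copy
while a custom one need not, here is a sketch. It assumes the parser
hands the hook a scratch buffer that it reuses for the next value;
that reuse is an assumption made for the example, and all names in it
(*buffer*, fill-buffer, copying-value-parser, integer-value-parser)
are hypothetical, not S-XML's actual code.

```lisp
;; Hypothetical sketch; the buffer handling is an assumption for
;; illustration, not S-XML's actual implementation.

(defvar *buffer* (make-array 16 :element-type 'character
                                :adjustable t :fill-pointer 0)
  "Scratch buffer reused for every attribute value (assumed).")

(defun fill-buffer (text)
  "Overwrite *BUFFER* with TEXT, as a parser might while scanning a value."
  (setf (fill-pointer *buffer*) 0)
  (loop for char across text
        do (vector-push-extend char *buffer*))
  *buffer*)

(defun copying-value-parser (name string)
  "Default-style hook: the text itself is kept, so it is copied out of the buffer."
  (declare (ignore name))
  (copy-seq string))

(defun integer-value-parser (name string)
  "Custom hook: converts the value directly; no intermediate string is allocated."
  (declare (ignore name))
  (parse-integer string))
```

With these definitions, the result of (copying-value-parser "width"
(fill-buffer "640")) still reads "640" after the buffer is refilled,
whereas a hook that returned the buffer itself would see its value
overwritten. (integer-value-parser "height" (fill-buffer "480"))
returns the integer 480 without keeping any string at all.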
It had been a long time since I looked at the S-XML source code, and
David had a better view of it ;-)
Sven
--
Sven Van Caekenberghe - http://homepage.mac.com/svc
Beta Nine - software engineering - http://www.beta9.be
"Lisp isn't a language, it's a building material." - Alan Kay