[vivace-graph-devel] Recent Babel changes - discussion from github

MON KEY monkey at sandpframing.com
Sun Sep 11 20:00:54 UTC 2011


vivace-graph-v2 + UTF-8

Given kraison's recent inability to build Unicly with SBCL on MacOS and with
LispWorks, it may be worth considering now how vivace-graph-v2 will handle
similar issues.

At issue is whether vivace-graph-v2 should constrain all string data to
characters encoded as UTF-8.

I would think that this is a reasonable constraint to apply, given that
vivace-graph-v2 is intended to target RDF-aware and/or RDF-like applications,
where UTF-8 is effectively ubiquitous.

Indeed, for my part I have a direct and standing need to represent triple
subjects and objects as string data in character sets whose encodings extend
beyond ASCII and LATIN-1, and I will consider it a deal breaker if
vivace-graph-v2 is unable to handle UTF-8 reliably.

Regardless, to the extent that it is deemed desirable for vivace-graph-v2 to
enforce UTF-8 constraints on the string data it manipulates, it is worth
considering how the current system might reliably and reasonably enforce such a
constraint should the underlying Lisp implementation prove incapable of
handling UTF-8 character encodings internally.

As it stands now, a `serialize' method in vivace-graph/serialize.lisp relies on
`babel:string-to-octets', and two `deserialize' methods in
vivace-graph-v2/deserialize.lisp rely on `babel:octets-to-string'.

Currently vivace-graph-v2 depends on the Babel system for converting strings
to/from octets via the Babel functions `babel:octets-to-string' and
`babel:string-to-octets', each of which defaults its :ENCODING keyword argument
to the value of `babel-encodings:*default-character-encoding*', which itself
defaults to :UTF-8. In other words, unless told otherwise, both
`babel:octets-to-string' and `babel:string-to-octets' perform all string/octet
conversions as :UTF-8, and they signal an error if a conversion fails, as per
their defaulting keyword form: (errorp (not *suppress-character-coding-errors*))
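One way to take those defaults out of the picture is to pin the encoding and
error behaviour explicitly at each call site. The sketch below is hypothetical:
the helper names `utf-8-encode' and `utf-8-decode' are not part of
vivace-graph-v2, and only the two Babel functions and their :ENCODING and
:ERRORP keywords are Babel's own. It assumes Babel is installed somewhere ASDF
can find it.

```lisp
;;; Assumes Babel is installed somewhere ASDF can find it.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))
(eval-when (:compile-toplevel :load-toplevel :execute)
  (asdf:load-system "babel"))

;;; Hypothetical helpers, not part of vivace-graph-v2.  They pass :ENCODING
;;; and :ERRORP explicitly, so neither the current value of
;;; BABEL-ENCODINGS:*DEFAULT-CHARACTER-ENCODING* nor the
;;; *SUPPRESS-CHARACTER-CODING-ERRORS* setting can change their behaviour.
(defun utf-8-encode (string)
  "Return STRING as a vector of UTF-8 octets, signalling on failure."
  (babel:string-to-octets string :encoding :utf-8 :errorp t))

(defun utf-8-decode (octets)
  "Return the string decoded from the UTF-8 OCTETS, signalling on failure."
  (babel:octets-to-string octets :encoding :utf-8 :errorp t))
```

For example, encoding a one-character string holding U+03BB (GREEK SMALL
LETTER LAMDA) with `utf-8-encode' yields the two octets #xCE #xBB, and
`utf-8-decode' turns them back into the original string.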

In any event, vivace-graph-v2's serialization/deserialization routines may
fail at inopportune moments, given the following from the file header of
babel/src/strings.lisp:

,----
| The usefulness of this string/octets interface of Babel's is very
| limited on Lisps with 8-bit characters which will in effect only
| support the latin-1 subset of Unicode.  That is, all encodings are
| supported but we can only store the first 256 code points in Lisp
| strings.  Support for using other 8-bit encodings for strings on
| these Lisps could be added with an extra encoding/decoding step.
| Supporting other encodings with larger code units would be silly
| (it would break expectations about common string operations) and
| better done with something like Closure's runes.
`----
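Because the limitation described in that header is a property of the Lisp
implementation rather than of Babel, one possible mitigation is a load-time
check of the standard Common Lisp constant CHAR-CODE-LIMIT. This is a
hypothetical sketch (vivace-graph-v2 performs no such check today), and it
only warns rather than errors, since a Lisp with 16-bit characters would report
a limit of 65536 yet still cover most real-world text.

```lisp
;;; Hypothetical guard, not part of vivace-graph-v2.  CHAR-CODE-LIMIT is a
;;; standard constant: one more than the largest character code this Lisp
;;; supports (256 on a Lisp with 8-bit characters).
(defconstant +unicode-code-point-limit+ #x110000
  "One more than the largest Unicode code point, U+10FFFF.")

(defun full-unicode-strings-p ()
  "True when every Unicode code point fits below CHAR-CODE-LIMIT."
  (>= char-code-limit +unicode-code-point-limit+))

;; Warn once at load time when strings cannot hold all of Unicode.
(unless (full-unicode-strings-p)
  (warn "CHAR-CODE-LIMIT is ~D, so this Lisp cannot store every Unicode ~
         code point in a string; UTF-8 data may not round-trip."
        char-code-limit))
```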

The Closure system has a direct dependency on Babel and an indirect dependency
on Flexi-Streams via Closure-HTML. That is, there is no reason, on dependency
grounds, why either the Babel or the Flexi-Streams system should be preferred
over the other, inasmuch as both are likely to remain dependencies of the
vivace-graph-v2 system.

As mentioned already, my personal preference with regard to UTF-8 and
character-encoding/character conversion interop is for the Flexi-Streams system
rather than the Babel system. This (mostly trivial) preference is by no means a
knock on the Babel system; it mostly amounts to my belief that Flexi-Streams
has argument signatures that correspond more transparently to the equivalent
SBCL procedures.
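To illustrate that correspondence: Flexi-Streams and SBCL's SB-EXT package both
take an :EXTERNAL-FORMAT keyword where Babel takes :ENCODING. A minimal sketch,
assuming Flexi-Streams is installed somewhere ASDF can find it; the round-trip
function names are hypothetical, and the SB-EXT variant is only defined on
SBCL.

```lisp
;;; Assumes Flexi-Streams is installed somewhere ASDF can find it.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))
(eval-when (:compile-toplevel :load-toplevel :execute)
  (asdf:load-system "flexi-streams"))

;;; Hypothetical helper: encode to UTF-8 octets and decode back.
(defun flexi-utf-8-round-trip (string)
  "Round-trip STRING through UTF-8 octets using Flexi-Streams."
  (flexi-streams:octets-to-string
   (flexi-streams:string-to-octets string :external-format :utf-8)
   :external-format :utf-8))

;;; The same round trip with SBCL's built-in functions; note the identical
;;; :EXTERNAL-FORMAT keyword.
#+sbcl
(defun sbcl-utf-8-round-trip (string)
  "Round-trip STRING through UTF-8 octets using SB-EXT."
  (sb-ext:octets-to-string
   (sb-ext:string-to-octets string :external-format :utf-8)
   :external-format :utf-8))
```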

vivace-graph-v2 already has an indirect dependency on Flexi-Streams via its
(currently unneeded) dependency on the Hunchentoot system, which in turn
depends on Flexi-Streams.

Should vivace-graph-v2 ever incorporate a direct mechanism for frobbing RDF
data, that mechanism is very likely to necessitate a dependency on the CXML
system, which currently has a pre-existing dependency on the Closure-HTML
system, which in turn depends on the Flexi-Streams system.
