[slime-devel] CMUCL unicode strings breaks slime

Wed Oct 6 07:34:47 UTC 2010

* Raymond Toy [2010-10-06 01:07] writes:

> On 10/2/10 3:45 AM, Helmut Eller wrote:
>> * Raymond Toy [2010-10-01 19:49] writes:
>> 
>>> Oh, that's a problem.  In the example, length is 3, but the string
>>> actually has 4 code units, so read-sequence only reads 3 code units,
>>> completely missing the last code unit.
>> 
>> I think we have the following options:
>
> Do you have a preference for any of the options (besides option 1)?  I'd
> like to make this work, because it's really annoying when slime crashes.
>  I usually remember not to do these things, but when an error is thrown
> and slime brings up the debugger and displays the string on the
> backtrace, slime crashes, just when I really needed to know what happened.

[Ideally the different Lisp implementations should have the same notion
of "character".  That CMUCL thinks of characters as Unicode code units
while SBCL uses code points is IMO an unfortunate development.  In
Scheme (R6RS) they say that a Scheme character should correspond to one
Unicode scalar value, which seems to be the ranges [0, #xD7FF] and
[#xE000, #x10FFFF].  Java and .NET use code units.  It would not be the
worst idea to adopt one standard; the earlier we do that the less it
costs.]
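
To make the three notions concrete, here is a small Python sketch. It
is only an illustration: the helper names are hypothetical and nothing
here is SLIME or CMUCL code. It builds a string with 3 code points, one
of them outside the BMP, so the string occupies 4 UTF-16 code units,
the same kind of mismatch as in the example discussed above.

```python
# Illustration only: the helper names below are hypothetical and are
# not part of SLIME, CMUCL, or SBCL.

def count_code_points(s: str) -> int:
    # A Python 3 str is a sequence of code points, so len() counts them.
    return len(s)

def count_utf16_code_units(s: str) -> int:
    # Each UTF-16 code unit is 2 bytes; a non-BMP character takes 2 units.
    return len(s.encode("utf-16-le")) // 2

def is_scalar_value(cp: int) -> bool:
    # Scalar values are all code points except the surrogate range
    # #xD800..#xDFFF, i.e. [0, #xD7FF] and [#xE000, #x10FFFF].
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF

# "a", U+10000 (outside the BMP), "b"
sample = "a\U00010000b"

points = count_code_points(sample)       # 3
units = count_utf16_code_units(sample)   # 4
```

A reader that is told "length 3" but counts in code units would stop
one unit short of the end of this string.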

For now option 2) is probably the simplest.  

In the long run, byte streams would be more flexible.  In theory we
could use something like HTTP chunking, if it's worth the complexity.
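
A rough sketch of what such chunking could look like, in Python. This
is not the existing wire format and not an actual proposal for it; the
function names and the choice of UTF-8 are assumptions made for the
example. The point is that the length header counts bytes, so both
sides agree on the size regardless of how each Lisp defines
"character".

```python
import io

# Hypothetical helpers, loosely modelled on HTTP chunked encoding:
# hex size line, CRLF, payload bytes, CRLF.  Assumes UTF-8 payloads.

def write_chunk(stream, text: str) -> None:
    payload = text.encode("utf-8")
    stream.write(b"%x\r\n" % len(payload))  # size in bytes, not characters
    stream.write(payload)
    stream.write(b"\r\n")

def read_chunk(stream) -> str:
    size = int(stream.readline().strip(), 16)
    # A real socket may return fewer bytes than requested, so a real
    # reader would loop here; io.BytesIO returns them all at once.
    payload = stream.read(size)
    stream.read(2)  # consume the trailing CRLF
    return payload.decode("utf-8")

buffer = io.BytesIO()
write_chunk(buffer, "a\U00010000b")  # 3 code points, 6 bytes in UTF-8
write_chunk(buffer, "second")
wire_bytes = buffer.getvalue()

buffer.seek(0)
first = read_chunk(buffer)
second = read_chunk(buffer)
```

The reader decodes only after it has the whole payload, so it never has
to guess how many characters a given number of bytes will produce.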

Helmut