[slime-devel] CMUCL unicode strings breaks slime

Raymond Toy toy.raymond at gmail.com
Wed Oct 6 12:27:50 UTC 2010


On 10/6/10 3:34 AM, Helmut Eller wrote:
> * Raymond Toy [2010-10-06 01:07] writes:
> 
>> On 10/2/10 3:45 AM, Helmut Eller wrote:
>>> * Raymond Toy [2010-10-01 19:49] writes:
>>>
>>>> Oh, that's a problem.  In the example, length is 3, but the string
>>>> actually has 4 code units, so read-sequence only reads 3 code units,
>>>> completely missing the last code unit.
>>>
>>> I think we have the following options:
>>
>> Do you have a preference for any of the options (besides option 1)?  I'd
>> like to make this work, because it's really annoying when slime crashes.
>>  I usually remember not to do these things, but when an error is thrown
>> and slime brings up the debugger and displays the string on the
>> backtrace, slime crashes, just when I really needed to know what happened.
> 
> [Ideally the different Lisp implementations should have the same notion
> of "character".  That CMUCL thinks of characters as Unicode code units
> while SBCL uses code points is IMO an unfortunate development.  In
> Scheme (R6RS) they say that a Scheme character should correspond to one
> Unicode scalar value, which seems to be the ranges [0, #xD7FF] and
> [#xE000, #x10FFFF].  Java and .NET use code units.  It would not be the
> worst idea to adopt one standard; the earlier we do that the less it
> costs.]

It was a tradeoff between space usage (16-bit strings vs 32-bit
strings), compiler complexity (managing 8-bit and 32-bit strings) and
user complexity (base-strings vs strings).
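
The length mismatch mentioned earlier in the thread (length 3 but 4 code
units) shows up whenever a string contains a character outside the Basic
Multilingual Plane, since such a character needs a UTF-16 surrogate pair.
A quick sketch in Python (purely illustrative, not part of SLIME or CMUCL;
Python counts code points, so the code-unit count is derived from the
UTF-16 encoding):

```python
# Illustration: U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP,
# so it takes two UTF-16 code units (a surrogate pair).
s = "ab\U0001D11E"   # 3 code points: 'a', 'b', U+1D11E

code_points = len(s)                          # Python strings count code points
code_units = len(s.encode("utf-16-le")) // 2  # each UTF-16 code unit is 2 bytes

print(code_points)   # 3
print(code_units)    # 4
```

A reader that trusts the code-point length and reads only 3 code units
would drop the trailing surrogate, which is exactly the failure mode
described above.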
> 
> For now, option 2) is probably the simplest.

Ok.  Can you give some hints on where to start looking at this?

> 
> In the long run, byte streams would be more flexible.  In theory we
> could use something like HTTP chunking, if it's worth the complexity.

If you ever start working on this approach, let me know and I'll try to
help out.

Ray
