[slime-devel] CMUCL unicode strings breaks slime

Sat Oct 2 16:30:08 UTC 2010

On 10/2/10 3:45 AM, Helmut Eller wrote:
> * Raymond Toy [2010-10-01 19:49] writes:
> 
>> Oh, that's a problem.  In the example, length is 3, but the string
>> actually has 4 code units, so read-sequence only reads 3 code units,
>> completely missing the last code unit.
> 
> I think we have the following options:
> 
> 1) Don't support code points beyond 16 bits.  Clean and easy.

Yes.  I only ever use code points outside the BMP when testing Unicode
support.  But it is annoying that SLIME breaks.
> 
> 2) Introduce variants of length and read-sequence that use the same
>    notion of character as Emacs.  Kinda messy and probably slow, but
>    relatively easy.

I don't know slime internals, but wouldn't you only need a special
version of length and read-sequence for cmucl with unicode?  The normal
length/read-sequence would be fine for everyone else.
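
To make the mismatch concrete, here is a rough sketch of what such
variants might look like.  It is written in Python purely for
illustration; the names utf16_units, codepoint_length and read_codepoints
are hypothetical and are not part of SLIME or CMUCL.  It assumes strings
are stored as UTF-16 code units and that the peer counts a surrogate pair
as one character.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def _pair_at(units, i):
    """True if units[i] and units[i + 1] form a surrogate pair."""
    return (0xD800 <= units[i] <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF)


def codepoint_length(units):
    """Count code points, so that a surrogate pair counts once."""
    count = 0
    i = 0
    while i < len(units):
        i += 2 if _pair_at(units, i) else 1
        count += 1
    return count


def read_codepoints(units, start, count):
    """Read `count` code points starting at index `start`.

    Returns the code units consumed and the index of the next unread unit.
    """
    i = start
    for _ in range(count):
        if i >= len(units):
            break
        i += 2 if _pair_at(units, i) else 1
    return units[start:i], i


# "ab" followed by U+10400, which needs a surrogate pair in UTF-16.
units = utf16_units("ab\U00010400")
print(len(units))               # 4 code units
print(codepoint_length(units))  # 3 code points
```

With a reported length of 3, reading 3 code units stops after the high
surrogate, while read_codepoints(units, 0, 3) consumes all 4 units.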
> 
> 3) Switch from character streams to binary streams so that we can use
>    byte counts instead of character counts.  This has several
>    advantages:
>     - surrogate pairs are no problem
>     - don't need flexi-streams for Lispworks

Why does Lispworks need flexi-streams?  Does this have to do with using
read-byte on character streams or read-char on binary streams?

>     - it would be easier to switch encoding after connecting
>     - read/write-sequence is probably faster on byte streams
>    disadvantages:
>     - more consing, and Emacs's GC isn't that good
>     - need a string-to/from-bytearray function for every backend

Doesn't every backend already have such a function?  Of course, someone
has to hook that up, but at least it doesn't have to be written from
scratch.

>     - breaks third party backends

Sounds like a show stopper to me.
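
For reference, option 3 could be sketched roughly as below: the length
header counts bytes of the encoded payload, so both ends agree no matter
how they represent characters internally.  This is Python for illustration
only; write_message and read_message are hypothetical names, and the
6-hex-digit header and the UTF-8 encoding are assumptions, not a
description of what the backends currently do.

```python
import io


def write_message(stream, text):
    """Write text to a binary stream, prefixed by its byte length."""
    payload = text.encode("utf-8")
    header = format(len(payload), "06x").encode("ascii")  # assumed header format
    stream.write(header)
    stream.write(payload)


def read_message(stream):
    """Read one length-prefixed message from a binary stream."""
    header = stream.read(6)
    length = int(header.decode("ascii"), 16)  # length is in bytes, not characters
    payload = stream.read(length)
    return payload.decode("utf-8")


buf = io.BytesIO()
write_message(buf, "ab\U00010400")  # 1 + 1 + 4 bytes in UTF-8
buf.seek(0)
print(read_message(buf) == "ab\U00010400")  # True
```

The decode step at the end is the string-from-bytearray function that each
backend would have to supply.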

Ray