[slime-devel] CMUCL unicode strings breaks slime
Raymond Toy
toy.raymond at gmail.com
Sat Oct 2 16:30:08 UTC 2010
On 10/2/10 3:45 AM, Helmut Eller wrote:
> * Raymond Toy [2010-10-01 19:49] writes:
>
>> Oh, that's a problem. In the example, length is 3, but the string
>> actually has 4 code units, so read-sequence only reads 3 code units,
>> completely missing the last code unit.
>
> I think we have the following options:
>
> 1) Don't support code points beyond 16 bits. Clean and easy.
Yes. I only ever use code points outside the BMP when testing Unicode.
But it is annoying that SLIME breaks.
>
> 2) Introduce variants of length and read-sequence that use the same
> notion of character as Emacs. Kinda messy and probably slow, but
> relatively easy.
I don't know slime internals, but wouldn't you only need a special
version of length and read-sequence for cmucl with unicode? The normal
length/read-sequence would be fine for everyone else.
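
To make the mismatch concrete, here is a rough sketch in Python (not SLIME or CMUCL code). It assumes the Lisp side stores strings as UTF-16 code units while Emacs counts whole code points. The helper names to_utf16_units, count_code_points and units_for_code_points are made up for illustration, and the input is assumed to be well-formed UTF-16.

```python
import struct


def to_utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def count_code_points(units):
    """Length in the Emacs sense: count every unit except trailing surrogates."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


def units_for_code_points(units, n):
    """Return how many leading code units cover the first n code points.

    A read-sequence variant would have to read this many units
    when the header announces n characters.
    """
    i = 0
    for _ in range(n):
        if i >= len(units):
            break
        is_lead = 0xD800 <= units[i] <= 0xDBFF
        has_trail = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        i += 2 if (is_lead and has_trail) else 1
    return i


# Two BMP characters followed by one character outside the BMP.
sample = "ab\U00010000"
units = to_utf16_units(sample)
print(count_code_points(units), len(units))
```

With this sample the announced length is 3 but the string occupies 4 code units, which is the case where a plain read-sequence stops one unit short.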
>
> 3) Switch from character streams to binary streams so that we can use
> byte counts instead of character counts. This has several
> advantages:
> - surrogate pairs are no problem
> - don't need flexi-streams for Lispworks
Why does Lispworks need flexi-streams? Does this have to do with using
read-byte on character streams or read-char on binary streams?
> - it would be easier to switch encoding after connecting
> - read/write-sequence is probably faster on byte streams
> disadvantages:
> - more consing, and Emacs's GC isn't that good
> - need a string-to/from-bytearray function for every backend
Doesn't every backend already have such a function? Of course, someone
has to hook that up, but at least it doesn't have to be written from
scratch.
> - breaks third party backends
Sounds like a show stopper to me.
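
For reference, the byte-count idea in option 3 can be sketched as follows. This is Python, not Swank code, and it rests on assumptions: a 6-hex-digit length header, UTF-8 as the encoding, and the header counting payload bytes rather than characters. encode_message and decode_message are hypothetical names.

```python
import io


def encode_message(payload):
    """Frame payload as a 6-hex-digit byte count followed by UTF-8 bytes."""
    body = payload.encode("utf-8")
    header = ("%06x" % len(body)).encode("ascii")
    return header + body


def decode_message(stream):
    """Read one framed message from a binary stream and return it as text."""
    header = stream.read(6)
    length = int(header.decode("ascii"), 16)
    body = stream.read(length)
    if len(body) != length:
        raise EOFError("stream ended inside a message")
    return body.decode("utf-8")


# The receiver never needs to know how either side counts characters;
# it reads exactly as many bytes as the header announces.
message = '(:emacs-rex "ab\U00010000")'
wire = encode_message(message) + encode_message("(:ok)")
stream = io.BytesIO(wire)
first = decode_message(stream)
second = decode_message(stream)
print(first == message, second)
```

Because the count is in bytes, a surrogate pair on one side and a single code point on the other frame identically. The cost is the extra string-to-bytes conversion on each message, which is the consing Helmut mentions.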
Ray