[slime-devel] CMUCL unicode strings breaks slime

Sat Oct 2 07:45:30 UTC 2010

* Raymond Toy [2010-10-01 19:49] writes:

> Oh, that's a problem.  In the example, length is 3, but the string
> actually has 4 code units, so read-sequence only reads 3 code units,
> completely missing the last code unit.
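To make the mismatch concrete, here is a small Python sketch. It is not CMUCL code, and the sample string is made up. It assumes, as in the quoted example, that the Lisp side stores strings as UTF-16 code units. The string has 3 characters, but its middle one, U+1D11E, lies outside the BMP. Emacs counts it as 3 characters, while UTF-16 needs 4 code units. Reading only 3 code units therefore drops the trailing "b":

```python
# Hypothetical example string: "a", U+1D11E MUSICAL SYMBOL G CLEF, "b".
s = "a\U0001D11Eb"

def utf16_units(text: str) -> list[int]:
    """Split text into 16-bit UTF-16 code units (surrogate pairs for non-BMP)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

units = utf16_units(s)
print(len(s))      # → 3 (code points, Emacs's notion of length)
print(len(units))  # → 4 (UTF-16 code units actually stored)

# Simulate read-sequence filling a buffer sized by the character count.
read, left_over = units[:len(s)], units[len(s):]
decoded = b"".join(u.to_bytes(2, "little") for u in read).decode("utf-16-le")
print(repr(decoded))                 # the trailing "b" is missing
print([hex(u) for u in left_over])   # → ['0x62'], still unread on the stream
```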

I think we have the following options:

1) Don't support code points beyond 16 bits.  Clean and easy.

2) Introduce variants of length and read-sequence that count characters
   the way Emacs does, i.e. a surrogate pair counts as one character.
   Kinda messy and probably slow, but relatively easy.

3) Switch from character streams to binary streams so that we can use
   byte counts instead of character counts.  This has several
   advantages:
    - surrogate pairs are no problem
    - don't need flexi-streams for LispWorks
    - it would be easier to switch encoding after connecting
    - read/write-sequence is probably faster on byte streams
   disadvantages:
    - more consing, and Emacs's GC isn't that good
    - need a string-to/from-bytearray function for every backend
    - breaks third-party backends
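Options 2 and 3 can be contrasted in a rough Python sketch. This is not SLIME or CMUCL code, and every name in it is hypothetical. `emacs_length` counts a surrogate pair as one character, which is roughly what an option 2 variant of length would have to do. `encode_message` and `read_message` illustrate option 3: they prefix each message with its length in bytes, using a 6-digit hex header chosen only for illustration. With byte counts, the reader never needs to know where a character ends until it decodes the complete payload, so surrogate pairs stop mattering.

```python
import io
from typing import BinaryIO

# Option 2 (sketch): count characters the way Emacs does, treating a
# UTF-16 surrogate pair as a single character.
def emacs_length(units: list[int]) -> int:
    count, i = 0, 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            i += 2  # surrogate pair: two code units, one character
        else:
            i += 1
        count += 1
    return count

# Option 3 (sketch): frame each message with its length in bytes.
# HEADER_WIDTH and the hex header format are hypothetical.
HEADER_WIDTH = 6

def encode_message(text: str, encoding: str = "utf-8") -> bytes:
    payload = text.encode(encoding)  # the extra copy is the "more consing" cost
    if len(payload) >= 16 ** HEADER_WIDTH:
        raise ValueError("message too long for the length header")
    return f"{len(payload):0{HEADER_WIDTH}x}".encode("ascii") + payload

def read_message(stream: BinaryIO, encoding: str = "utf-8") -> str:
    header = stream.read(HEADER_WIDTH)
    if len(header) != HEADER_WIDTH:
        raise EOFError("truncated length header")
    length = int(header.decode("ascii"), 16)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated payload")
    return payload.decode(encoding)  # decode only once all bytes are in

# Usage with the 3-character, 4-code-unit string from the quoted example.
s = "a\U0001D11Eb"
data = s.encode("utf-16-le")
units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
print(emacs_length(units))         # → 3
print(encode_message(s)[:6])       # → b'000006' (6 UTF-8 bytes)
stream = io.BytesIO(encode_message(s) + encode_message("(:ok nil)"))
print(read_message(stream) == s)   # → True
print(read_message(stream))        # → (:ok nil)
```

Note that the sketch needs an encode/decode step on each side of the wire. That step corresponds to the per-backend string-to/from-bytearray function listed among the disadvantages.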

Helmut