[slime-devel] CMUCL unicode strings breaks slime
Helmut Eller
heller at common-lisp.net
Fri Oct 1 10:35:25 UTC 2010
* Raymond Toy [2010-10-01 10:20] writes:
>> What is the length of *s* or (prin1-to-string *s*) now?
>> Should it be 3 not 4?
>
> Good question. The answer now is 4, not 3. There are 4 code units in
> the string, so that is the length. Length would be really slow if it
> had to scan the whole string looking for surrogate pairs and counting
> them as one instead of two.
>
> Is that the reason for the problem? Confusion between emacs and lisp on
> the length of the string? It does appear that the string only has 3
> characters, as displayed by emacs.
Very likely. Emacs uses something like utf-8 internally and counts code
points, not code units (except for line endings, which is probably a
different issue).
> Doesn't acl have this problem too? It also uses 16-bit strings like
> cmucl.
Allegro has no lisp:codepoint function and (code-char #x10000)
returns nil. The situation is similar in ABCL, except that it returns #\null.
In Java, strings have a length method which returns the number of code
units and a codePointCount method which returns the number of code points.
Maybe CMUCL has something like that and we should use it in SWANK.
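
For illustration, here is a minimal Java sketch of the two counts. The
class and helper names are made up for this example, and the test string
is an assumption (it is only meant to have 4 code units and 3 code
points, like *s* above). Only String.length, String.codePointCount and
Character.toChars are real Java API.

```java
public class CodePointLength {
    // Hypothetical helper: length in UTF-16 code units.
    // This is what String.length() reports; no scanning is needed.
    static int codeUnitLength(String s) {
        return s.length();
    }

    // Hypothetical helper: length in code points.
    // codePointCount has to walk the string and count each
    // surrogate pair as a single code point.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Assumed example: "a", then U+10000 (a surrogate pair), then "b".
        String s = "a" + new String(Character.toChars(0x10000)) + "b";
        System.out.println(codeUnitLength(s));  // prints 4
        System.out.println(codePointLength(s)); // prints 3
    }
}
```

If CMUCL offers an equivalent of the second count, that is presumably the
number SWANK would need to agree with Emacs.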
Helmut