[slime-devel] CMUCL unicode strings breaks slime
Raymond Toy
toy.raymond at gmail.com
Fri Oct 1 10:20:37 UTC 2010
On 10/1/10 1:46 AM, Helmut Eller wrote:
> * Raymond Toy [2010-10-01 00:11] writes:
>
>> This has been happening for some time, and it's annoying enough that I
>> want to fix it. With CMUCL 20b and slime 2010-09-20, try the following:
>>
>> (defvar *s* (make-string 4))
>> *s*
>> (setf (lisp:codepoint *s* 0) #x10000)
>>
>> Up to now, everything is OK. Now print the string:
>>
>> *s*
>>
>> At this point, the string is displayed, with a rectangular box for the
>> codepoint #x10000 followed by two ^@ for the two null characters.
>> (Recall that unicode strings in cmucl are utf-16 strings, so the first
>> two elements of *s* are the surrogate pair for #x10000.)
>
> What is the length of *s* or (prin1-to-string *s*) now?
> Should it be 3 not 4?

Good question. The answer right now is 4, not 3. The string holds 4
UTF-16 code units, and that is what LENGTH returns. LENGTH would be
really slow if it had to scan the whole string for surrogate pairs and
count each pair as one character instead of two.
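
To make the code-unit versus code-point distinction concrete, here is a
minimal sketch in portable Common Lisp. The names SURROGATE-PAIR and
COUNT-CODEPOINTS are hypothetical helpers, not CMUCL or SLIME functions,
and the code works on a vector of integer code units rather than on a
CMUCL string, so it assumes nothing about how an implementation
represents characters.

```lisp
;; Hypothetical helpers for illustration; not part of CMUCL or SLIME.

(defun surrogate-pair (codepoint)
  "Return the high and low UTF-16 surrogates for CODEPOINT as two values.
CODEPOINT is assumed to lie in the range #x10000..#x10FFFF."
  (let ((offset (- codepoint #x10000)))
    (values (+ #xD800 (ash offset -10))         ; top 10 bits of the offset
            (+ #xDC00 (logand offset #x3FF))))) ; low 10 bits of the offset

(defun count-codepoints (units)
  "Count code points in UNITS, a vector of UTF-16 code units (integers).
A high surrogate followed by a low surrogate counts as one code point;
an unpaired surrogate is counted on its own."
  (let ((count 0)
        (i 0)
        (n (length units)))
    (loop while (< i n)
          do (let ((u (aref units i)))
               (if (and (<= #xD800 u #xDBFF)
                        (< (1+ i) n)
                        (<= #xDC00 (aref units (1+ i)) #xDFFF))
                   (incf i 2)   ; well-formed pair: skip both units
                   (incf i 1))
               (incf count)))
    count))

;; The string from the example: #x10000 followed by two nulls.
(surrogate-pair #x10000)                         ; => #xD800, #xDC00
(length (vector #xD800 #xDC00 0 0))              ; => 4 (code units)
(count-codepoints (vector #xD800 #xDC00 0 0))    ; => 3 (code points)
```

The scan in COUNT-CODEPOINTS is linear in the length of the string,
which is the cost a code-point-aware LENGTH would have to pay on every
call.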

Is that the cause of the problem? Do Emacs and Lisp disagree on the
length of the string? As displayed by Emacs, the string does appear to
have only 3 characters.

Doesn't ACL have this problem too? Like CMUCL, it uses 16-bit strings.

Ray