[ansi-test-devel] Unicode, CHAR-UPCASE/CHAR-DOWNCASE and char-upcase.1/char-upcase.2
Raymond Toy
toy.raymond at gmail.com
Wed Apr 14 16:32:35 UTC 2010
On 4/4/10 8:04 AM, Erik Huelsmann wrote:
> Hi Sam,
>
> On Sun, Apr 4, 2010 at 10:58 AM, Sam Steingold <sds at gnu.org> wrote:
>
>> On 4/3/10, Erik Huelsmann <ehuels at gmail.com> wrote:
>>
>>> However, in section 13.1.10, there seems to be an escape hatch:
>>> "Documentation of implementation-defined scripts". A script is a
>>> subtype of CHARACTER, nothing more nothing less. An
>>> implementation-defined script gets to document the effect on
>>> CHAR-UPCASE and CHAR-DOWNCASE.
>>>
>> I don't think this gives you a license to discard the round-tripping invariant.
>>
> I read the same section again and on second reading I think the
> section indeed does not allow that freedom.
>
FWIW, CMUCL fails these tests because its char-upcase returns whatever
Unicode says the uppercase character is.
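
As a rough illustration of why a pure Unicode mapping cannot always
round-trip, here is a small sketch in Python rather than Lisp (Python's
str.upper and str.lower follow the Unicode case mappings). This is not
CMUCL code; the helper name round_trips is made up for the example, and
the test character is the dotless i mentioned in the quoted text.

```python
def round_trips(ch: str) -> bool:
    """True if upcasing then downcasing gives back the same character."""
    return ch.upper().lower() == ch

DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I

# Unicode maps dotless i up to plain "I", but "I" maps down to dotted "i",
# so the trip up and back lands on a different character.
print(round_trips("a"))        # plain ASCII letter
print(round_trips(DOTLESS_I))  # the problem case
```

The ASCII letter survives the trip; the dotless i does not, which is the
kind of character that trips up a test expecting upcase and downcase to
be inverses.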
>
>>> there's no need to have the round-tripping requirement apply to most
>>> of Unicode, since it can't be expected there; see
>>> latin-small-letter-dotless-i for an example.
>>>
>> why not make it its own upper case?
>> this is not exactly correct from the Unicode pov, but, I think, it is
>> better than the alternative.
>> this round-tripping requirement is, I think, pretty important in symbol i/o.
>>
> I hadn't thought about the reader and printer behaviours regarding
> *readtable-case* and *print-case*. However, it would be logical by
> analogy that if a string doesn't get recoded in a round-trip, then the
> symbol name won't either.
>
This brings up another issue. CMUCL fails some symbol tests because
it converts the string to Unicode NFC form before creating the symbol.
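
To make the NFC effect concrete, here is a sketch in Python, not CMUCL
code. intern_name is a hypothetical stand-in for a step that normalizes a
name to NFC before the symbol is created; unicodedata.normalize is the
Python standard-library call.

```python
import unicodedata

def intern_name(name: str) -> str:
    """Hypothetical stand-in for interning: normalize the name to NFC first."""
    return unicodedata.normalize("NFC", name)

decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
stored = intern_name(decomposed)

# The stored name is the single precomposed character U+00E9, so it no
# longer compares equal to the string that was passed in.
print(len(decomposed), len(stored), stored == decomposed)
```

A test that interns a decomposed string and then compares the symbol's
name against the original string would see a mismatch of this sort.
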
Ray