[ansi-test-devel] Unicode, CHAR-UPCASE/CHAR-DOWNCASE and char-upcase.1/char-upcase.2

Wed Apr 14 16:32:35 UTC 2010

On 4/4/10 8:04 AM, Erik Huelsmann wrote:
> Hi Sam,
>
> On Sun, Apr 4, 2010 at 10:58 AM, Sam Steingold <sds at gnu.org> wrote:
>   
>> On 4/3/10, Erik Huelsmann <ehuels at gmail.com> wrote:
>>     
>>>  However, in section 13.1.10, there seems to be an escape hatch:
>>>  "Documentation of implementation-defined scripts". A script is a
>>>  subtype of CHARACTER, nothing more nothing less. An
>>>  implementation-defined script gets to document the effect on
>>>  CHAR-UPCASE and CHAR-DOWNCASE.
>>>       
>> I don't think this gives you a license to discard the round-tripping invariant.
>>     
> I read the same section again and on second reading I think the
> section indeed does not allow that freedom.
>   

FWIW, CMUCL fails these tests because its char-upcase returns whatever
Unicode says the uppercase character is.
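
To illustrate the kind of failure involved, here is a minimal sketch in
Python rather than Lisp, on the assumption that Python's str.upper and
str.lower follow the Unicode case mappings in much the same way. The
helper name round_trips is made up for this example; it is not part of
any test suite.

```python
def round_trips(ch: str) -> bool:
    """Return True if upcasing and then downcasing ch gives ch back."""
    up = ch.upper()
    # Some characters upcase to more than one character; treat that as a failure.
    return len(up) == 1 and up.lower() == ch


print(round_trips("a"))       # plain ASCII letter
print(round_trips("\u0131"))  # LATIN SMALL LETTER DOTLESS I
```

The dotless i upcases to "I", which downcases to the ordinary "i", so
the second call reports a failed round trip.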
>   
>>>  there's no need to have the round-tripping requirement apply to most
>>>  of unicode - as can't be expected, see latin-small-letter-dotless-i
>>>  for an example.
>>>       
>> why not make it its own upper case?
>> this is not exactly correct from the unicode pov, but, I think, it is
>> better than the alternative.
>> this round-tripping requirement is, i think, pretty important in symbol i/o.
>>     
> I hadn't thought about the reader and printer behaviours regarding
> *readtable-case* and *print-case*. However, it would be logical by
> analogy that if a string doesn't get recoded in a round-trip, then the
> symbol name won't either.
>   
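
The "make it its own upper case" suggestion quoted above could look
roughly like the following. This is only a sketch in Python, assuming
its str.upper and str.lower approximate the Unicode mappings; the name
conservative_upcase is hypothetical and does not describe what any Lisp
implementation actually does.

```python
def conservative_upcase(ch: str) -> str:
    """Upcase ch only when the mapping can be undone by downcasing."""
    up = ch.upper()
    if len(up) == 1 and up.lower() == ch:
        return up
    # No reversible mapping: the character acts as its own upper case.
    return ch


print(conservative_upcase("a"))                  # maps to "A"
print(conservative_upcase("\u0131") == "\u0131")  # dotless i is left alone
```

With this rule, any character that is changed by upcasing comes back
unchanged after downcasing, at the cost of disagreeing with Unicode for
characters such as the dotless i.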

This brings up another issue.  CMUCL fails some symbol tests because
it converts the string to Unicode NFC form before creating the symbol.
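
A small sketch of why normalization can matter here, using Python's
unicodedata module instead of Lisp. Which symbol tests fail is not
stated above; the assumption in this example is only that a name
containing a decomposed sequence comes back different after NFC.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT: two code points, not in NFC form.
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(composed == decomposed)
```

The normalized string is no longer equal to the original, so a name
compared code point by code point would not match the string it was
created from.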

Ray