[armedbear-devel] Unicode, CHAR-UPCASE/CHAR-DOWNCASE and char-upcase.1/char-upcase.2

Sat Apr 3 23:00:16 UTC 2010

Ever since ABCL raised its CHAR-CODE-LIMIT from 256 to #x10000, two
tests have been failing: char-upcase.1 and char-upcase.2.

These two tests iterate through all integers between 0 and
CHAR-CODE-LIMIT. While doing so, they test for the property that
upcasing a character and then downcasing the result (or vice versa)
returns the original character again ("round-tripping"). This
property of characters is specified in section 13.1.4.3, "Characters
with case"
(http://www.lispworks.com/documentation/lw51/CLHS/Body/13_adc.htm).
In short: characters with case are defined in pairs; additional
characters with case have to be defined in pairs too.
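
To make the property concrete, here is a rough sketch of such a
round-trip scan. It is written in Python and uses Python's built-in
Unicode case mapping as a stand-in for CHAR-UPCASE and CHAR-DOWNCASE,
so it is not the actual test code and says nothing about what ABCL
itself returns. The names simple_case and failing_round_trips are made
up for this illustration, and only the upcase-then-downcase direction
is checked.

```python
def simple_case(ch, convert):
    """Mimic a character-to-character case function.

    Python's str.upper/str.lower may return several characters
    (for example for U+00DF); a function that must return a single
    character cannot do that, so such characters are left unchanged
    here. This is an assumption of the sketch.
    """
    mapped = convert(ch)
    return mapped if len(mapped) == 1 else ch


def failing_round_trips(limit=0x10000):
    """Return the codes below `limit` whose upcased form does not
    downcase back to the original character."""
    failures = []
    for code in range(limit):
        ch = chr(code)
        up = simple_case(ch, str.upper)
        # Only characters that actually change when upcased are checked.
        if up != ch and simple_case(up, str.lower) != ch:
            failures.append(code)
    return failures
```

Calling failing_round_trips(128) covers only the ASCII range and finds
no failures; with the default limit of #x10000 the list is no longer
empty.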

The spec provides char-upcase and char-downcase to convert
characters-with-case to their 'other-case equivalent'.

However, in section 13.1.10, "Documentation of implementation-defined
scripts", there seems to be an escape hatch. A script is a subtype of
CHARACTER, nothing more, nothing less. An implementation that defines
a script gets to document that script's effect on CHAR-UPCASE and
CHAR-DOWNCASE.

Now, if I were to define our Unicode script to be every character
except those in the base set, char-upcase and char-downcase could
have different semantics for everything except the standard
characters. That way, there's no need for the round-tripping
requirement to apply to most of Unicode, where it can't be expected
to hold anyway; see LATIN SMALL LETTER DOTLESS I for an example.

In light of the above, is it really portable for the tests to assume
that all characters must round-trip? I don't think it is.

What are your opinions?

Bye,

Erik.