[Ecls-list] Re: base-string patch

Fri May 19 03:47:12 UTC 2006

> I got your patches. They were too big for the mailing list, but they
> arrived in my inbox. They seem pretty innocent, and perhaps it is a good
> idea to have a base_string Lisp type. I will be testing them over the
> weekend.

Well, the key point is that what you are calling 'string' is really
'base-string'. So there's no harm in calling it that internally -- with
Unicode turned off, we can have 'string' map to 'base-string' without cost.

> However there are some design ideas I would like to discuss. I will
> write you an email later, or maybe during the weekend -- after I read
> some Unicode manuals --. My feeling is that one should still encapsulate
> a minimal facility on the ecl_* macros or functions, which would provide
> an abstraction layer on top of your script dispatch mechanism and should
> work similarly for Unicode-aware and bare-bones ECL.

I don't think that you can usefully abstract on top of this mechanism,
but this is what I'd suggest.

a) The core contains the current ecl_* operators, which do not care
about scripts and simply handle characters outside the range of
base-char on some arbitrary basis. (I.e., a non-switching base-char
script.) -- This imposes zero cost.

b) The script module, when loaded, replaces the standard operators with
an identical set of operators which perform operator replacement by
script selection when they receive an extended-char. -- This imposes a
small code cost and a small run-time cost on people who use base-chars
only, but they shouldn't be loading the script module anyhow.

c) Individual modules load additional and replacement operators and
tell the script registry about them. -- This imposes a cost on the
user depending on the modules they load. You might have a module for
Chinese simplification, which provides a simple-char operator, and a
different module for Japanese on/kun reading maps. The user would then
decide which modules they need to load to do their work. An example of
a more general, non-CL operator might be char-titlecase, and you could
supply a titlecase module which handles all of Unicode fairly easily,
since there aren't many such conversions.

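E.g.

(define-script-method char-upcase base-char (char)
  (let ((code (char-code char)))
    (if (<= 97 code 122)
        ;; ASCII lowercase a-z: shift to the uppercase range.
        (code-char (- code 32))
        ;; Otherwise, hand the character to the script registry.
        (script:dispatch-operator 'char-upcase char))))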

dispatch-operator would then do something like
 (setf (symbol-function 'script:char-upcase) found-operator)
 (funcall found-operator char)
for this case.
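
To make the replacement step concrete at the C level, here is a minimal
sketch of a self-replacing operator built on a function pointer. It is not
ECL code: every name in it (char_code, char_op, char_upcase_op, registry,
latin1_upcase, dispatch_operator) is hypothetical, and the one-entry
Latin-1 registry only stands in for whatever a script module would
register. The function pointer plays the role of
(symbol-function 'script:char-upcase).

```c
#include <stddef.h>

typedef unsigned int char_code;           /* hypothetical character code type */
typedef char_code (*char_op)(char_code);  /* signature of a per-character operator */

static char_code base_char_upcase(char_code c);

/* Currently installed operator; starts out as the base-char version. */
static char_op char_upcase_op = base_char_upcase;

/* Hypothetical operator that a Latin-1 script module might register.
   It must also cover ASCII, because it replaces the base-char operator. */
static char_code latin1_upcase(char_code c)
{
    if (c >= 97 && c <= 122) return c - 32;                  /* a-z */
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7) return c - 32;  /* accented letters, not the division sign */
    return c;
}

/* Hypothetical script registry: a code range and the operator that owns it. */
struct script_entry { char_code lo, hi; char_op op; };
static const struct script_entry registry[] = {
    { 0x80, 0xFF, latin1_upcase },
};

/* Find the operator for c, install it in place of the current one, call it. */
static char_code dispatch_operator(char_code c)
{
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (c >= registry[i].lo && c <= registry[i].hi) {
            char_upcase_op = registry[i].op;  /* the replacement step */
            return char_upcase_op(c);
        }
    }
    return c;  /* no script claims this character: leave it unchanged */
}

/* Base-char operator: handles a-z itself and dispatches everything else. */
static char_code base_char_upcase(char_code c)
{
    if (c >= 97 && c <= 122) return c - 32;
    return dispatch_operator(c);
}
```

Callers always go through char_upcase_op(c). Until an extended character
that some script claims shows up, that is the base-char routine; after
the first such character, later calls go straight to the installed
script operator.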

> With this macrology, for instance, you would not need to care about
> writing p->base_string.self[i++] = c or p->string.self[i++] = c, but
> rather use ecl_setf_schar(p, c). Besides, things like ecl_toupper(c) can
> be a function that provides a simple output for (c < 127), as expected
> for ASCII, and otherwise dispatches to a script function.

I'd prefer to avoid an extra dispatch on each character access.

With the above operator replacement scheme, we can provide defaults
which dispatch like that, while allowing additional specialization --
i.e., a default string-upcase which calls char-upcase on each character
vs. a specialized string-upcase which dispatches once on the string
type and then does whatever it needs to, directly.
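
A rough C sketch of that difference follows. The string layout and all
names (struct lisp_string, str_ref, str_set, ascii_upcase,
string_upcase_default, string_upcase_specialized) are made up for
illustration and are not ECL's actual representation. The default
version checks the string type on every character access; the
specialized one checks it once and then runs a tight loop over the
base-string storage.

```c
#include <stddef.h>

enum str_kind { BASE_STRING, EXTENDED_STRING };  /* hypothetical type tags */

struct lisp_string {                 /* hypothetical string object */
    enum str_kind kind;
    size_t length;
    union {
        unsigned char *base;         /* one byte per character */
        unsigned int  *wide;         /* one code point per character */
    } self;
};

/* Generic accessors: each call re-checks the string type. */
static unsigned int str_ref(const struct lisp_string *s, size_t i)
{
    return s->kind == BASE_STRING ? s->self.base[i] : s->self.wide[i];
}

static void str_set(struct lisp_string *s, size_t i, unsigned int c)
{
    if (s->kind == BASE_STRING)
        s->self.base[i] = (unsigned char)c;
    else
        s->self.wide[i] = c;
}

/* Stand-in for char-upcase: ASCII only, everything else unchanged. */
static unsigned int ascii_upcase(unsigned int c)
{
    return (c >= 97 && c <= 122) ? c - 32 : c;
}

/* Default: a type check and an operator call for every character. */
static void string_upcase_default(struct lisp_string *s)
{
    for (size_t i = 0; i < s->length; i++)
        str_set(s, i, ascii_upcase(str_ref(s, i)));
}

/* Specialized: dispatch once on the string type, then work directly. */
static void string_upcase_specialized(struct lisp_string *s)
{
    if (s->kind == BASE_STRING) {
        unsigned char *p = s->self.base;
        for (size_t i = 0; i < s->length; i++)
            if (p[i] >= 97 && p[i] <= 122)
                p[i] = (unsigned char)(p[i] - 32);
    } else {
        string_upcase_default(s);    /* extended strings use the generic path */
    }
}
```

Both functions produce the same result on any string; the specialized
one just avoids the per-character type check for base-strings.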

Regards,
Brian.