[asdf-devel] source file encoding

Mon Apr 9 00:37:10 UTC 2012

On Sun, Apr 8, 2012 at 15:28, Nikodemus Siivola
<nikodemus at random-state.net> wrote:
> On 8 April 2012 17:36, Faré <fahree at gmail.com> wrote:
>
>> I think requiring a few marginal hackers doing weird things
>> to specify :encoding :default is a small price to pay for everyone to be able to specify
>
> I disagree. Consider this:
>
> X has a system that used to be in, say, LATIN-9. He uses latin-9 at
> home, and everything works fine. His users either use it as well, or
> at least another single-byte encoding.
>
> ASDF is updated, and X's user reports breakage. Everything works fine
> for X, because he didn't update ASDF yet. So he updates ASDF, and X
> updates his system to specify :LATIN-9 (or :DEFAULT, or whatever).
>
> Now another of his users reports breakage, because /they/ didn't
> update ASDF yet -- and their ASDF doesn't support :ENCODING, so things
> break. They update ASDF, which in turn breaks another :LATIN-N system
> they were using.
>
> The potential cost is non-trivial, and I really don't pretend to know
> e.g. how many Japanese hackers use non-UTF encodings in their source.
>
> IMO encouraging people to add :encoding :utf-8 is much saner.
>
I agree that transition costs must be considered.

Let's examine the two scenarios,
where the default is :default vs where the default is :utf-8.

In both cases, crucial points follow:
(a) currently :encoding is NOT supported by ASDF.
(b) therefore, whenever anyone modifies his defsystem to use :encoding,
 his system will NOT be backward compatible anymore.
(c) we want to make most code as backward-compatible as possible.
(d) the application programmer controls what version of ASDF is installed,
 the library developer doesn't.

If the default is :utf-8 (my recommendation), then
* A few programmers of non-UTF-8 applications may hit a snag upgrading ASDF;
* then they can either use asdf-encodings or use :encoding :default.
* Their code is then not compatible with older ASDFs anymore, but
* as application programmers, they fully control which ASDF they use, and
* even if they need to support old CL implementations,
 ASDF still supports them (the exception being GCL, which looks quite dead).
* Meanwhile, library authors can already start migrating to UTF-8,
 and everyone who upgrades ASDF can reliably enjoy the benefits
 of non-ASCII now, while preserving backwards-compatibility.
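
As a minimal sketch of that documented step for a non-UTF-8 application,
assuming :encoding lands as proposed in this thread (the system and file
names below are made up for illustration):

  ;; Hypothetical application system whose sources are not UTF-8.
  (asdf:defsystem #:my-legacy-app
    ;; Opt out of the proposed :utf-8 default and keep reading the
    ;; sources with the implementation's default external format.
    :encoding :default
    :components ((:file "package")
                 (:file "main")))

An ASDF without :encoding support will not accept this form, which is
point (b) above; that is tolerable here only because the application
programmer chooses which ASDF gets loaded.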

If the default is :default (your recommendation), then
* library developers can't ensure their code uses a predictable encoding;
* this makes any attempt to actually use non-ASCII characters unreliable.
* Sure, they might be tempted to use :encoding :utf-8, but then
 their libraries will be gratuitously incompatible
 with anyone who hasn't upgraded his ASDF, which is a pain to users.
* thus, library developers can do nothing but wait for EVERYONE
 to be using a recent ASDF before they can do anything.
* Therefore, no one will enjoy any benefit of :encoding for a year,
 and when we do, it will cause massive backward incompatibility.

Admittedly, in either case, library developers
could use conditional reading such as
  #+asdf-unicode #+asdf-unicode :encoding :utf-8
or
  #+asdf-unicode :encoding #+asdf-unicode :latin1
to make their libraries safer in a backwards-compatible way.
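
For illustration, here is a sketch of how such a conditional might sit in
a complete defsystem. It assumes that an ASDF with encoding support puts
:asdf-unicode on *features*, as the snippets above suggest; the system
and file names are hypothetical.

  ;; Hypothetical library system with UTF-8 sources.
  (asdf:defsystem #:my-library
    ;; Each #+asdf-unicode guards exactly one form: the keyword, then
    ;; its value. Without the feature, the reader skips both forms.
    #+asdf-unicode :encoding #+asdf-unicode :utf-8
    :components ((:file "package")
                 (:file "strings")))

An older ASDF then never sees the :encoding option at all, while a newer
one reads the files as UTF-8.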

In both cases, library developers are encouraged to use UTF-8,
which most people already do, if only because that tends to be the default
these days for SBCL users, who send bug reports to library authors.

A default of :default allows a few odd non-UTF8 application developers
to continue using unportable hacks for a few months, while
forcing everyone else to wait to do anything and bringing massive
backwards-incompatibility of libraries in the end.

A default of :utf-8 forces these few odd non-UTF8 application developers
to do a documented step before they continue doing what they are doing,
at their own upgrade pace (they control when to upgrade ASDF);
they can then replace a lot of non-portable hacks with a portable :encoding.
Meanwhile, everyone starts enjoying reliable non-ASCII today.

I agree there's no solution that makes everyone happy.
I believe that a default of :utf-8 doesn't actually make anyone
terribly unhappy, and empowers everyone to make the changes they need,
without requiring anyone to wait for other people to make changes
(except indeed for a few stray libraries).
And so my plans are unchanged for now
(but of course, please keep sending the complaints
if you think it's wrongheaded; there's still time to not do it).

NB: I'd especially like the opinion of people who actually
develop non-ASCII and non-UTF8 libraries or applications.

PS: I just made asdf-encodings much less dumb.
I added good support for: sbcl, clozure, clisp, ecl, cmu
I added dubious support for: abcl, allegro, scl, lispworks
I think these will remain 8-bit only: cormanlisp, gcl, genera, rmcl, xcl.
Precious little testing so far,
except that it doesn't break everything on SBCL.
Help welcome to test and expand it.
	ssh://common-lisp.net/project/asdf/git/asdf-encodings.git
	git://common-lisp.net/projects/asdf/asdf-encodings.git
	http://common-lisp.net/gitweb?p=projects/asdf/asdf-encodings.git

—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
Every major horror of history was committed in the name
of an altruistic motive. — Ayn Rand