[asdf-devel] source file encoding

Faré fahree at gmail.com
Mon Apr 9 22:05:55 UTC 2012


On Mon, Apr 9, 2012 at 11:37, Douglas Crosher <dtc-asdf at scieneer.com> wrote:
> Won't library authors need to wait until their user base has upgraded ASDF
> before they can start migrating to UTF-8?
>
No. Library authors have *already* largely adopted UTF-8.
See previous analysis by Orivej Desh:
	"I did a check of quicklisp systems.
        There are 263 lisp files in 107 systems which assume non-ASCII,
        and only 31 of them in 20 systems assume non-UTF-8"
That's out of 700 libraries in Quicklisp.
Only 9 have been found to be an actual problem, and two are fixed already.
	https://github.com/orivej/asdf-encodings/wiki/Tracking-non-UTF-8-lisp-files-in-Quicklisp
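For readers curious how such a survey can classify files, here is a minimal sketch in Common Lisp. It is not Orivej's actual script, and the function names are hypothetical. It reads a file as raw octets and reports :ascii, :utf-8 or :non-utf-8. The UTF-8 check is simplified: it verifies lead and continuation bytes and rejects the overlong leads #xC0 and #xC1, but not every invalid sequence. Also, a Latin-1 file can occasionally happen to be valid UTF-8.

```lisp
(defun utf-8-continuation-byte-p (byte)
  "True if BYTE has the bit pattern 10xxxxxx."
  (= (logand byte #xC0) #x80))

(defun classify-octets (octets)
  "Return :ASCII, :UTF-8 or :NON-UTF-8 for a vector of octets.
Simplified check: lead/continuation structure only, not surrogates
or every overlong 3- and 4-byte form."
  (let ((i 0)
        (n (length octets))
        (ascii t))
    (loop while (< i n) do
      (let* ((byte (aref octets i))
             ;; number of continuation bytes this lead byte announces
             (extra (cond ((< byte #x80) 0)
                          ((<= #xC2 byte #xDF) 1)
                          ((<= #xE0 byte #xEF) 2)
                          ((<= #xF0 byte #xF4) 3)
                          (t (return-from classify-octets :non-utf-8)))))
        (when (plusp extra)
          (setf ascii nil)
          ;; each announced continuation byte must exist and look like 10xxxxxx
          (loop for j from (1+ i) to (+ i extra)
                unless (and (< j n)
                            (utf-8-continuation-byte-p (aref octets j)))
                  do (return-from classify-octets :non-utf-8)))
        (incf i (1+ extra))))
    (if ascii :ascii :utf-8)))

(defun classify-file (pathname)
  "Read PATHNAME as raw octets and classify its encoding."
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let ((octets (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence octets in)
      (classify-octets octets))))
```

For example, a file whose only non-ASCII character is a Latin-1 é followed by ASCII text classifies as :non-utf-8, since #xE9 announces two continuation bytes that never come.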

The only issue is to make the results *reliable*
for these systems that depend on UTF-8.

> I do see a concern that if developers are required to change their definitions
> to add :encoding :default then they will be forced to also make sure their user
> base has upgraded now.  Further if their users do upgrade ASDF then it breaks
> again - there is no migration path for them.
>
Yes. No one in their right mind would use :encoding :default for a library.
Each author knows what encoding he uses, say :latin1, :koi8-r, :mac-roman
or :euc-jp, and would specify just that, not :default.
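For instance, an author whose sources are in Latin-1 might declare that once in the system definition. This is a hypothetical system, and it assumes an ASDF that understands :encoding, with the asdf-encodings library loaded to resolve names other than :utf-8. The per-file override is likewise shown as an assumption about how :encoding applies to individual components.

```lisp
;;; Hypothetical system.  Assumes an encoding-aware ASDF, plus asdf-encodings
;;; to resolve non-UTF-8 names such as :latin1 and :koi8-r.
(asdf:defsystem :my-legacy-lib
  :defsystem-depends-on (:asdf-encodings)
  :encoding :latin1                       ; default for every file below
  :components ((:file "package")
               (:file "main" :depends-on ("package"))
               ;; assumed: a single file may override the system-wide default
               (:file "russian-messages" :depends-on ("package")
                :encoding :koi8-r)))
```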

I was thinking of :default
1- because I hadn't written asdf-encodings yet, and
 needed *some* way to support those setups
2- for full backwards compatibility:
 "if it's not backwards, it's not compatible"

> Perhaps the difference is that portable UTF-8 source is new source and requires
> an upgrade of ASDF anyway, whereas making the default :utf-8 forces :encoding :default
> on current users and affects legacy code that is already written without a migration
> path.
>
UTF-8 is not just for new source. It doesn't require an upgrade of ASDF.
There is plenty of UTF-8 source already, though mostly for comments
(but not only for comments: see e.g. λ-reader).
All modern implementations support UTF-8, though not always as the default.
Let's just make it a reliable default so we can WORM (write once run
everywhere).
And the migration path is clear:
	recode l1..u8 foo.lisp
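Authors who prefer to stay inside Lisp could do the same conversion with a small function. This is a sketch, not part of ASDF: transcode-file is a hypothetical name, and external-format keywords are implementation-specific (:latin-1 and :utf-8 are names SBCL accepts). It writes to a separate file rather than converting in place.

```lisp
(defun transcode-file (from to &key (from-format :latin-1) (to-format :utf-8))
  "Copy the text of FROM into TO, decoding with FROM-FORMAT and
encoding with TO-FORMAT.  TO is overwritten if it exists."
  (with-open-file (in from :external-format from-format)
    (with-open-file (out to :direction :output
                            :if-exists :supersede
                            :if-does-not-exist :create
                            :external-format to-format)
      (let ((buffer (make-string 4096)))
        ;; READ-SEQUENCE returns the index just past the last char it filled,
        ;; so copying stops once it returns 0 at end of file.
        (loop for end = (read-sequence buffer in)
              while (plusp end)
              do (write-sequence buffer out :end end)))))
  to)
```

Usage: (transcode-file #p"foo.lisp" #p"foo-utf8.lisp"), then replace the old file once the result has been checked.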

>> * thus, library developers can do nothing but wait for EVERYONE
>>  to be using a recent ASDF before they can do anything.
>
> Wouldn't this be the reality for portable libraries no matter which default is chosen?
>
Whatever the default encoding is,
libraries can't use :encoding until all their users use a recent ASDF.
But if :utf-8 becomes the default and they use it,
they can already enjoy the benefits of deterministic encoding,
and tell users who have encoding issues "just upgrade your ASDF".

>> * Therefore, no one will enjoy any benefit of :encoding for a year,
>>  and when we do, it will cause massive backward incompatibility.
>
> I don't appreciate the 'massive backward incompatibility' so perhaps do not understand
> your perspective?  I see that future projects using UTF-8 source would need to declare
> this in the system definition, but this would not seem to qualify.
>
If the default is :default and you want to enjoy reliable utf-8,
then you'll need to specify :encoding :utf-8, at which point
your library ceases to be compatible with users who haven't upgraded ASDF.
I call that massive backward incompatibility.

If the default is :utf-8 and your library has a latin1 character,
you use recode, and your new code still works on old ASDFs as well as new ones.
That's massive backward compatibility.

> Choosing :default would seem to cause the least backward incompatibility as this
> is the current behaviour, and offers a migration path to get ASDF upgrades in place.
>
It's compatible for now, but it sets us up for massive incompatibility later.


>> Admittedly, in either case, library developers
>> could use such conditional reading as in
>>   #+asdf-unicode :encoding #+asdf-unicode :utf-8
>> or
>>   #+asdf-unicode :encoding #+asdf-unicode :latin1
>> to make their libraries safer in a backwards-compatible way.
>
> It would be great if some suggestions like this could be offered to ease the transition.
>
I inserted this suggestion in the ASDF documentation.
I can't retroactively modify old ASDF installations
to point people at precisely the paragraph they need to consult in the docs
when they upgrade and things break for them,
but I trust that Google will help them.
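Spelled out in a full system definition, the suggestion looks like this. The system and file names are hypothetical, and it assumes, as the suggestion itself does, that an encoding-aware ASDF pushes :asdf-unicode onto *features*. Each #+ guards only the single form that follows it, so both the key and its value need a guard; on an older ASDF both vanish at read time and the definition is still a well-formed property list.

```lisp
(asdf:defsystem :my-utf8-lib
  ;; Read as ":encoding :utf-8" when :asdf-unicode is a feature, else as nothing.
  #+asdf-unicode :encoding #+asdf-unicode :utf-8
  :components ((:file "package")
               (:file "main" :depends-on ("package"))))
```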

> Most portable libraries are ASCII, and there would be some benefit in libraries
> needing UTF-8 support to declare this in the system definition.
>
ASCII libraries will work everywhere anyway whatever we do about the default.
That is, until some maniac writes a Lisp using EBCDIC;
and even then, making UTF-8 the default will ensure he can still
just download source from the net and use it
without having to transcode it for his implementation.
Of course, a lot of code that assumes ASCII or ASCII-like continuity
of letter ranges will fail, but that's a given if he uses EBCDIC.
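To make "continuity of letter ranges" concrete, the hypothetical function below is typical of code that quietly assumes ASCII. It works in any Lisp whose character codes follow ASCII or Unicode, but in EBCDIC the lowercase letters sit in three separate runs (a-i, j-r, s-z), so the same range test would also accept the non-letter codes between those runs. The standard predicate lower-case-p makes no such assumption.

```lisp
(defun naive-lowercase-letter-p (char)
  "Assumes a..z have contiguous codes: true in ASCII and Unicode, not EBCDIC."
  (<= (char-code #\a) (char-code char) (char-code #\z)))

;;; Portable alternative: ask the implementation instead of doing arithmetic.
;;; Note that LOWER-CASE-P also accepts non-ASCII lowercase letters where the
;;; implementation supports them.
(defun portable-lowercase-letter-p (char)
  (lower-case-p char))
```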

> There may be a concern that their users would have to upgrade ASDF now.
>
No. Making :utf-8 the default means no one needs to upgrade ASDF now,
but a few people may have to upgrade a few libraries when they upgrade ASDF.

Making :default the default and forcing people to use :encoding :utf-8
to enjoy any reliability means people who use libraries that want to
be reliable will be forced to upgrade ASDF.

> How can everyone enjoy reliable non-ASCII today,
> without the user base having upgraded ASDF?
>
Mostly, they can set up their system defaults to UTF-8
and enjoy most Lisp code already on most implementations.
When they stray from this default setup, which I want to formalize,
nothing works reliably today.
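On SBCL, for example, that system-wide default can be set from the init file. This is a sketch assuming SBCL: sb-impl::*default-external-format* is an internal variable whose name and behavior may differ between SBCL versions, and other implementations have their own mechanisms or derive the default from the locale.

```lisp
;;; In ~/.sbclrc.  SBCL-specific; the #+sbcl guard makes other Lisps skip it.
#+sbcl
(setf sb-impl::*default-external-format* :utf-8)
```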

—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
Merely having an open mind is nothing; the object of opening the mind,
as of opening the mouth, is to shut it again on something solid.
	— G.K. Chesterton



