[asdf-devel] source file encoding

Faré fahree at gmail.com
Sat Jan 29 21:02:55 UTC 2011


On 29 January 2011 15:42, Anton Vodonosov <avodonosov at yandex.ru> wrote:
> You are right! Now I remember: when I worked on that project several years ago,
> I just opened asdf.lisp, found the compile-file call, introduced a
> *compile-file-external-format* variable there, and then passed the encoding via that variable.
>
Since it's for both compile-op and load-source-op, a better name than
compile-file-* is required. Maybe *cl-source-file-external-format* ?
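
A minimal sketch of that idea, assuming the variable name proposed above and two hypothetical wrapper functions (none of these exist in ASDF). The standard COMPILE-FILE and LOAD both accept an :external-format argument whose default is :default, so initializing the variable to :default keeps today's behavior. Which other values are accepted is implementation-dependent: keywords such as :utf-8 work on e.g. SBCL and CCL, but not everywhere.

```lisp
;;; Sketch only: neither the variable nor the two functions are part of ASDF.
(defvar *cl-source-file-external-format* :default
  "External format for reading Lisp source files, shared by the compile
and load-source paths.  :DEFAULT means the implementation's usual behavior.")

(defun compile-source-file (input-file &rest keys)
  "Hypothetical stand-in for the COMPILE-FILE call made by compile-op.
An :EXTERNAL-FORMAT passed explicitly in KEYS takes precedence, because
the leftmost occurrence of a duplicated keyword argument wins."
  (apply #'compile-file input-file
         (append keys
                 (list :external-format *cl-source-file-external-format*))))

(defun load-source-file (input-file)
  "Hypothetical stand-in for the LOAD call made by load-source-op."
  (load input-file :external-format *cl-source-file-external-format*))
```

Binding the variable around a single operation, e.g. (let ((*cl-source-file-external-format* :utf-8)) ...), would then affect both compiling and loading source without touching the .asd file's own encoding.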

> I am not going to write the patch now, because the project I am working on will only
> run on my development machine and my server, so I can use an easy workaround;
> e.g. most Lisps accept the default encoding as a command-line argument.
>
Makes sense.

> Also, a few notes that may be useful later, when someone eventually implements the
> patch.
>
> In 99.9% of cases it is enough to specify the encoding for the whole system,
> not for individual files. Only in some extraordinary case would a system author
> choose to store source files in different encodings.
>
Yes, but as you note below, the decision is rightly made per system
rather than globally. Whoever writes the files is who knows their
encoding, not anyone else. ASDF2 follows the principle: he who knows
is he who specifies.

>> Also it might or might not be a good idea to store the external-format
>> in a slot of cl-source-file, and to have a proper :initform in it with
>> a valid default value to be used when upgrading ASDF.
>
> How are the slots populated from the defsystem expression?
>
> E.g. if I have
>
>   (:file "package" :enc :utf-8)
>
> will the :enc :utf-8 be passed as initargs to (make-instance 'cl-source-file)?
>
> Or for
>
> (defsystem :mysystem
>  :version "0.1.0"
>  :serial t
>  :enc :utf-8
>   ....
Yes, except that :external-format or :encoding is to be used instead
of :enc. New abbreviations are evil.

> Are these attributes passed to the component instantiation as initargs?
>
Yes they are.
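
To illustrate how a slot with an :initform could be filled from a component form, here is a self-contained sketch using a stand-in class rather than ASDF's real CL-SOURCE-FILE; the class, reader, and function names are hypothetical. The options after the file name are handed to MAKE-INSTANCE as initargs, and the :initform supplies a valid default when the option is absent. It is also what CLOS uses to initialize a newly added slot in existing instances when the class is redefined, which is the upgrade case mentioned above.

```lisp
;;; Stand-in for ASDF's CL-SOURCE-FILE, not its real definition.  The slot
;;; and initarg names are hypothetical.
(defclass sketch-source-file ()
  ((name :initarg :name :reader sketch-file-name)
   (external-format :initarg :external-format
                    ;; Valid default for forms that don't mention it, and for
                    ;; old instances when the class is redefined on upgrade.
                    :initform :default
                    :reader sketch-file-external-format)))

(defun parse-file-form (form)
  "Turn a component form such as (:file \"package\" :external-format :utf-8)
into an instance, passing the trailing options through as initargs."
  (destructuring-bind (type name &rest options) form
    (declare (ignore type))
    (apply #'make-instance 'sketch-source-file :name name options)))
```

With this stand-in, an unknown option such as :enc signals an error, since MAKE-INSTANCE rejects initargs that the class does not declare; that is one more reason to settle on a single, unabbreviated option name.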

>> The problem for you will be to reasonably support the 11 or so
>> existing implementations.
>
> Actually, not a big problem. We will just create a mapping from the encoding
> specifications allowed in .asd files to the encoding designators of the underlying compiler.
>
Not big, but painful to get right. At the very least, unsupported
implementations should keep the previous behavior rather than be
broken.
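
One way to honor that requirement, as a sketch with a hypothetical helper name: map only what is known (here just SBCL and Clozure CL, and just two encodings) and return :default for everything else. Since :default is what COMPILE-FILE and LOAD use for :external-format when none is supplied, unsupported implementations and unknown encodings behave exactly as before.

```lisp
;;; Hypothetical helper, not part of ASDF.  Only SBCL and Clozure CL are
;;; mapped here; everything else gets :DEFAULT, i.e. the previous behavior.
(defun implementation-external-format (encoding)
  "Return an external-format designator for the portable ENCODING name,
or :DEFAULT if this implementation or this encoding is not supported."
  (declare (ignorable encoding))
  #+(or sbcl ccl)
  (case encoding
    ((:utf-8 :utf8) :utf-8)
    ((:latin-1 :iso-8859-1) :latin-1)
    (t :default))
  #-(or sbcl ccl)
  :default)
```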

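> (defun enc (enc)
>   ;; Map a portable encoding name to this implementation's designator.
>   ;; Clauses for further encodings and implementations are elided.
>   (case enc
>     ((:utf8 :utf-8) #+:clisp 'charset:utf-8 #+:sbcl :utf8 #+ccl :utf-8)
>     ((:cp1251 :cp-1251) #+:clisp 'charset:cp1251 #+:sbcl :cp1251 #+ccl :cp-1251)))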
Ouch. I'd rather we leave only the bare minimum in asdf itself, i.e.
the default, utf-8 or whatever. Any such function, etc., should be
provided by some asdf extension that itself uses the default. All .asd
files should be using the default. Only lisp files can be customized.

> Would you accept a patch supporting only the 7-10 most important encodings (all the Unicode
> encodings plus several of the most frequent single-byte encodings)?
>
Yes.

> If we start improving things, IMHO enforcing UTF-8 is a good start and should be enough
> (option 4 in Fare's list).
>
Yes, I think UTF-8 is the way to go, these days.

> If more is needed, a complete solution allowing a per-.asd encoding specification is better.
> We need to choose a good notation, one that allows a reasonably simple implementation.
>
I suppose you mean that the .asd file specifies a per-system default
encoding for its Lisp files. The .asd file itself will be loaded before
any encoding can be specified, so it will always be loaded with the default,
which will presumably be UTF-8. People who don't like UTF-8 encoding
for extra characters should stick to US-ASCII. They can have whatever
other character sets in their Lisp files, just not in the .asd file.
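
The resulting lookup order (a file's own setting, then its system's default, then the global default) can be sketched with stand-in classes; the class, variable, and function names are hypothetical, not ASDF's component API, and UTF-8 as the global default is the presumption stated above rather than settled behavior.

```lisp
;;; Stand-in component model, not ASDF's: each component may carry an
;;; encoding, and NIL means "inherit from the parent".
(defclass sketch-component ()
  ((encoding :initarg :encoding :initform nil :reader sketch-encoding)
   (parent :initarg :parent :initform nil :reader sketch-parent)))

(defparameter *default-source-encoding* :utf-8
  "Encoding assumed when neither a file nor its system says otherwise.
The .asd file itself is always read with this, since it is read before
any system-level option can be seen.")

(defun effective-encoding (component)
  "Walk up from COMPONENT towards its system, returning the first explicit
encoding found, else *DEFAULT-SOURCE-ENCODING*."
  (loop for c = component then (sketch-parent c)
        while c
        when (sketch-encoding c) return it
        finally (return *default-source-encoding*)))
```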

> It might be either Emacs comment in the first line
>  ;;; -*- coding: utf-8; -*-
>
> Or special lisp form:
>   (asdf:asd-file-encoding :utf-8)
>
> But interpreting that form will require switching the encoding of the Lisp reader's stream,
> which I believe will be problematic on some Lisps.
>
> Therefore it will require feeding
> the reader from our own custom input stream implementation, like flexi-streams. And
> it still won't be good enough, because only ASDF will create that special
> stream for .asd files; when you evaluate the form from the REPL/SLIME, the meaning
> of that expression is unclear.
>
I will NEVER commit that to asdf. That's lots of crazy non-portable
infrastructure for precious little gain.

> Another alternative is a naming convention for .asd files: mysystem.utf-8.asd.
> It's simple to implement, and after some thought, it seems better than the
> two suggestions above.
>
Still bad, requires asdf to know about all the potential encoding names. Crazy.

> But again, we should decide if the problem really exists and avoid solving problems
> that we don't have. I personally never use national characters in .asd files.
>
Let's make that compulsory. If you want it otherwise, fork ASDF. The only
thing we need to do about it is document it.

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
Many people believe "in the name of the (nation, poor, god, nature...)"
like "simon says" justifies or damns political opinions when said or omitted.
