[asdf-devel] Make the CL syntax predictable

Mon Mar 17 23:06:37 UTC 2014

>> Agreed, and until the dust settles, "strict mode" cannot be the default.
>> I'd argue it should become the default eventually (i.e. next year).
>>
>
> I doubt that's even feasible, since the strict mode will have to cross
> systems.  It will be easier to have systems say "I know I behave
> according to the strict policy, so I will LOCALLY ask to be built and
> loaded strictly."
>
The problem is that strictness is a GLOBAL thing. Unless every single
library is strict, any readtable pollution puts the whole build at
risk, and we're back to the tight readtable restrictions
(i.e. no changing a standard character, no two libraries changing the
same character). If I want my system to go beyond these restrictions,
I have to ensure that all other libraries are protected by default.

It's like changing the default encoding from :default to :utf-8: if I
change the implementation's current *external-format* to :latin1, and
all those utf-8 libraries are not marked :encoding :utf-8 and the
default is still :default, then there's going to be mojibake. Worse:
if I need to change the current encoding to EBCDIC because that's what
my program does, then all those libraries won't even compile
correctly. The only sane thing is for files to be compiled in the
encoding they are intended to be compiled with, rather than "whatever
the user happens to be using right now".

So yes, we could *demand* that every single system should specify its
readtable, but that'd be *more* disruptive than providing a sane
default, and *more* error-prone if it goes unenforced.

> I think one reason you think that this change is feasible and I don't is
> that you have a stronger sense that there's a kind of "one
> library/system == one ASDF file."
>
No. I have a sense that it's trivial for those who depend on the
current behavior to fix their systems, just like it was trivial for
people using non-utf-8 encodings to specify :encoding :latin1 or
:encoding :shift-jis, etc. I understand that it's breaking
compatibility, which is a big migration matter even for a small
change; but that's for the sake of making the build much more robust
and of enabling things that were not possible before.

> In our bigger multi-component systems that are based on ASDF, I find
> that we typically end up with multiple ASDF systems, because we have
> subsystems that use different libraries, and want to specify their own
> dependencies.
>
> AFAIK you can't do this without introducing new top level systems, since
> modules cannot depend on libraries.
>
A few simple solutions:

1- tightly knit systems can go in the same .asd file using the
primary/secondary system naming syntax, and they'll all inherit the
same *readtable* from the .asd file.

2- otherwise, use named-readtables or cl-syntax and have each system
change the syntax in its first file. That's the sane thing to do, and
it's a two-line change per user system, and a cleanup in whichever
system defines the new readtable.
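For option 1, a .asd file might look like the fragment below; the system names, files and dependencies are all hypothetical. Since ASDF looks up a secondary system named foo/extras in foo.asd, the subsystem can declare its own library dependencies (which a :module cannot) while still being defined in the same .asd file.

```lisp
;;; foo.asd -- hypothetical names throughout.
;;; The primary system:
(defsystem "foo"
  :depends-on ("alexandria")
  :components ((:file "package")
               (:file "foo" :depends-on ("package"))))

;;; A secondary system in the same file.  ASDF finds "foo/extras" by
;;; looking in foo.asd, and unlike a :module it can declare its own
;;; library dependencies.
(defsystem "foo/extras"
  :depends-on ("foo" "cl-ppcre")
  :components ((:file "extras")))
```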

I guess we could even automatically detect the offenders:

1- check that the *readtable* is either the initial readtable or the
standard one at the end of the build. If not, then the system has
leaked readtable state.

2- (harder) check that the initial readtable hasn't changed. That's
easier if we always use the standard read-only readtable initially,
but that is already a form of strictness that might break a lot of
clients — we shouldn't enable that by default until after we get a
clean cl-test-grid and give users a lot of notice.
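A rough portable-CL sketch of both checks, assuming hypothetical helper names (READTABLE-SNAPSHOT, CALL-WITH-READTABLE-LEAK-CHECK) that are not part of ASDF. The standard readtable object itself is not portably accessible (COPY-READTABLE NIL only returns a copy), so check 1 here only accepts the initial readtable; check 2 can only compare what GET-MACRO-CHARACTER and GET-DISPATCH-MACRO-CHARACTER expose.

```lisp
;;; Hypothetical helpers sketching checks 1 and 2; none of these are ASDF's.

(defun readtable-snapshot (readtable)
  "Describe READTABLE's case and the reader macros of ASCII characters,
including the sub-characters of #.  Other syntax-type changes, e.g. via
SET-SYNTAX-FROM-CHAR on a non-macro character, are not captured."
  (let ((entries (list (readtable-case readtable))))
    (dotimes (code 128 (nreverse entries))
      (let ((char (code-char code)))
        (when char
          (multiple-value-bind (reader-fn non-terminating-p)
              (get-macro-character char readtable)
            (push (list char reader-fn non-terminating-p) entries))
          ;; IGNORE-ERRORS covers a readtable where # is no longer dispatching.
          (push (ignore-errors
                 (get-dispatch-macro-character #\# char readtable))
                entries))))))

(defun call-with-readtable-leak-check (thunk)
  "Call THUNK (standing in for a whole build) and warn if it leaked
readtable state.  Check 1: *READTABLE* must still be the initial object.
Check 2: the initial object's snapshot must be unchanged."
  (let* ((initial *readtable*)
         (before (readtable-snapshot initial)))
    (multiple-value-prog1 (funcall thunk)
      (unless (eq *readtable* initial)
        (warn "The build left *READTABLE* bound to another readtable."))
      (unless (equal before (readtable-snapshot initial))
        (warn "The build modified the initial readtable.")))))
```

Wrapped around a build, this warns both for a system that calls SET-MACRO-CHARACTER on the current readtable and for one that leaves *READTABLE* bound to a different object.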

> That's why, for example, we have systems that bleed readtable across
> their boundaries: the systems really aren't stand alone entities, but
> limitations on ASDF expressiveness, and the desire to get more
> modularity than the (necessarily in-line) :MODULE will permit, causes a
> proliferation of systems that are NOT libraries, and have meaning only
> in context.
>
> I'm not going to move ASDF towards breaking such systems.
>
These systems can be trivially fixed with a two-line change, and this
will enable a world where every system, library or not, can freely
pick their favorite readtable and not be afraid of breaking other
things.

>> Unhappily, strict mode is a global flag: the question is "which
>> readtable is this system going to be read with?". The only reasonable
>> answer is: the readtable it was meant to be read with, which the
>> author knows, and should be the standard readtable by default, unless
>> explicitly overridden by the author. The backward-compatible answer
>> (if it's not backward, it's not compatible) is "whichever readtable was active
>> at the time", with sometimes comical consequences, especially when the
>> user was using a non-standard one at the REPL.
>
> I don't see that.
>
> If you know you aren't going to want to bleed readtable entries out of
> your library, and you don't want stuff creeping in, it seems to me
> eminently possible to mark your system as strict-mode wrt the readtable.
>
> Why is that impossible?
>
Nobody EVER wants "stuff creeping in", just like a program written in
utf-8 NEVER, EVER wants to be compiled in EBCDIC mode, or even latin1
for that matter.

Whatever readtable you designed your code to be read with, by
definition, you don't want your code to be read with another one. At
no point does anyone ever want the current modified REPL readtable to
leak into the code he is compiling from the REPL. If I use EBCDIC, I
still want my libraries to compile using whatever encoding they were
designed to be read with. And I do want to be able to use EBCDIC or a
new readtable.

There are 700 libraries in Quicklisp, and not a single one of them
"wants stuff creeping in"; yet if safety is an "opt in" feature, then
every single one you use needs to opt into safety so you may safely
change the readtable at the REPL and call anything that might load a
system.

The entire readtable feature is crippled if the coding conventions
preclude any significant modification and make any concurrent use of
the same character an error.

I think the point you don't see is that the readtable *of the REPL* is
going to affect every single system being compiled, and there is no
way to opt out.
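A minimal illustration in portable CL, using READ-FROM-STRING as a stand-in for the compiler's reader; the helper names are made up. A user who sets READTABLE-CASE to :PRESERVE at the REPL silently changes what (defun foo ()) means to any file read under that readtable, whereas a readtable chosen independently of the REPL gives the result the author intended.

```lisp
;;; Hypothetical helpers: READ-FROM-STRING stands in for COMPILE-FILE's reader.

(defun read-with-current-readtable (text)
  "Read one form from TEXT using whatever *READTABLE* the caller has bound."
  (read-from-string text))

(defun read-with-standard-readtable (text)
  "Read one form from TEXT using a fresh copy of the standard readtable,
regardless of the caller's *READTABLE*."
  (let ((*readtable* (copy-readtable nil)))
    (read-from-string text)))

;; A user who prefers case-preserving symbols at the REPL:
(let ((*readtable* (copy-readtable nil)))
  (setf (readtable-case *readtable*) :preserve)
  (list
   ;; The first element is a symbol named "defun", not CL:DEFUN.
   (read-with-current-readtable "(defun foo ())")
   ;; The first element is CL:DEFUN, as the library author intended.
   (read-with-standard-readtable "(defun foo ())")))
```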

OK, I have two (separable or combinable) proposals that might provide
enough hygiene to allow for radical readtable modification while still
allowing for the traditional unhygienic style of development.

My main constraint is as follows:

0- what readtable a system is compiled with must not depend on
anything but its declared dependencies, and in particular MUST NOT
EVER be affected by whichever readtable the user is currently using at
the REPL.

A compromise situation enforcing this constraint might be as follows:

1- ASDF maintains an asdf:*global-readtable*, which is the *readtable*
object at the time it was loaded.

2- This *global-readtable* is subject to the current restrictions:
  A- no modifying any standard character,
  B- no two dependencies assigning different meanings to the same
non-standard character,
  C- libraries need to document any change to the readtable,
  D- free software libraries will register these changes on the
relevant page on CLiki.

3- Unhappily, there is no cheap way to enforce these restrictions, but
that's no regression with respect to the current situation.

4- ASDF wraps any compile-op and load-source-op in this
asdf:*global-readtable*, but probably not load-op, to preserve
combine-fasl linking semantics.

5- Systems that want to do crazier things with the readtable that may
violate (2) must arrange to use their own private readtable, but can
otherwise do it safely. It is an error (unhappily not enforceable) to
modify the current readtable in these ways.

6A- ASDF binds *readtable* to the *global-readtable* at the start of
every system's compilation (and loading?), and around the entire
asdf:operate, leaving the *readtable* unchanged at the end.
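A minimal sketch of items 1 and 6A in portable CL, assuming hypothetical names (*GLOBAL-READTABLE* as a plain variable, CALL-WITH-GLOBAL-READTABLE, BANG-READER) rather than any existing ASDF API. It also shows the distinction drawn right after this item: a system that mutates the shared readtable object affects the next system, while one that binds *readtable* to a new object loses its changes when the binding ends.

```lisp
;;; Hypothetical sketch of items 1 and 6A; these are not ASDF's names.

(defvar *global-readtable* *readtable*
  "Item 1: the readtable object that was current when this code was loaded.")

(defun call-with-global-readtable (thunk)
  "Item 6A: call THUNK (standing in for one system's compilation) with
*READTABLE* bound to *GLOBAL-READTABLE*.  The binding is undone on exit,
so the caller's *READTABLE* is left unchanged."
  (let ((*readtable* *global-readtable*))
    (funcall thunk)))

(defun bang-reader (stream char)
  "A toy reader macro: ! reads as the keyword :BANG."
  (declare (ignore stream char))
  :bang)

;; Usage on a scratch copy, so the real global readtable stays untouched:
(let ((*global-readtable* (copy-readtable nil)))
  ;; "System A", mutating style: changes the shared readtable object.
  (call-with-global-readtable
   (lambda () (set-macro-character #\! #'bang-reader)))
  ;; "System B" then reads ! as :BANG.
  (call-with-global-readtable (lambda () (read-from-string "!"))))
```

If system A instead does (setf *readtable* (copy-readtable)) before calling SET-MACRO-CHARACTER, system B reads ! as an ordinary symbol again: that is the "bind *readtable* to a new value" style this item alone cannot support.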

This easily supports systems that "modify the current readtable data structure".

However, that doesn't support systems that "bind *readtable* to a new
value", because the changes they make will shadow the changes that
other systems following this style make and depend on. To allow such
an idiom, we must also do the following:

6B- ASDF binds *readtable* to a proper "entry readtable" at the start
of every system's compilation, and records an "exit readtable" at the
end of the system's loading.

7- maintain a partial order on these readtable objects, assuming that
each system's exit readtable supersedes its entry readtable. The least
readtable is the *global-readtable*. It's enough to store, for each new
exit readtable (identified by the name of the system that created it),
the set of its inferior readtables, as a list, an eq hash-table, or an
equal hash-table keyed by the names of the systems that created them.

8- before a system is compiled or loaded, compute the maximum
readtable of all the exit readtables of its dependencies. If this
maximum is unique, then it will be the entry readtable of the system.
If there is no unique readtable that is greater than all the other
ones, that's an error, and we refuse to load the system.

9- after a system is loaded, check its exit readtable: if it already
exists, check that this doesn't create a cycle, or else issue an
error; if it doesn't already exist, add it to the set of all known
exit readtables.

10- ASDF either
 A- binds *readtable* to the *global-readtable* around the entire
asdf:operate, leaving the *readtable* unchanged at the end, or
 B- always side-effects *readtable* to correspond to the exit
readtable of the loaded system, or
 C- has operate do the binding-around thing, while load-system does
the side-effect after it's done operate'ing.
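One possible sketch of items 7 to 9 in portable CL, under a few assumptions not in the proposal itself: readtable objects are keys of an EQ hash table mapping each known exit readtable to the readtables it directly supersedes, "inferior" is computed by reachability rather than stored transitively, and every name is hypothetical. ENTRY-READTABLE covers item 8 and REGISTER-EXIT-READTABLE item 9; item 10 is left out.

```lisp
;;; Hypothetical sketch of items 7-9; none of these names are ASDF's.

(defvar *global-readtable* *readtable*
  "The least element of the order (items 1 and 7).")

(defvar *inferiors* (make-hash-table :test 'eq)
  "Maps each known exit readtable to the readtables it directly supersedes.")

(defun readtable<= (a b)
  "True if A is B, A is the global readtable, or A is reachable from B
through recorded inferiors."
  (or (eq a b)
      (eq a *global-readtable*)
      (some (lambda (inferior) (readtable<= a inferior))
            (gethash b *inferiors*))))

(defun entry-readtable (exit-readtables)
  "Item 8: the maximum of the dependencies' EXIT-READTABLES, or
*GLOBAL-READTABLE* if there are none; signal an error if no single
readtable is greater than or equal to all of them."
  (if (null exit-readtables)
      *global-readtable*
      (or (find-if (lambda (candidate)
                     (every (lambda (other) (readtable<= other candidate))
                            exit-readtables))
                   exit-readtables)
          (error "No unique maximal readtable among the dependencies."))))

(defun register-exit-readtable (entry exit)
  "Item 9: record that EXIT supersedes ENTRY, refusing to create a cycle.
Returns EXIT."
  (cond ((eq entry exit))   ; the system left the readtable alone
        ((readtable<= exit entry)
         (error "This exit readtable would create a cycle."))
        (t (pushnew entry (gethash exit *inferiors*))))
  exit)
```

For example, if system B's exit readtable supersedes system A's, a system depending on both gets B's exit readtable as its entry readtable, while a system depending on B and on an unrelated C is refused.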

Does that strike you as complex? Because it is. That's the price of
*safely* supporting this "systems can bind a new value to *readtable*"
style. Unhappily some of the constraints are not enforceable (2A and
2B), but that's the very same as now.

So my next question is: do you want to safely support these
conventions? Do your systems modify the current *readtable* structure,
or do they bind *readtable* to a new value?

PS: thank you for making me come up with better solutions. I care
about enough hygiene to use readtables safely, and I also care about
supporting legacy systems if possible.

—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
An insult may sometimes adequately fit the person who is insulted.
However, it can only ever possibly tarnish but the person who insults.