[asdf-devel] Releasing asdf 3.1.1 ?

Fri Jan 3 15:22:35 UTC 2014

Faré wrote:
> On Thu, Jan 2, 2014 at 11:44 PM, Robert P. Goldman <rpgoldman at sift.info> wrote:
>> While there are bug fixes waiting to reach our users, I'm quite
>> concerned by the loss of backwards compatibility in systems that defined
>> their own OPERATION subclasses.
>>
> This backward incompatibility already happened a year ago:
> it was consubstantial with the refactoring that begat ASDF3,
> and necessary to fix the broken dependency model of ASDF2.
> NO, most operations MUST NOT propagate downward, or even sideways;
> indeed some like prepare-op must propagate upward instead.
> The downward and sideways propagations were baked into TRAVERSE;
> that was one of the deep conceptual bugs in the ASDF model.
> Now they are configurable via the (ill-named) COMPONENT-DEPENDS-ON;
> and the previous behavior is mere inheritance specification away,
> with the DOWNWARD-OPERATION and SIDEWAYS-OPERATION mixins.
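
For readers who have not yet internalized the new protocol, the following toy model in plain CLOS may help. It is a sketch only. The package TOY-ASDF, the COMPONENT slots, and MAKE-MODULE, ACTION and FIND-SIBLING are hypothetical, and none of it reproduces ASDF 3's actual classes or the actual return format of COMPONENT-DEPENDS-ON. It illustrates the point above: a bare OPERATION subclass propagates nowhere, inheriting the DOWNWARD-OPERATION and SIDEWAYS-OPERATION mixins restores the old propagation, and UPWARD-OPERATION gives the prepare-op style of propagation.

```lisp
;;; Toy model of ASDF 3-style dependency propagation.
;;; Hypothetical: this is NOT ASDF's code; names only mirror ASDF's concepts.
(defpackage #:toy-asdf
  (:use #:cl))
(in-package #:toy-asdf)

(defclass component ()
  ((name :initarg :name :reader component-name)
   (parent :initform nil :accessor component-parent)
   (children :initform '() :accessor component-children)
   ;; Names of sibling components this one :depends-on.
   (depends-on :initarg :depends-on :initform '() :reader component-sibling-deps)))

(defun make-module (name &rest children)
  "Build a parent component named NAME and link CHILDREN to it."
  (let ((module (make-instance 'component :name name)))
    (dolist (child children)
      (setf (component-parent child) module))
    (setf (component-children module) (copy-list children))
    module))

(defun find-sibling (component name)
  "Return the sibling of COMPONENT whose name is NAME."
  (find name (component-children (component-parent component))
        :key #'component-name :test #'string=))

;; A bare OPERATION propagates nowhere: the ASDF 3 default.
(defclass operation () ())
;; Mixins that opt back in to one propagation direction each.
(defclass downward-operation (operation) ()) ; parent needs its children done first
(defclass sideways-operation (operation) ()) ; component needs its :depends-on siblings done first
(defclass upward-operation (operation) ())   ; child needs its parent done first

(defgeneric component-depends-on (operation component)
  (:documentation "List of (OPERATION-NAME . COMPONENT) actions to perform first."))

(defun action (o c)
  "Represent 'perform O on C' as a dotted pair of the operation's class name and C."
  (cons (class-name (class-of o)) c))

(defmethod component-depends-on ((o operation) (c component))
  '())

(defmethod component-depends-on ((o downward-operation) (c component))
  (append (mapcar (lambda (child) (action o child)) (component-children c))
          (call-next-method)))

(defmethod component-depends-on ((o sideways-operation) (c component))
  (append (mapcar (lambda (name) (action o (find-sibling c name)))
                  (component-sibling-deps c))
          (call-next-method)))

(defmethod component-depends-on ((o upward-operation) (c component))
  (append (when (component-parent c)
            (list (action o (component-parent c))))
          (call-next-method)))

;; ASDF 2-era definition: relied on propagation that used to be baked into TRAVERSE.
(defclass legacy-op (operation) ())
;; ASDF 3 fix: state the propagation you want through the mixins.
(defclass fixed-op (downward-operation sideways-operation) ())
;; Propagates upward, in the style of PREPARE-OP.
(defclass toy-prepare-op (upward-operation) ())
```

With a module M containing A and B, where B depends on A, a LEGACY-OP yields no prerequisite actions at all (the quiet "subset of behaviors" failure), whereas a FIXED-OP on M yields actions on A and B, and on B yields an action on A.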

I get this, and the recoding is a HUGE improvement.  Furthermore, you
have convinced me that we can't unwind this change in any way that won't
create yet more damage.

However:
a.  This change is not backward-compatible
b.  It has not been announced to maintainers of ASDF systems in any way
that actually reaches them, and
c.  The bugs it gives rise to are subtle and confusing.  There's
virtually no way a person whose system suddenly stops performing the
expected operations is going to say "I bet I need to change the
superclasses of my specialized OPERATION subclasses."

Unfortunately, because we reused the old name, OPERATION, it is *very*
difficult to trap this in a helpful way.  Ideally, I would suggest doing
something like raising some sort of signal when the user makes a new
OPERATION subclass to suggest that s/he might want DOWNWARD- or
UPWARD-OPERATION instead.  But I don't see that alley as open to us.

So what can we do for the poor programmer whose system suddenly starts
to perform only a subset of the behaviors it did in the past?  Expecting
that programmer to grovel over the ASDF source and figure out what went
wrong is too much.

I'm relatively familiar with ASDF's guts, and an update to Allegro
(getting ASDF 3 in place of 2) still cost my company about four
programmer days, and was only resolved because I thought to IM you about
my problems.
> 
> (The other deep conceptual bugs in the ASDF model were
> the lack of transitive timestamp checking, and
> the mess of DEPENDS-ON vs DO-FIRST dependencies.
> These were loosely related deep bugs.
> Then there were shallower bugs like the IF-COMPONENT-FAILS horror,
> the inconsistency between system :depends-on and other :depends-on,
> and probably more small bugs I can't remember.)
> 
>> As far as I can tell, *all* such systems will break, since the old
>> solution was to subclass OPERATION and the new solution is to subclass
>> DOWNWARD-OPERATION to achieve the same results.  *ALL* programmers'
>> locally-defined operation subclasses are now broken by this change.
>>
> We broke those that depended on downward and/or sideways propagation
> (don't forget sideways-operation). But we *FIXED* all those
> who didn't want it, and had to deal with it anyway!

I get it, but breaking bug-compatibility is still breaking
compatibility.  For people whose code created new OPERATION subclasses,
and whose code *worked*, we have broken their code, and done so in a way
that quietly does garbage-in-garbage-out, instead of indicating an error.

[...snip...]

>>
> Did you see the previous email where I audited the damage (and the fixes)?
> 	http://thread.gmane.org/gmane.lisp.asdf.devel/3581/focus=3582
> Only 4-5 victims (depending on how you count),
> I fixed one (dependency-op),
> two (clean-op and revert-op) were never really working,
> and one (parenscript-compile-op) indeed needs love,
> but the fix is trivial, and no one complained so far.

You only audited publicly available libraries.  Applications, like ours,
which perform tasks of no interest to the general world, but critical to
us, are invisible to you.

At any rate, an upgrade to the build system that requires the build
system's author to review all the available community libraries -- not
to mention libraries that are not shared with the community! -- is
simply not an approach that will scale.

We need a patch to ASDF that will indicate to programmers in a clear way
that the behavior of ASDF has changed, and give them a hint about how to
fix this change.

Ideally, we would be able to fix this socially, rather than with code:
we would simply shout off the rooftops that the OPERATION classes had
changed, indicate the needed revisions, and all would be well.

Unfortunately, we do not have the capability to shout off the rooftops
in this way:  ASDF quietly slips out into the community mediated by the
implementation suppliers.

I suppose one could introduce an ASDF banner that would print once, as a
means of shouting off the rooftops.

I'm open to alternative suggestions; this one does not seem excellent.

One possibility might be adopting Allegro's style of incompatible
update.  When they changed the behavior of the reader so that consecutive
reader macros could make multiple s-expressions skippable (as in
#-allegro #-allegro :foo :bar), they introduced a new variable,
something like

EXCL:*READER-MACRO-COMPATIBILITY*

If this had a value like :OLD, it would raise a continuable error when
it encountered one of these consecutive reader macros, explaining the change in
behavior.  Once you knew about the behavior, you would set (or bind)
this variable, and the warning would go away.

So people who made new OPERATION subclasses would get a warning until
they introduced a form into their system definition files indicating
that they knew about the change.

That would let us explain the situation to programmers, and shut up when
the programmers understand the situation.
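
A rough sketch of how such a switch could work for OPERATION, assuming the check is made when an operation is instantiated rather than when its class is defined (which avoids relying on the non-standard MOP). Every name here, including *WARN-ABOUT-UNPROPAGATING-OPERATIONS* and UNPROPAGATING-OPERATION-WARNING, is hypothetical, and the classes are minimal stand-ins rather than ASDF's own.

```lisp
;;; Sketch of the proposed "warn until acknowledged" switch (hypothetical names).
(defpackage #:toy-asdf-compat
  (:use #:cl))
(in-package #:toy-asdf-compat)

;; Minimal stand-ins for the ASDF 3 classes (not ASDF's code).
(defclass operation () ())
(defclass downward-operation (operation) ())
(defclass sideways-operation (operation) ())
(defclass upward-operation (operation) ())

(defvar *warn-about-unpropagating-operations* t
  "Hypothetical switch: set or bind to NIL once your OPERATION subclasses are reviewed.")

(define-condition unpropagating-operation-warning (simple-warning) ()
  (:documentation "Signaled for an OPERATION subclass that inherits no propagation mixin."))

(defvar *already-warned* (make-hash-table :test 'eq)
  "Classes already reported, so that each class is reported only once.")

(defmethod initialize-instance :before ((o operation) &key)
  (let ((class (class-of o)))
    (when (and *warn-about-unpropagating-operations*
               ;; The base class itself is fine; only user subclasses are checked.
               (not (eq class (find-class 'operation)))
               (not (typep o '(or downward-operation sideways-operation upward-operation)))
               (not (gethash class *already-warned*)))
      (setf (gethash class *already-warned*) t)
      (warn 'unpropagating-operation-warning
            :format-control "~S inherits from OPERATION but from none of ~
                             DOWNWARD-OPERATION, SIDEWAYS-OPERATION or UPWARD-OPERATION; ~
                             in ASDF 3 it no longer propagates to children or siblings."
            :format-arguments (list (class-name class))))))
```

A programmer who has read the message either fixes the superclasses or sets the variable to NIL in their system definition, and the warning goes away. Reporting once per class keeps the noise to one message per offending definition.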

> 
> Meanwhile 5-10 extensions were fixed by this change,
> and 4-5 were unaffected (or, for 1-2 of them, fixed already).
> 
> 
>> A somewhat drastic solution would be to make the name OPERATION now
>> denote DOWNWARD-OPERATION (which would remain as a canonical name), and
>> rename the common superclass of DOWNWARD-OPERATION and UPWARD-OPERATION
>> to something like ABSTRACT-OPERATION or COMMON-OPERATION.
>>
> No, no, no. That would be really bad.
> The real solution is to fix the handful of broken extensions.
> 
>> The current refactoring is quite problematic, since it moves some of the
>> previously-existing characteristics of OPERATION out and into a
>> sub-class that no one has ever heard of before.
>>
> That's what ASDF 3 has been doing for a year, and
> if no one has complained about those handful of broken systems,
> it's probably easier to fix them than to rebreak
> all the MANY MORE extensions that were either fixed or unaffected,
> and would now need to be fixed instead.
> The minimal change is to keep fixing things,
> not to revert to ASDF2 brokenness.

I agree that it's easier to fix such libraries -- after all, it's
typically just replacing

OPERATION

with

#-asdf3 OPERATION #+asdf3 DOWNWARD-OPERATION

To me the key issue (to reiterate) is how do we find the people who need
to do this, and make it known to them that they need to make this change?
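
The same substitution in context, as it might appear in a system definition file. This is a sketch assuming ASDF 3 can be loaded with REQUIRE; MY-COMPILE-OP is a hypothetical name. Following Faré's reminder not to forget SIDEWAYS-OPERATION, it inherits both mixins, mirroring the downward and sideways propagation that ASDF 2 baked into TRAVERSE; an operation that needs only one direction should list only that mixin.

```lisp
;; Load ASDF at compile time too, so that :ASDF3 is on *FEATURES*
;; when the reader reaches the #+/#- conditionals below.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require "asdf"))

;; Hypothetical operation that, under ASDF 2, relied on OPERATION propagating
;; to a module's children and to a component's :depends-on siblings.
(defclass my-compile-op (#-asdf3 asdf:operation
                         #+asdf3 asdf:downward-operation
                         #+asdf3 asdf:sideways-operation)
  ())
```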

> 
>> Unfortunately, the above solution is not ready for prime-time, either,
>> since if we add COMMON-OPERATION, all programmers' methods that dispatch
>> on OPERATION will break if used with PREPARE-OP.  On the one hand,
>> that's probably not a big deal, since no one will have been customizing
>> UPWARD-OPERATIONs, since they haven't existed.  On the other hand,
>> programmers who want to write extensions that really are generic to
>> *all* types of operation (e.g., EXPLAIN type methods) would be broken by
>> this proposed repair.
>>
> Yes, many extensions rely on OPERATION being the top of the hierarchy,
> and you don't want to break all of them. That includes
> POIU, ITA's now published QUUX through its qres-build system, and
> at least 6 quicklisp systems that I can easily find grepping through
> ~/quicklisp/dist/quicklisp/software/**/*asd*.
> Several of these defmethods are probably obsolete,
> since most extensions shouldn't specialize operation-done-p anymore.
> 
> Really, the current ASDF3 architecture is much improved over ASDF2,
> though there are indeed still active issues.
> 
> (This reminds me how deferred warnings broke 50-odd systems in quicklisp,
> out of which only about 25 were fixed, and
> 25 had unresponsive authors, even a year afterwards.
> In the end, I had the deferred warnings disabled by default.
> Good luck if you want to enable the feature at long last,
> either by getting everything fixed, or
> allowing out-of-band disabling of the warnings.)
> 
>> This problem also exposes a HUGE hole in our regression-testing methods:
>> we have nothing that tests extensions to the ASDF protocol.
>>
> I disagree. ASDF is defined in an incremental way, and
> all the code in ASDF itself is "extensions" to the protocol
> as defined by previous pieces.
> Consider asdf-bundle and concatenate-source, if nothing else.
> It is a testament to the overall good design of the original ASDF
> and its CLOS based architecture that the code is so clean and small,
> compared to other code that does equivalent things much worse
> (have you looked at mk-defsystem? Ugh!
> And let's not discuss some horrors in C or Java).
> I salute Dan Barlow, who did a lot of experimentation, and
> whose bigger success overshadows his smaller failures.

[...snip...]

> 
> Now, we can always add more tests to the ASDF test suite.

Yes, this is one of the things I would most like to see happen.

[...snip...]

>> Finally, as the responsible party now, I'm not comfortable sending out
>> another release until I have come to understand the new protocol better
>> than I do now.  Indeed, it was culpably negligent of me to release the
>> last couple of versions, and I apologize for doing so.
>>
> I understand your concern. On the other hand, consider that
> 
>  * This particular change has been here for a year with no bug report,
>    despite indeed breaking a handful of extensions that no one uses.
>    Only one known useful piece of code remains broken.

At least for us, the reason that there has been no bug report is that
Allegro pushed an intermediate version of ASDF that had a broken
EXCL:RUN-SHELL-COMMAND.  When we figured this out, we ripped out that
patch, so all of our production code has been running on ASDF2 until
this past week.

I don't think silence in this case can be taken to indicate the absence
of bugs.  There can be a huge lag between ASDF releases and ASDF
penetration into the user community.
> 
>  * Your standard should probably not be perfection,
>    but improvement and non-regression,
>    and I believe the current release candidate meets it.
>    The failures that we are experiencing are either new tests, or
>    new platforms that were previously untested or unsupported;
>    meanwhile plenty of bugs have been fixed, with many tests added,
>    and new functionality is at hand.

I'm willing to see less than perfection, but a key desideratum for me is
"fail loudly and obviously," instead of "quietly and confusingly."  I
want the quiet and confusing OPERATION failures to be moved to being
understandable before the next release.

While I see the advantages of getting bug fixes out there, I don't
believe that we get that many opportunities to achieve uptake through
the implementations (with the possible exception of SBCL).  So I want
the most improvement per release possible, and if there's something that
seems critical to me, I'm disinclined to let it slip past a release.

> 
>  * Throughout all the history of ASDF1 and ASDF2,
>    all authors and maintainers including danb and including me
>    obviously didn't have a clear understanding of the old protocol,
>    since it was so fundamentally buggy. ASDF3 fixed the protocol.
>    Your lack of understanding of the new protocol is not worse than
>    the previous maintainer's lack of understanding of the old protocol.
> 
> Of course, you're in charge now, and
> may validly decide that it's a blocking issue.

There was at least a time when I understood the *actual* (as opposed to
the desirable) protocol.  This has slipped away.  I'd like it to come
back before I put my name behind another release.

I definitely welcome your offer of a walkthrough.  I think that will be
hugely helpful.

Thanks for all of your work,
R