[armedbear-devel] Java Collections, ABCL, and you
Alessio Stalla
alessiostalla at gmail.com
Fri Feb 3 21:54:23 UTC 2012
Hello list,
I'd like to talk a bit about our Java Collections API integration, and
especially a topic that has been lumbering in my mind for quite a lot
of time and which has been raised today by Rudi Schlatte on #abcl.
Warning: the mail turned out to be quite long.
As you probably know, some time ago I added to ABCL an implementation
of Christophe Rhodes' extensible sequences proposal[1], until then
only implemented in SBCL (as far as I know). It's not loaded by
default, but you can get it with (require :extensible-sequences). Some
time later I used that API to integrate ABCL with the Java Collections
framework, so that (certain) Java collections are Lisp SEQUENCEs.
Again, to get it you need to (require :java-collections).
However, there are, I think, two interesting limitations in the ext.
seq. protocol, which I'm going to discuss.
Since it was designed as an extension to CL's sequences, the protocol
builds directly upon the concept of CL sequences (lists and vectors) -
and in particular it assumes sequences are ordered (accessible by
index) and mutable. The only basic operations one must absolutely
define to provide a custom sequence implementation are ELT, (SETF
ELT), and LENGTH; anything else is defined by default upon those
operations, albeit inefficiently.
That concept of sequences roughly matches that of Java Lists (except
the fact that mutability of Lists is optional). List is a subtype of
Collection which implies ordering and access by index. Collection is
basically an Iterable thing with a length and optional mutability (in
the form of adding elements to the end and removing elements
destructively).
The concept of a Collection or better an Iterable is missing from the
extensible sequences protocol, although it provides an iterator
sub-protocol. This makes it unsuitable for those kinds of sequences
that do not fully respect CL's definition of sequence (e.g. because
they are not mutable of because they can only be iterated once and not
accessed by index, like a stream), but that would nevertheless benefit
from adhering at least in part to the sequence protocol (because then
you might LOOP or, shameless plug, DO+ over them, for example). That
would also allow to expose arbitrary Java Collections, and not just
Lists, as Lisp sequences - including thus Sets and various specialized
Collection types.
So the first point is: I'd like to extend the protocol with the
concept of an Iterable. How? Ideally, Iterable would be a more basic
concept than Sequence, but we cannot modify the class hierarchy of
SEQUENCE because it's defined by the standard. So, either ITERABLE
must be completely disjoint from SEQUENCE, or it must be a subtype of
it. Of the two, I favor the latter, because many sequence operations
are supposed to signal errors in case their argument is not a proper
sequence, and deviating from that would probably constitute a
violation of the standard. So I would define an Iterable as a special
SEQUENCE that is defined in terms of an iterator; by default ELT,
(SETF ELT) and LENGTH create a new iterator and, very inefficiently,
use it to walk the Iterable and do their work. Of course specialized,
more efficient implementations are possible. Java Collections would be
Iterables, and Java Lists would retain their specialized
implementation of ELT and (SETF ELT) which do not necessarily walk the
list each time (depending on the type of list).
The second idea is that I think it would be beneficial to explicitly
state that mutability of sequences is optional. The CL standard
doesn't say anything about it (or so it seems to me), although you can
clearly see that in certain passages it assumes that list and vector
(which are both mutable) are the only subtypes of sequence, whereas
list and vector are explicitly said not to constitute an exhaustive
partition of the type sequence[2]. Currently a number of sequence
functions in the protocol implementation assume mutability, for
example SUBSEQ by default creates a copy of the same type and sets
elements one by one using an iterator. However, to eliminate the
mutability requirement, at least another piece is missing: a
generalization of CONS, i.e. an operation that appends a single
element to the beginning of a sequence nondestructively. This is
roughly what Clojure calls conj (except that conj does not specify
where the element is added, which is a very awkward decision to me)
and could be defined as (concatenate (class-of sequence) (list item)
sequence)... if only concatenate wasn't implemented in terms of (SETF
ELT) :P
The possible negative sides of such a decision are: 1. it could in the
end be an aspect of non-conformity to the ANSI standard (although only
relevant to non-standard sequence subtypes) 2. it would make the
default implementation of certain sequence operations even more
inefficient for custom mutable sequences (that could be mitigated by
checking for a marker mixin class MUTABLE-NONSTANDARD-SEQUENCE) 3. it
would make ABCL's implementation of sequences a bit more complex.
Note also that lifting mutability would NOT benefit Java collections
because, inconsistently, they provide no general means to extend a
collection without mutating it.
What do you think?
Cheers,
Alessio
[1] doc.gold.ac.uk/~mas01cr/papers/ilc2007/sequences-20070301.pd
[2] http://www.lispworks.com/documentation/HyperSpec/Body/t_seq.htm
More information about the armedbear-devel
mailing list