[Ecls-list] Changes in sequences

Wed May 26 05:07:35 UTC 2010

On Tue, May 25, 2010 at 4:44 PM, Juan Jose Garcia-Ripoll
<juanjose.garciaripoll at googlemail.com> wrote:
> On Mon, May 24, 2010 at 8:25 PM, Gabriel Dos Reis
> <gdr at integrable-solutions.net> wrote:
>>
>> I'm ignorant of what ECL's collector works.  I have so far assumed that
>> it just reuses the defaul H-W collector; but apparently, that is not the
>> case?
>
> Well, the Boehm-Weiser garbage collector is a pretty flexible library. It
> includes a lot of features:
> * Precise collection
> * Generational garbage collection with write barriers
> * Parallel garbage collection shared among threads
> * Macros for thread-local allocation
> * Macros for inlining allocation without locks
> * Facilities to extend the mark phase
> * Finalization

Many thanks for the summary.

> ...
> the list is huge and most of it is not well documented. Making an optimal
> use requires a man-month of continuous investigation -- but only _after_ all
> other performance sinks have been eliminated, for we may run the risk of
> premature optimization choosing the wrong strategy.

Agreed.

>
>>
>> I understand that second point.  Even if the client's algorithms are
>> suboptimal (in terms of theoretical complexity), should we still expect
>> order of magnitude between runs with ECL and other Lisps on the same
>> algorithm implementation?
>
> I am not sure I expressed myself properly. What I meant is that the user
> code may be making assumptions about the implementation and it strengths
> leading indeed to significant performance differences.

OK, thanks for clarifying.

> The first reason is that you might be exercising functions that are / were
> also suboptimal in ECL itself. For instance using too much DELETE vs. other
> loops, or using DELETE-DUPLICATES. In particular those functions were
> operating on sequences using ELT. That means DELETE and friends could move
> from O(n) to O(n^2) which for a list with 10 elements is a 10x in time.

The OpenAxiom compiler and interpreter are heavy users of those functions.

> Indeed the times for runnings the ANSI tests with sequences went from 1
> second per file to about 0.09 seconds. Not that it matters much because the
> whole test suite is dominated by other things, such as the compiler tests,
> but it gives you an idea of the impact of simple things on the overall
> performance.
> Sometimes it is just a simpler things --gentemp I recall, was a huge
> bottleneck because of using FORMAT-- that make destroy the user's experience
> and cause the impression that the whole implementation is a disaster. Here I
> am probably to be blamed because I do not have the time or resources to
> profile all those bottlenecks, but they are easy to find with time and
> patience.

Indeed, OpenAxiom uses FORMAT a lot and is a heavy user of string
concatenation functions, substring operations, and symbol interning.

> The second reason why things can get pretty bad performancewise you have
> already experienced: the compiler. Here again stupid design choices, such as
> the algorithms that decide when to unbox or not, or the expectation from
> user code that type inference is there and works, thus removing all explicit
> type declarations, or even an improper use of declarations (we had this
> problem in pprint.lsp, where automatic type checking of arguments was
> stupidly activated on all functions), may decrease performance by a factor
> 10 or more just because of introducing boxed values, additional function
> calls, excesive branching, ...

OK.

> But again there is no fundamental limitation. The proof is that the sequence
> functions that I implemented run *faster* in ECL with appropriate type
> declarations than in other implementations. Have a look at them: *now* they
> are not fundamentally different from the algorithms that SBCL uses, same
> asymptotic complexity and all that.

As I just reported in another message, I am seeing a measurable speed up
-- not as terrific as the ones you see in the ANSI test, but still
quite measurable.

Thanks!

-- Gaby