[PATCH] Use a more efficient algorithm for list shuffling

salvador fandino sfandino at gmail.com
Fri May 8 10:21:37 UTC 2015


On Fri, May 8, 2015 at 12:06 AM, Svante v. Erichsen <
Svante.v.Erichsen at web.de> wrote:

> Hi!
>
> When I read “sort with random comparator”, I get a bad feeling.  I have not
> examined the code deeply yet, but do you have some reference for an
> explanation
> or perhaps even a proof that this is not biased?
>

A pseudo-formal justification of the algorithm follows:

How can you define a proper shuffling algorithm for a list L of n elements?

The first idea is that for any element E the probability of it ending at
some position P is 1/N. But this is not enough, because there could be
correlations between the assigned positions (in example, rotating the list
(random n) positions honors that condition but is a quite bad shuffling
algorithm).

A better definition of a shuffling algorithm is as follows:

Given...

1) a set S = {s0, s1, ..., sm} containing m elements of L such that m < n
2) a set Q = {q0, q1, ..., qm} containing m positions (integers in the
range [0, n)).
3) an element E of S such that E doesn't belong to S
4) a position P in [0, n) such that P doesn't belong to Q

We say that and algorithm A is a proper shuffling algorithm iff the
probability of the assigned position to the element E being P conditioned
by the position of the elements of S being Q (i.e.: pos(s0) = q0, pos(s1) =
q1,..., pos(sm) = qm) is 1/(n-m) for every S, Q, E, P abiding the
conditions above.

In simpler words, once we have fixed the final positions of m elements of
the list, the rest of the elements are distributed among the remaining free
positions with equal probability 1/(n-m).

Now, from the relation above, we can easily derive another one, that when
applied constructively results on the algorithm I have implemented in
shuffle-sublist:

Given (1), (2), (3), (4) and ...

5) a partition of [0, n) formed by the two intervals A = [0, o) and B = [o,
n) such that 0 < o < n.
6) QA = the intersection of Q and A. and size(QA) its number of elements.

The probability of E getting assigned any position in A is (o - size(QA)) /
(n - m).

Or again in simpler words, the probability of some element E assigned to
any position on A is equal to the number of free positions in A divided by
the total number of free positions.

And that's what shuffle-sublist does, it selects o as (floor n 2) creating
two partitions a and b of [0, n) and then distributes the elements from
list among the two using the probability above (the only tricky thing, is
that it does it destructively reusing the cons cells on list).

Then recursively, it calls itself until the sublist are short enough to
make Fisher-Yates efficient.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.common-lisp.net/pipermail/alexandria-devel/attachments/20150508/ae6fa21f/attachment.html>


More information about the alexandria-devel mailing list