<div dir="ltr"><br><div class="gmail_extra"><div class="gmail_quote">On Fri, May 8, 2015 at 12:06 AM, Svante v. Erichsen <span dir="ltr"><<a href="mailto:Svante.v.Erichsen@web.de" target="_blank">Svante.v.Erichsen@web.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi!<br>

<br>

When I read “sort with random comparator”, I get a bad feeling.  I have not<br>

examined the code deeply yet, but do you have some reference for an explanation<br>

or perhaps even a proof that this is not biased?<br></blockquote><div><br>A pseudo-formal justification of the algorithm follows:<br><br>How can you define a proper shuffling algorithm for a list L of n elements?<br><br>The

 first idea is that for any element E the probability of it ending at 

some position P is 1/N. But this is not enough, because there could be 

correlations between the assigned positions (in example, rotating the 

list (random n) positions honors that condition but is a quite bad 

shuffling algorithm).<br><br>A better definition of a shuffling algorithm is as follows:<br><br>Given...<br><br>1) a set S = {s0, s1, ..., sm} containing m elements of L such that m < n<br>2) a set Q = {q0, q1, ..., qm} containing m positions (integers in the range [0, n)).<br>3) an element E of S such that E doesn't belong to S<br>4) a position P in [0, n) such that P doesn't belong to Q<br><br>We

 say that and algorithm A is a proper shuffling algorithm iff the 

probability of the assigned position to the element E being P 

conditioned by the position of the elements of S being Q (i.e.: pos(s0) =

 q0, pos(s1) = q1,..., pos(sm) = qm) is 1/(n-m) for every S, Q, E, P 

abiding the conditions above.<br><br>In simpler words, once we have 

fixed the final positions of m elements of the list, the rest of the 

elements are distributed among the remaining free positions with equal 

probability 1/(n-m).<br><br>Now, from the relation above, we can easily 

derive another one, that when applied constructively results on the 

algorithm I have implemented in shuffle-sublist:<br><br>Given (1), (2), (3), (4) and ...<br><br>5) a partition of [0, n) formed by the two intervals A = [0, o) and B = [o, n) such that 0 < o < n.<br>6) QA = the intersection of Q and A. and size(QA) its number of elements.<br><br>The probability of E getting assigned any position in A is (o - size(QA)) / (n - m).<br><br>Or

 again in simpler words, the probability of some element E assigned to 

any position on A is equal to the number of free positions in A divided 

by the total number of free positions.<br><br>And that's what 

shuffle-sublist does, it selects o as (floor n 2) creating two 

partitions a and b of [0, n) and then distributes the elements from list

 among the two using the probability above (the only tricky thing, is 

that it does it destructively reusing the cons cells on list).<br><br>Then recursively, it calls itself until the sublist are short enough to make Fisher-Yates efficient.<br><br></div></div></div></div>