[alexandria-devel] [RFC] collect-duplicates
Mark Cox
markcox80 at gmail.com
Tue Dec 7 04:33:32 UTC 2010
Hi Andreas,
There are cases where the group of duplicate values is of use. How about a more general function GROUP-BY which could be used to implement COLLECT-DUPLICATES.
TMP-PACKAGE> (group-by '((a 1) (a 2) (a 3) (b 4) (b 1) (c 1) (c 4)) #'eq :key #'first)
(((A 1) (A 2) (A 3)) ((B 4) (B 1)) ((C 1) (C 4)))
TMP-PACKAGE> (group-by '((a 1) (a 2) (a 3) (b 4) (b 1) (c 1) (c 4)) #'eq :key #'second)
(((A 1) (B 1) (C 1)) ((A 2)) ((A 3)) ((B 4) (C 4)))
Mark
(defun separate-according-to (sequence predicate &key (key #'identity))
"Returns two lists as values where the first contains the items in
SEQUENCE for which PREDICATE is not NIL and the second contains the
rest."
(let ((group-equal nil)
(group-not-equal nil))
(flet ((assign-to-group (object)
(if (funcall predicate (funcall key object))
(push object group-equal)
(push object group-not-equal))))
(map nil #'assign-to-group sequence))
(values (nreverse group-equal)
(nreverse group-not-equal))))
(defun group-by (sequence test-function &key (key #'identity))
"Groups items in SEQUENCE according to the TEST-FUNCTION. Each item
in the returned LIST contains each group. i.e. TEST-FUNCTION returns
non NIL for all values in a group."
(labels ((group-by/recursive (sequence result)
(case (length sequence)
(0
(nreverse result))
(1
(nreverse (cons (list (elt sequence 0)) result)))
(t
(let* ((object (elt sequence 0))
(object-key (funcall key object))
(fn (lambda (other-object)
(funcall test-function object-key (funcall key other-object)))))
(unless (funcall fn object)
(error "Invalid TEST-FUNCTION as one item fails test with itself."))
(multiple-value-bind (equal-to not-equal-to) (separate-according-to (subseq sequence 1) fn)
(group-by/recursive not-equal-to (cons (cons object equal-to)
result))))))))
(group-by/recursive sequence nil)))
(defun collect-duplicates (sequence &key (test #'eq) (key #'identity) (only-duplicates t))
"Takes a sequence and a test, and returns as two values a sequence
of duplicates and a sequence of the duplicate count. The first
sequence matches the input sequence's type."
(flet ((one-or-less-p (v)
(<= v 1)))
(let ((groups (group-by sequence test :key key)))
(let ((groups (if only-duplicates
(remove-if #'one-or-less-p groups :key #'length)
groups)))
(values (map-into (make-sequence (class-of sequence) (length groups)) #'first groups)
(map 'list #'length groups))))))
On 02/12/2010, at 10:03 AM, Andreas Fuchs wrote:
> Hi there,
>
> Prompted by a discussion on #lisp, I've made the following
> collect-duplicates function: Takes a sequence and a test, and returns
> as two values a sequence of duplicates and a sequence of the duplicate
> count (matching their sequence type to the input sequence's).
>
> (defun collect-duplicates (sequence &key (test #'eql))
> ;; This will work only for TESTs that are valid hash tests:
> (let ((dupes (make-hash-table :test test))
> (known (make-hash-table :test test)))
> (map nil
> (lambda (elt)
> (if (gethash elt known)
> (incf (gethash elt dupes 1))
> (setf (gethash elt known) t)))
> sequence)
> (let ((duplicates (make-sequence (class-of sequence)
> (hash-table-count dupes)))
> (counts (make-sequence (class-of sequence) (hash-table-count dupes)))
> (i 0))
> (maphash (lambda (k v)
> (setf (elt duplicates i) k
> (elt counts i) v)
> (incf i))
> dupes)
> (values duplicates counts))))
>
> Issues that I see with it:
>
> * Needs a :key argument.
> * :test should allow for non-hash-table tests, as well.
> * (class-of ...) is not really a good idea, as it doesn't preserve
> the element type for arrays.
>
> If I fixed these issues, would there be interest in having this
> function in alexandria?
>
> Cheers,
> --
> Andreas Fuchs, (http://|im:asf@|mailto:asf@)boinkor.net, antifuchs
>
> _______________________________________________
> alexandria-devel mailing list
> alexandria-devel at common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/alexandria-devel
More information about the alexandria-devel
mailing list