[alexandria-devel] [RFC] collect-duplicates

Mark Cox markcox80 at gmail.com
Tue Dec 7 04:33:32 UTC 2010


Hi Andreas,

There are cases where the group of duplicate values is of use. How about a more general function GROUP-BY which could be used to implement COLLECT-DUPLICATES.

TMP-PACKAGE> (group-by '((a 1) (a 2) (a 3) (b 4) (b 1) (c 1) (c 4)) #'eq :key #'first)
(((A 1) (A 2) (A 3)) ((B 4) (B 1)) ((C 1) (C 4)))

TMP-PACKAGE> (group-by '((a 1) (a 2) (a 3) (b 4) (b 1) (c 1) (c 4)) #'eq :key #'second)
(((A 1) (B 1) (C 1)) ((A 2)) ((A 3)) ((B 4) (C 4)))

Mark

(defun separate-according-to (sequence predicate &key (key #'identity))
  "Returns two lists as values where the first contains the items in
SEQUENCE for which PREDICATE is not NIL and the second contains the
rest."
  (let ((group-equal     nil)
	(group-not-equal nil))
    (flet ((assign-to-group (object)
	     (if (funcall predicate (funcall key object))
		 (push object group-equal)
		 (push object group-not-equal))))
      (map nil #'assign-to-group sequence))

    (values (nreverse group-equal) 
	    (nreverse group-not-equal))))

(defun group-by (sequence test-function &key (key #'identity))
  "Groups items in SEQUENCE according to the TEST-FUNCTION. Each item
in the returned LIST contains each group. i.e. TEST-FUNCTION returns
non NIL for all values in a group."
  (labels ((group-by/recursive (sequence result)
	     (case (length sequence)
	       (0
		(nreverse result))
	       (1
		(nreverse (cons (list (elt sequence 0)) result)))
	       (t
		(let* ((object     (elt sequence 0))
		       (object-key (funcall key object))
		       (fn         (lambda (other-object)
				     (funcall test-function object-key (funcall key other-object)))))
		  (unless (funcall fn object)
		    (error "Invalid TEST-FUNCTION as one item fails test with itself."))
		  (multiple-value-bind (equal-to not-equal-to) (separate-according-to (subseq sequence 1) fn)
		    (group-by/recursive not-equal-to (cons (cons object equal-to)
							   result))))))))
    (group-by/recursive sequence nil)))

(defun collect-duplicates (sequence &key (test #'eq) (key #'identity) (only-duplicates t))
  "Takes a sequence and a test, and returns as two values a sequence
of duplicates and a sequence of the duplicate count. The first
sequence matches the input sequence's type."
  (flet ((one-or-less-p (v)
	   (<= v 1)))
    (let ((groups (group-by sequence test :key key)))
      (let ((groups           (if only-duplicates
				  (remove-if #'one-or-less-p groups :key #'length)
				  groups)))
	(values (map-into (make-sequence (class-of sequence) (length groups)) #'first groups)
		(map 'list #'length groups))))))

On 02/12/2010, at 10:03 AM, Andreas Fuchs wrote:

> Hi there,
> 
> Prompted by a discussion on #lisp, I've made the following
> collect-duplicates function: Takes a sequence and a test, and returns
> as two values a sequence of duplicates and a sequence of the duplicate
> count (matching their sequence type to the input sequence's).
> 
> (defun collect-duplicates (sequence &key (test #'eql))
>  ;; This will work only for TESTs that are valid hash tests:
>  (let ((dupes (make-hash-table :test test))
>        (known (make-hash-table :test test)))
>    (map nil
>         (lambda (elt)
>           (if (gethash elt known)
>               (incf (gethash elt dupes 1))
>               (setf (gethash elt known) t)))
>         sequence)
>    (let ((duplicates (make-sequence (class-of sequence)
> (hash-table-count dupes)))
>          (counts (make-sequence (class-of sequence) (hash-table-count dupes)))
>          (i 0))
>      (maphash (lambda (k v)
>                 (setf (elt duplicates i) k
>                       (elt counts i) v)
>                 (incf i))
>               dupes)
>      (values duplicates counts))))
> 
> Issues that I see with it:
> 
> * Needs a :key argument.
> * :test should allow for non-hash-table tests, as well.
> * (class-of ...) is not really a good idea, as it doesn't preserve
> the element type for arrays.
> 
> If I fixed these issues, would there be interest in having this
> function in alexandria?
> 
> Cheers,
> -- 
> Andreas Fuchs, (http://|im:asf@|mailto:asf@)boinkor.net, antifuchs
> 
> _______________________________________________
> alexandria-devel mailing list
> alexandria-devel at common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/alexandria-devel





More information about the alexandria-devel mailing list