[vivace-graph-devel] Rucksack Btree number-indexes on UUID integer-128 representations -- a sketch

MON KEY monkey at sandpframing.com
Mon Sep 12 22:56:48 UTC 2011


I've made some more timings to initially gauge how long it might take
to do a "key-<" lookup on a v4 UUID bit-vector.

Also, i've included some examples which indicate where fanout in a
Btree of v4-uuids should occur.

Following parameter is used by function `find-first-bit' defined
below:

(defparameter *bit-vector-bit-table* (make-array 129))

Add 127 bit-vectors each with one bit set at an offset in range 0,127:

(flet ((set-one-bit (idx)
         (let ((bv (make-array 128 :element-type 'bit)))
           (setf (sbit bv idx) 1)
           bv)))
  (loop
     for idx from 0 below 128
     do (setf (aref *bit-vector-bit-table* idx)
              (set-one-bit idx))
     finally (setf (aref *bit-vector-bit-table* 128)
(unicly::uuid-bit-vector-128-zeroed))))

;; Find the first non-zero bit in UUID-BV.  This function is slower
;; than `find-first-bit-by-bit' but is more readily adaptable for use
;; with Btrees where there is a requirement to descend nodes.  Walk
;; each bit-vector bv-N in *bit-vector-bit-table* taking the
;; `cl:bit-and' of bv-N and UUID-BV.  We put the return value of each
;; `cl:bit-and' on the local loop var maybe-not-zeroed.  As soon as
;; maybe-not-zeroed is not equal the equivalent of
;; (unicly::uuid-bit-vector-128-zeroed) we return the index of the
non-zero bit found.
(defun find-first-bit (uuid-bv)
  (loop
     with always-zeroed = (aref *bit-vector-bit-table* 128)
     with maybe-not-zeroed = (unicly::uuid-bit-vector-128-zeroed)
     for bv across *bit-vector-bit-table*
     for cnt from 0 below 128
     do (bit-and bv uuid-bv maybe-not-zeroed)
     until (not (equal always-zeroed maybe-not-zeroed))
     finally (return cnt)))

;; find the first non-zero bit in uuid-bv and return its position.
(defun find-first-bit-by-bit (uuid-bv)
  (loop
     for x across uuid-bv
     for y from 0 below 128
     when (plusp (sbit uuid-bv y))
     do (return y)))

This example shows a toy "bv-key->" in which we test only the MSB bit
(the 0 bit) for two v4 UUID bit-vectors. Note, we still have +/- 127
more bits to traverse before we might find which node the bit-vector
belongs to:

(loop
   repeat 100
   collect  (> (find-first-bit-by-bit (unicly::uuid-to-bit-vector
(unicly::make-v4-uuid)))
               (find-first-bit-by-bit (unicly::uuid-to-bit-vector
(unicly::make-v4-uuid)))))


;; An array of 100k elts. We use it to find the distribution of
;; first-bit non-zero bits in v4 UUIDs
(defparameter *tt--sample-bv-array* (make-array 100000))

;; Poplute the array of variable *tt--sample-bv-array* with 100k new
;; v4 uuid bit-vectors.
(defun make-new-random-uuid-table ()
  (loop
     for x from 0 below 100000
     do (setf (aref *tt--sample-bv-array* x)
              (unicly:uuid-to-bit-vector (unicly:make-v4-uuid)))))

;; Do it now.
(make-new-random-uuid-table)

;; (aref *tt--sample-bv-array* 99999)

Timing for `find-first-bit' for all 100k elts of `*tt--sample-bv-array*':

(sb-ext:gc :full t)
(time
 (loop
    for x from 0 below 100000
    do (find-first-bit (aref *tt--sample-bv-array* x))))
;;
;; Evaluation took:
;;   0.049 seconds of real time
;;   0.048993 seconds of total run time (0.041994 user, 0.006999 system)
;;   100.00% CPU
;;   147,111,712 processor cycles
;;   4,798,088 bytes consed

;; timing for `find-first-bit-by-bit' for all 100k elts of
;; `*tt--sample-bv-array*'
(sb-ext:gc :full t)
(time
 (loop
    for x from 0 below 100000
    do (find-first-bit-by-bit (aref *tt--sample-bv-array* x))))
;;
;; Evaluation took:
;;   0.009 seconds of real time
;;   0.007999 seconds of total run time (0.007999 user, 0.000000 system)
;;   88.89% CPU
;;   27,299,580 processor cycles
;;   0 bytes consed

;; Evaluating `get-random-uuid-table-distribution' should give an idea
;; of the inital fanout we might expect.
(defun get-random-uuid-table-distribution ()
  (let ((cnt-table (make-hash-table)))
    (loop
       for x from 0 below 128
       do (setf (gethash x cnt-table) 0))
    (loop
       initially (make-new-random-uuid-table)
       for x from 0 below 100000
       for y = (find-first-bit-by-bit (aref *tt--sample-bv-array* x))
       do  (incf (gethash y cnt-table))
       finally (return (loop
                          for x from 0 below 128
                          collect (cons x (gethash x cnt-table)) into idx-counts
                          finally (return (remove-if #'null
                                                     (map 'list
                                                          #'(lambda (x)
                                                              (and
(plusp (cdr x))
                                                                   x))
                                                          idx-counts))))))))
(dotimes (i 10)
  (terpri)
  (print (get-random-uuid-table-distribution)))

; ((0 . 50005) (1 . 24972) (2 . 12341) (3 . 6323) (4 . 3123) (5 . 1628)
;  (6 . 789) (7 . 420) (8 . 192) (9 . 114) (10 . 47) (11 . 27) (12 . 9)
;  (13 . 4) (14 . 2) (15 . 3) (19 . 1))
;
; ((0 . 50063) (1 . 25070) (2 . 12402) (3 . 6229) (4 . 3052) (5 . 1630)
;  (6 . 769) (7 . 384) (8 . 197) (9 . 115) (10 . 51) (11 . 18) (12 . 10)
;  (13 . 6) (14 . 3) (15 . 1))
;
; ((0 . 49870) (1 . 25112) (2 . 12435) (3 . 6268) (4 . 3187) (5 . 1550)
;  (6 . 795) (7 . 393) (8 . 189) (9 . 105) (10 . 46) (11 . 25) (12 . 16)
;  (13 . 1) (14 . 5) (16 . 3))
;
; ((0 . 49986) (1 . 24947) (2 . 12571) (3 . 6243) (4 . 3121) (5 . 1538)
;  (6 . 783) (7 . 411) (8 . 200) (9 . 101) (10 . 48) (11 . 26) (12 . 12)
;  (13 . 6) (14 . 4) (15 . 1) (16 . 1) (17 . 1))
;
; ((0 . 49920) (1 . 25063) (2 . 12631) (3 . 6290) (4 . 3060) (5 . 1472)
;  (6 . 780) (7 . 377) (8 . 193) (9 . 108) (10 . 44) (11 . 29) (12 . 16)
;  (13 . 4) (14 . 5) (15 . 5) (16 . 3))
;
; ((0 . 50068) (1 . 24900) (2 . 12508) (3 . 6245) (4 . 3150) (5 . 1585)
;  (6 . 761) (7 . 395) (8 . 194) (9 . 99) (10 . 41) (11 . 35) (12 . 13)
;  (13 . 3) (14 . 3))
;
; ((0 . 50170) (1 . 24891) (2 . 12450) (3 . 6221) (4 . 3210) (5 . 1571)
;  (6 . 726) (7 . 383) (8 . 189) (9 . 95) (10 . 51) (11 . 21) (12 . 8)
;  (13 . 5) (14 . 6) (15 . 2) (17 . 1))
;
; ((0 . 50115) (1 . 24924) (2 . 12751) (3 . 6090) (4 . 3049) (5 . 1541)
;  (6 . 769) (7 . 381) (8 . 185) (9 . 88) (10 . 54) (11 . 25) (12 . 14)
;  (13 . 9) (14 . 1) (15 . 2) (17 . 2))
;
; ((0 . 49698) (1 . 25089) (2 . 12637) (3 . 6268) (4 . 3143) (5 . 1595)
;  (6 . 753) (7 . 382) (8 . 209) (9 . 114) (10 . 59) (11 . 28) (12 . 9)
;  (13 . 9) (14 . 4) (15 . 2) (16 . 1))
;
; ((0 . 49755) (1 . 25134) (2 . 12420) (3 . 6368) (4 . 3104) (5 . 1596)
;  (6 . 802) (7 . 421) (8 . 179) (9 . 103) (10 . 51) (11 . 28) (12 . 21)
;  (13 . 12) (14 . 4) (16 . 2))

:NOTE One Thing to take into cosideration is that a Btree scheme that
frobs the UUID bit-vector might want to take care to be
unicly::uuid-version-bit-vector aware.  E.g. the output from following
example makes it pretty clear that any node branching on the value of
bit 49 is gonna always contain every v4 UUID. Note also that this is
equally true of the uuid-integer-128 representation...

After evaluating form below you should see a line of 1's at column 52
in your slime-repl:

(dotimes (i 100 (terpri))
  (terpri)
  (unicly:uuid-print-bit-vector t (unicly:make-v4-uuid)))




More information about the vivace-graph-devel mailing list