Parsing big XML files with klacks and sbcl

Mark Janssen mpc.janssen at gmail.com
Tue May 29 16:33:57 UTC 2018


I am trying to parse a big XML file (several GB) and I am
using klacks because of the size.

However, there seems to be a leak during parsing: memory use
increases continuously until SBCL runs out of memory.

What am I missing?

Regards,
Mark

Some info:

$ sbcl --version
SBCL 1.4.8

The script:

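;; Load cxml (which provides klacks) via Quicklisp.
(ql:quickload :cxml)
(defparameter *src* (cxml:make-source (pathname "huge.xml")))
;; Consume every event and discard it; stop at end of document,
;; where KLACKS:PEEK returns NIL.
(loop while (klacks:peek *src*)
      do (klacks:consume *src*))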


Output of (gc) followed by (room t) after breaking into the debugger:

0] (gc)
; No debug variables for current frame: using EVAL instead of EVAL-IN-FRAME.
NIL
0] (room t)
; No debug variables for current frame: using EVAL instead of EVAL-IN-FRAME.
Dynamic space usage is:   231,363,600 bytes.
Immobile space usage is:   15,866,480 bytes (116,720 bytes overhead).
Read-only space usage is:           0 bytes.
Static space usage is:            704 bytes.
Control stack usage is:         9,648 bytes.
Binding stack usage is:         2,064 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

Summary of spaces: dynamic immobile static

CONS:
    198,982,960 bytes, 12,436,435 objects, 100% dynamic.

CODE:
    13,755,264 bytes, 22,368 objects, 100% immobile, 0% dynamic.

SIMPLE-VECTOR:
    10,533,136 bytes, 80,217 objects, 100% dynamic.

INSTANCE:
    7,169,776 bytes, 126,568 objects, 2% immobile, 98% dynamic.

SIMPLE-ARRAY-UNSIGNED-BYTE-64:
    3,423,232 bytes, 1,867 objects, 100% dynamic.

SIMPLE-ARRAY-UNSIGNED-BYTE-8:
    2,874,208 bytes, 39,494 objects, 100% dynamic.

SIMPLE-BASE-STRING:
    2,031,264 bytes, 40,187 objects, 100% dynamic.

SYMBOL:
    1,778,320 bytes, 37,048 objects, 0% static, 67% immobile, 33% dynamic.

BIGNUM:
    1,327,760 bytes, 40,630 objects, 100% dynamic.

SIMPLE-CHARACTER-STRING:
    1,156,736 bytes, 13,489 objects, 100% dynamic.

SIMPLE-ARRAY-UNSIGNED-BYTE-32:
    888,800 bytes, 24,915 objects, 100% dynamic.

FDEFN:
    663,360 bytes, 20,730 objects, 0% static, 100% immobile.

CLOSURE:
    595,344 bytes, 16,722 objects, 100% dynamic.

SIMPLE-ARRAY-UNSIGNED-BYTE-16:
    588,608 bytes, 4,832 objects, 100% dynamic.

SIMPLE-ARRAY-UNSIGNED-BYTE-31:
    243,616 bytes, 4 objects, 100% dynamic.

SIMPLE-ARRAY-SIGNED-BYTE-8:
    196,208 bytes, 6,131 objects, 100% dynamic.

FUNCALLABLE-INSTANCE:
    160,368 bytes, 4,149 objects, 44% immobile, 56% dynamic.

SIMPLE-BIT-VECTOR:
    44,544 bytes, 100 objects, 100% dynamic.

SIMPLE-ARRAY-SIGNED-BYTE-16:
    15,072 bytes, 208 objects, 100% dynamic.

SIMPLE-ARRAY-SIGNED-BYTE-32:
    8,096 bytes, 194 objects, 100% dynamic.

VALUE-CELL:
    5,584 bytes, 349 objects, 100% dynamic.

SIMPLE-ARRAY-FIXNUM:
    2,960 bytes, 7 objects, 100% dynamic.

ARRAY-HEADER:
    2,208 bytes, 28 objects, 100% dynamic.

RATIO:
    1,024 bytes, 32 objects, 100% dynamic.

DOUBLE-FLOAT:
    704 bytes, 44 objects, 100% dynamic.

WEAK-POINTER:
    448 bytes, 14 objects, 100% dynamic.

SAP:
    256 bytes, 16 objects, 100% dynamic.

SIMPLE-ARRAY-UNSIGNED-BYTE-2:
    96 bytes, 2 objects, 100% dynamic.

SIMPLE-ARRAY-UNSIGNED-FIXNUM:
    80 bytes, 3 objects, 100% dynamic.

COMPLEX-DOUBLE-FLOAT:
    64 bytes, 2 objects, 100% dynamic.

COMPLEX:
    32 bytes, 1 object, 100% dynamic.

COMPLEX-SINGLE-FLOAT:
    32 bytes, 2 objects, 100% dynamic.

SIMD-PACK:
    32 bytes, 1 object, 100% dynamic.

SIMPLE-ARRAY-NIL:
    32 bytes, 2 objects, 100% dynamic.

SIMPLE-ARRAY-UNSIGNED-BYTE-4:
    16 bytes, 1 object, 100% dynamic.

SIMPLE-ARRAY-UNSIGNED-BYTE-7:
    16 bytes, 1 object, 100% dynamic.

SIMPLE-ARRAY-UNSIGNED-BYTE-15:
    16 bytes, 1 object, 100% dynamic.

SIMPLE-ARRAY-UNSIGNED-BYTE-63:
    16 bytes, 1 object, 100% dynamic.

SIMPLE-ARRAY-SIGNED-BYTE-64:
    16 bytes, 1 object, 100% dynamic.

SIMPLE-ARRAY-SINGLE-FLOAT:
    16 bytes, 1 object, 100% dynamic.

SIMPLE-ARRAY-DOUBLE-FLOAT:
    16 bytes, 1 object, 100% dynamic.

SIMPLE-ARRAY-COMPLEX-SINGLE-FLOAT:
    16 bytes, 1 object, 100% dynamic.

SIMPLE-ARRAY-COMPLEX-DOUBLE-FLOAT:
    16 bytes, 1 object, 100% dynamic.

Summary total:
    246,450,368 bytes, 12,916,800 objects.

Top 10 dynamic instance types:
  COMPILED-DEBUG-FUN           1,534,912 bytes,  23,983 objects.
  COMPILED-DEBUG-FUN-EXTERNAL  1,468,160 bytes,  22,940 objects.
  COMPILED-DEBUG-INFO            986,016 bytes,  20,542 objects.
  DEFINITION-SOURCE-LOCATION     241,056 bytes,   7,533 objects.
  FAST-METHOD-CALL               239,952 bytes,   4,999 objects.
  SLOT-INFO                      220,272 bytes,   4,589 objects.
  VOP-PARSE                      190,624 bytes,     851 objects.
  VOP-INFO                       183,456 bytes,     819 objects.
  FUN-TYPE                       155,520 bytes,   1,620 objects.
  ARG-INFO                       140,928 bytes,   1,468 objects.

  Other types                  1,691,632 bytes,  36,184 objects.
  Dynamic instance total       7,052,528 bytes, 125,528 objects.

Top 10 immobile instance types:
  LAYOUT                   119,616 bytes, 1,068 objects.
  PACKAGE                    4,224 bytes,    33 objects.

  Immobile instance total  123,840 bytes, 1,101 objects.

Top 10 static instance types:

  Static instance total  0 bytes, 0 objects.


