[elephant-devel] Why does an empty BDB repository take 40 MB?

Alex Mizrahi killerstorm at newmail.ru
Thu Apr 22 10:20:01 UTC 2010


 ??>> SLSIA.  I create a brand new BDB-backed elephant repository
 ??>> and it takes up 40MB of disk space.  Why?

 LPP> Elephant creates some btrees as part of repository initialization.

 LPP> What you're seeing is probably a combination of BDB log files (try to
 LPP> invoke db_archive with the -d switch[1]) and preallocated disk space
 LPP> (to avoid excessive fragmentation when the tree is filled).

You're right, but the files it preallocates are sparse, which means it's much
less of a problem w.r.t. disk space.
Here are the results of my investigation (as seen in comp.lang.lisp):

----
BerkeleyDB creates large files for its work:

alex@debetch:~/foobla$ ls -l
-rw-r----- 1 alex alex    24576 2010-04-21 01:37 __db.001
-rw-r----- 1 alex alex  1327104 2010-04-21 01:37 __db.002
-rw-r----- 1 alex alex 26222592 2010-04-21 01:37 __db.003
-rw-r----- 1 alex alex    98304 2010-04-21 01:37 __db.004
-rw-r----- 1 alex alex   557056 2010-04-21 01:37 __db.005
-rw-r----- 1 alex alex   253952 2010-04-21 01:37 __db.006
-rw-r----- 1 alex alex    40960 2010-04-21 01:33 %ELEPHANT
-rw-r----- 1 alex alex    16384 2010-04-21 01:32 %ELEPHANTDUP
-rw-r----- 1 alex alex    16384 2010-04-21 01:33 %ELEPHANTOID
-rw-r----- 1 alex alex 10485760 2010-04-21 01:33 log.0000000001

But those files are sparse, so they do not take up space on disk until they
are populated:

alex@debetch:~/foobla$ du -h
2.0M    .

alex@debetch:~/foobla$ du -h *
12K     __db.001
1.1M    __db.002
296K    __db.003
24K     __db.004
364K    __db.005
16K     __db.006
40K     %ELEPHANT
16K     %ELEPHANTDUP
16K     %ELEPHANTOID
104K    log.0000000001
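
The gap between the ls -l and du figures can be reproduced without BDB. What follows is only a minimal sketch, assuming a POSIX system and a filesystem with sparse-file support: it extends a file to 10 MiB, writes just the first 4 KiB, and compares the apparent size with the space actually allocated. The file name and helper name are made up for the illustration.

```python
import os
import tempfile

def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes), roughly ls -l versus du."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on POSIX systems.
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.demo")  # hypothetical file name
    with open(path, "wb") as f:
        f.truncate(10 * 1024 * 1024)  # extend to 10 MiB without writing data
        f.write(b"x" * 4096)          # populate only the first 4 KiB
    apparent, allocated = apparent_and_allocated(path)
    print("apparent:", apparent, "allocated:", allocated)
```

On a filesystem with sparse files the allocated figure should come out far below the apparent one; on a filesystem without them the two will be about the same.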

alex@debetch:~$ tar czf foobla.tgz foobla
alex@debetch:~$ ls -l foobla.tgz
-rw-r--r-- 1 alex alex 86029 2010-04-21 01:40 foobla.tgz

Well, that is if you use a filesystem which supports sparse files.
If you don't like this anyway, you can configure BDB to allocate smaller
files.
File __db.003 seems to be related to the cache size; the default cache size
in config.sexp is 20MB.
If you set it to 256KiB (the default for BDB) in my-config.sexp:

 (:BERKELEY-DB-CACHESIZE . 262144)

you won't have that large file:
-rw-r----- 1 alex alex    24576 2010-04-21 01:54 __db.001
-rw-r----- 1 alex alex   385024 2010-04-21 01:54 __db.002
-rw-r----- 1 alex alex   335872 2010-04-21 01:54 __db.003
-rw-r----- 1 alex alex    98304 2010-04-21 01:54 __db.004
-rw-r----- 1 alex alex   557056 2010-04-21 01:54 __db.005
-rw-r----- 1 alex alex   253952 2010-04-21 01:54 __db.006
-rw-r----- 1 alex alex    40960 2010-04-21 01:54 %ELEPHANT
-rw-r----- 1 alex alex    16384 2010-04-21 01:54 %ELEPHANTDUP
-rw-r----- 1 alex alex    16384 2010-04-21 01:54 %ELEPHANTOID
-rw-r----- 1 alex alex 10485760 2010-04-21 01:54 log.0000000001
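
The two __db.003 sizes are consistent with the cache-size explanation. The BDB documentation says that a cache smaller than 500MB is enlarged by 25% to cover overhead. The sketch below checks both listings against that rule. It assumes that the 20MB in config.sexp means 20 * 1024 * 1024 bytes, and the extra 8192 bytes is just the residual observed in these two listings, not a documented constant. The function name is made up for the illustration.

```python
def predicted_region_size(cache_bytes):
    """Guess the size of __db.003 for a given cache size."""
    # Documented by BDB: caches under 500 MB grow by 25% for overhead.
    # The 8192 bytes is an observed residual here, not a documented value.
    return cache_bytes + cache_bytes // 4 + 8192

# cache size in bytes -> size of __db.003 seen in the listings above
observed = {
    20 * 1024 * 1024: 26222592,  # assumed meaning of "20MB" in config.sexp
    256 * 1024: 335872,          # (:BERKELEY-DB-CACHESIZE . 262144)
}

for cache, size in observed.items():
    print(cache, predicted_region_size(cache), size)
# prints:
# 20971520 26222592 26222592
# 262144 335872 335872
```

Both predictions match the listed sizes exactly, which supports the guess that __db.003 is the cache region.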

There is also a 10 MiB log file; that is the default size for BDB. There is no
good way to tweak it in Elephant, but there is a BDB API for this, so it's
possible to implement it if you think it is really needed.
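
One possible workaround outside Elephant, offered only as an untested sketch: BDB reads an optional DB_CONFIG file from the environment directory when the environment is opened, and set_lg_max there is the config-file counterpart of the DB_ENV->set_lg_max call. Whether this interacts well with the way Elephant opens its environment is an assumption that has not been checked here. The helper name and the 1 MiB value are made up for the illustration, and a temporary directory stands in for the repository directory.

```python
import os
import tempfile

def write_db_config(env_dir, lg_max_bytes):
    """Write a DB_CONFIG file asking BDB for smaller log files."""
    path = os.path.join(env_dir, "DB_CONFIG")
    with open(path, "w") as f:
        # One directive per line: name, whitespace, value.
        f.write("set_lg_max %d\n" % lg_max_bytes)
    return path

# A temporary directory stands in for the repository directory.
with tempfile.TemporaryDirectory() as env_dir:
    cfg = write_db_config(env_dir, 1024 * 1024)  # 1 MiB, arbitrary example
    with open(cfg) as f:
        config_text = f.read()
    print(config_text, end="")
```

A new log file size would apply only to log files created after the change; an existing log.0000000001 keeps its size.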
---- 





