[elephant-devel] Collection paging
Ian Eslick
eslick at csail.mit.edu
Fri Oct 19 13:04:45 UTC 2007
Can we come up with a list of functions like this that people would
like to see?
Right now I have:
- feature (difficulty)
- size of btree (moderate)
- size of sub-range? (harder)
- go to N values from btree starting at the Kth key in btree
Ian
On Oct 3, 2007, at 11:07 PM, lists at infoway.net wrote:
> I have to say that Mariano is hitting some of the issues we will be
> facing soon as our quest to learn Lisp and Elephant continues and
> we continue working on migrating some of our SQL-based applications
> over. This particular need of his is also a real need we have since
> it's something we offer to our application users. For example, in
> the application we are working on migrating, we have a table with
> over 7 million rows. This obviously has many thousands of 50-row
> pages to navigate through. Our user interface offers the "usual"
> search, sort by any column(s), and page navigation (First, Last,
> Next, Previous, or manually input a page number).
>
> The way we handle this in our code is something like SELECT COUNT
> (*) FROM table_name. From the count, figure out the number of
> pages. Then compute an offset based on the page the user wants to
> view (e.g. assuming 50 rows per page and wanting to view page 90,
> the offset would be (50 * 90) - 1 = 4499) and formulate the SQL
> query as something like SELECT * FROM table_name ORDER BY
> {sort_order} OFFSET {computed_offset} LIMIT 50 (note that all this
> assuming a 50-row page size, and the user also has the ability to
> change the page size via the web interface)
>
> From a SQL data manipulation language perspective, it's pretty
> straight forward. From a SQL internal execution path, I really have
> no idea how it's implemented and don't know if it does any linear
> scanning to return the results. The fact is that our application
> allows you to navigate through the 7+ million row table in under 2
> seconds per page no matter which page you wish to view or sort
> order. From a user perspective, 2 seconds for a browser-based
> screen refresh is more than acceptable. Will Elephant allow to
> "refresh" as quickly if in the current model it needs to do a
> linear scan? We haven't gotten there yet, but maybe someone can
> comment on that.
>
> Thanks
>
> On Oct 3, 2007, at 9:13 PM, Ian S Eslick wrote:
>
>> When you say indexes are not sequential, do you mean UIDs are not
>> sequentially allocated? I think there is a BDB sequence issue
>> that I've never worried about that jumps to the nearest 100 when
>> you reconnect. However, if you create anything other than a user
>> object, you will also have gaps in the UID sequence so that's a
>> fundamental issue. Don't assume anything about UIDs other than
>> the fact that they are unique.
>>
>> You could create and index your own field which is a sequential ID
>> for creation ordering, but it sounds like you probably want to
>> return a sublist based on some sort order like alphabetical by
>> name or by date. In this case, at least doing the last page is
>> easy, map from end and count the # of users you want before you
>> terminate, but to find an element that is N elements away from the
>> first or last element in less than O(n) time isn't possible with
>> the underlying B-Trees we're using.
>>
>> The first question is whether you database is guaranteed to be so
>> big that you can't just do this linear time. When you start to
>> face performance issues, then you can look at building that
>> additional data structure.
>>
>> Otherwise, you will have to implement a data structure that
>> maintains this information on top of the Elephant infrastructure.
>>
>> The first idea that occurs to me is to drop the idea of using an
>> indexed class or standalone btrees and just build a red-black tree
>> using object slots (you can inherit from a base class that
>> implements the RB tree functionality). This simultaneously solves
>> the count problem and the access element # N problem. The O(log
>> (base 2) N) lookup time will have a higher fixed cost per level
>> traversal, but if you start getting really large dbs (1000's to
>> 10k's?) then it will certainly beat a linear map-index approach.
>> i.e.
>>
>> http://en.wikipedia.org/wiki/Red-black_tree
>>
>> There is a lisp example of this data structure here:
>>
>> http://www.aviduratas.de/lisp/progs/rb-trees.lisp
>>
>> Now there is a problem that you'll need one of these for each
>> sorted order which for a list sorted many different ways is a
>> problem. Anyone know how SQL query systems implement this?
>>
>> Just remember that premature optimization is one of the four
>> horseman of the apocalypse for the effective programmer.
>>
>> Ian
>> ----- Original Message -----
>> From: Mariano Montone
>> To: Elephant bugs and development
>> Sent: Wednesday, October 03, 2007 6:57 PM
>> Subject: [elephant-devel] Collection paging
>>
>> Hello, it's me again :S.
>>
>> I would like to know how I can access persistent collection pages
>> efficiently.
>>
>> What I'm trying to do is making work a web list component with
>> elephant. The list component is supposed to support well known
>> navigation commands, like look at the collection in pages, support
>> for first, last, next, previous buttons, and display of collection
>> size.
>>
>> The collection size problem was treated here: http://common-
>> lisp.net/pipermail/elephant-devel/2007-October/001162.html.
>>
>> But now I have a problem with building the pages.
>>
>> My first try was:
>> (let*
>> ((start (* (current-page self) (page-size self)))
>> (end (+ start (page-size self)))
>> )
>> (<:ul
>> (elephant:map-btree #'(lambda (key elem) (declare (ignore
>> key))
>> (let ((elem-text (make-elem-text self elem)))
>> (<:li
>> (if (slot-value self 'selectable)
>> (<ucw:a :action (answer elem) (<:as-
>> html elem-text))
>> (<:a (<:as-html elem-text))))))
>> (model self) :start start :end end)
>> )
>>
>> with start and end previously fixed based in the current page
>> number and size.
>>
>> But I realized indexes were not sequential when I created new
>> objects, as this shows:
>>
>> ASKIT> (with-btree-cursor (cursor (find-class-index 'user))
>> (iter
>> (for (values exists? k v) = (cursor-next cursor))
>> (while exists?)
>> (format *standard-output* "~A -> ~A ~%" k v)))
>> 2 -> #<USER name: dssdf {B043379}>
>> 3 -> #<USER name: ttttt {B045C69}>
>> 5 -> #<USER name: ff {B048179}>
>> 6 -> #<USER name: other {B04A451}>
>> 7 -> #<USER name: guest {AD61271}>
>> 100 -> #<USER name: qqq {B053001}>
>> 101 -> #<USER name: {B055721}>
>> 102 -> #<USER name: {B057E01}>
>> 103 -> #<USER name: {B05A529}>
>> 104 -> #<USER name: {B05CCF1}>
>> 105 -> #<USER name: {B05F579}>
>> 106 -> #<USER name: {B063E91}>
>> 107 -> #<USER name: qqq {B066851}>
>> 200 -> #<USER name: {B069519}>
>> 201 -> #<USER name: {B06C009}>
>> 300 -> #<USER name: {B06EBA1}>
>> 301 -> #<USER name: aaa {B0717D1}>
>> NIL
>>
>> I don't think this is a bug, it must have to do with how Elephant
>> manages btrees; but then how am I supposed to access through pages?
>> I would like to have to access all the objects from the beggining
>> just to discard them instantly (imagine a large collection and the
>> user wanting to see the last page).
>>
>> Thank you again :)
>>
>> Mariano
>>
>>
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel at common-lisp.net
>> http://common-lisp.net/mailman/listinfo/elephant-devel
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel at common-lisp.net
>> http://common-lisp.net/mailman/listinfo/elephant-devel
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel
More information about the elephant-devel
mailing list