[CDBI] Make CDBI go fast

Brad Bowman list at bereft.net
Wed Mar 21 23:22:27 GMT 2007


> On 3/21/07, Dave Howorth <dhoworth at mrc-lmb.cam.ac.uk> wrote:
>> Don't you just need to run through the file once, updating the current
>> first 11 at each step? I.e. compare each element against the current
>> 11th and swap the element into the current 11 if appropriate.
>> Potentially a lot fewer comparisons and exchanges than sorting the
>> complete file.

Yes, although the pathological case, where every row updates the
current set of 11, may be quite common: e.g. if the table is ordered
ASC on some key and you're building a DESC partial index (or a
composite DESC, ASC index as described earlier in the thread).
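
For concreteness, here's a rough Perl sketch of that single-pass scan
(the numeric rows and comparison subs are made up for illustration,
it's not anything out of CDBI or MySQL itself):

  use strict;
  use warnings;

  # One pass over the rows, keeping only the best $n seen so far.
  sub top_n {
      my ($n, $cmp, @rows) = @_;
      my @top;
      for my $row (@rows) {
          # Common case is cheap: the row doesn't beat the current worst.
          next if @top == $n && $cmp->($row, $top[-1]) >= 0;
          # Otherwise insert in sorted order and trim back to $n entries.
          @top = sort { $cmp->($a, $b) } @top, $row;
          pop @top if @top > $n;
      }
      return @top;
  }

  # ASC sort over ASC-ordered input: only the first 11 rows displace.
  my @asc  = top_n(11, sub { $_[0] <=> $_[1] }, 1 .. 1000);

  # DESC sort over the same ASC-ordered input is the pathological
  # case above: every row beats the current worst.
  my @desc = top_n(11, sub { $_[1] <=> $_[0] }, 1 .. 1000);

With the ASC comparator only the first 11 rows cause an insertion;
with the DESC comparator all 1000 do, so every row pays the insertion
cost rather than the cheap comparison.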

Perrin Harkins wrote:
> The big win though is to make it so your sort can be handled by
> indexes.  Once it goes to filesort on any large amount of data, it's
> not going to be fast.

That's what I was thinking: do one table scan, keeping a sorted set of
the top $N and comparing each scanned row against the last row of the
current set.  The set is then kept in an index and maintained during
insertions (easy), deletions (if one of the set is deleted, a scan may
be needed) and updates.  The filesort could then be avoided for
queries within the partial index limit.
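
As a hypothetical sketch of that maintenance logic (the PartialIndex
package and its plain numeric keys are inventions for this example,
not a real storage engine API):

  use strict;
  use warnings;

  package PartialIndex;

  sub new {
      my ($class, $n) = @_;
      return bless { n => $n, keys => [] }, $class;
  }

  # Insertion is cheap: skip unless the new key beats the current
  # worst entry, or the index isn't full yet.
  sub insert {
      my ($self, $key) = @_;
      my $k = $self->{keys};
      return if @$k == $self->{n} && $key >= $k->[-1];
      @$k = sort { $a <=> $b } @$k, $key;
      pop @$k if @$k > $self->{n};
  }

  # Deletion may be costly: removing an indexed key forces a rescan
  # of the remaining table keys to refill the freed slot.
  sub remove {
      my ($self, $key, $remaining) = @_;
      return unless grep { $_ == $key } @{ $self->{keys} };
      my @k = sort { $a <=> $b } @$remaining;
      splice @k, $self->{n} if @k > $self->{n};
      $self->{keys} = \@k;
  }

  package main;

  my @table = (42, 7, 99, 3, 58, 13, 70);
  my $idx = PartialIndex->new(3);
  $idx->insert($_) for @table;      # index holds (3, 7, 13)
  @table = grep { $_ != 7 } @table;
  $idx->remove(7, \@table);         # rescan refills it: (3, 13, 42)

In this sketch an update would be a remove followed by an insert, so
it only inherits the rescan cost when the old value was one of the
indexed entries.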

The benefits would be less space used and lower maintenance overhead
than a full index.  Given this, it may become practical to maintain
indexes for a larger number of ordering combinations.  If queries could
be answered from the indexes alone, they might be faster still.

This does assume that, most of the time, your application isn't going
to page past the $N limit.

Brad Bowman

-- 
 Even if it seems certain that you will lose, retaliate. Neither wisdom
 nor technique has a place in this. A real man does not think of victory
 or defeat. He plunges recklessly towards an irrational death. By doing
 this you will awaken from your dreams. -- http://bereft.net/hagakure/


