[CDBI] Make CDBI go fast

Brad Bowman list at bereft.net
Thu Mar 22 05:15:07 GMT 2007


Perrin Harkins wrote:
> On 3/21/07, Brad Bowman <list at bereft.net> wrote:
>> That's what I was thinking, do one table scan keeping a sorted set of
>> the top $N, comparing each scanned row with the last of the current
>> row set.  Then the set is then kept in an index and maintained during
>> insertions (easy) and deletions (if one of the set is deleted a scan
>> may be needed) and updates.
> 
> That's exactly what an index is.  Once it has been sorted, keeping it
> sorted is not that big a hit.  It's probably a BTree.

True.  My point is that it would be *possible* to maintain a LIMITed
version of an index.  Whether it's could be useful would depend on the
situation.  The index could be a B-Tree based also, just using a
different update process.

The reduced size and marginally reduced maintenance overhead
could pay off in a situation where many partial indexes using a variety
of orderings would be useful.  If full indexes for each combination
of columns and orderings was unacceptable, partial indexes may
do the job.

That said, I can't think of a situation where I have needed
this.  Also, without database support, the administration pain
probably sinks the idea.

Furthermore, the whole approach may just be wrong-headed and these problems
better addressed in another manner.  Perhaps with caching or a different
database design.

>> The benefits would be less space used and lower maintenance overhead
>> than a full index.
> 
> Once you've done the initial sort when building the index, I doubt it
> makes much difference.

I'd guess the memory use would be similar, assuming only the B-Tree
nodes related to the top $N were being hammered.  The disk use could be
substantially reduced though.

I don't really know about the index maintenance, so I'll take your
word for it.

> The Pg partial index thing doesn't sound applicable.  I suspect that
> if you tried to give it criteria that would limit based on a sort
> order, it would either fail or have to sort the whole table every time
> you did a delete.  By definition, you wouldn't have an index that it
> can use to determine the next row after the indexed ones, so when one
> of them goes away...

Right.  The Pg partial index seems only do where clauses, not order or
limiting.  I'm not sure if these else could be shoe-horned in.
(I still love Postgres though :), and SQLite)

I was just thinking about the delete case and reached a similar conclusion.
The best solution I could think of was to start with slightly more than
the top $N, allowing a certain number of deletes before a re-sort
would be required.  (The re-sort could theoretically be partial)

Say with 2*$N in the initial index, you would need $N deletes from the
top group without compensating inserts to cause a resort.  That seems
ok from a hand-waving level of analysis.

Brad

-- 
   A man who will criticize you openly carries no connivance.
                       -- Hagakure http://bereft.net/hagakure/



More information about the ClassDBI mailing list