[CDBI] Make CDBI go fast
list at bereft.net
Thu Mar 22 05:15:07 GMT 2007
Perrin Harkins wrote:
> On 3/21/07, Brad Bowman <list at bereft.net> wrote:
>> That's what I was thinking, do one table scan keeping a sorted set of
>> the top $N, comparing each scanned row with the last of the current
>> row set. The set is then kept in an index and maintained during
>> insertions (easy) and deletions (if one of the set is deleted a scan
>> may be needed) and updates.
> That's exactly what an index is. Once it has been sorted, keeping it
> sorted is not that big a hit. It's probably a BTree.
True. My point is that it would be *possible* to maintain a LIMITed
version of an index. Whether it could be useful would depend on the
situation. The index could also be B-Tree based, just using a
different update process.
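The scan-and-maintain scheme described above can be sketched roughly like
this (a toy in-memory version, not real index code; the min-heap stands in
for whatever sorted structure the engine would actually use):

```python
import heapq

def top_n_scan(rows, n, key):
    """One pass over the table, keeping only the current top-n rows.

    Uses a min-heap of size n: each scanned row is compared against the
    weakest retained entry and replaces it when it ranks higher.
    (Rows are stored as (key, row) tuples, so rows must be comparable
    in case of key ties.)
    """
    heap = []  # min-heap; heap[0] is the weakest survivor
    for row in rows:
        k = key(row)
        if len(heap) < n:
            heapq.heappush(heap, (k, row))
        elif k > heap[0][0]:
            heapq.heapreplace(heap, (k, row))
    return sorted(heap, reverse=True)  # best first

def maintain_on_insert(heap_index, n, k, row):
    """Insertion is the easy case: O(log n) against the limited index."""
    if len(heap_index) < n:
        heapq.heappush(heap_index, (k, row))
    elif k > heap_index[0][0]:
        heapq.heapreplace(heap_index, (k, row))
```

Deletion is the awkward case, as noted: if a retained row is deleted there
is nothing below the heap to promote, so a rescan may be needed.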
The reduced size and marginally reduced maintenance overhead
could pay off in a situation where many partial indexes using a variety
of orderings would be useful. If full indexes for each combination
of columns and orderings were unacceptable, partial indexes may
do the job.
That said, I can't think of a situation where I have needed
this. Also, without database support, the administration pain
probably sinks the idea.
Furthermore, the whole approach may just be wrong-headed, and these problems
better addressed in another manner. Perhaps with caching, or a different
>> The benefits would be less space used and lower maintenance overhead
>> than a full index.
> Once you've done the initial sort when building the index, I doubt it
> makes much difference.
I'd guess the memory use would be similar, assuming only the B-Tree
nodes related to the top $N were being hammered. The disk use could be
substantially reduced though.
I don't really know about the index maintenance, so I'll take your
word for it.
> The Pg partial index thing doesn't sound applicable. I suspect that
> if you tried to give it criteria that would limit based on a sort
> order, it would either fail or have to sort the whole table every time
> you did a delete. By definition, you wouldn't have an index that it
> can use to determine the next row after the indexed ones, so when one
> of them goes away...
Right. The Pg partial index seems to support only WHERE clauses, not
ordering or limiting. I'm not sure whether these could be shoe-horned in.
(I still love Postgres though :), and SQLite)
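For concreteness, here is what a partial index looks like in practice
(shown via Python's sqlite3; note SQLite only gained partial indexes in
3.8.0, well after this thread, and the table and column names here are
made up for illustration). The point is that the predicate filters rows,
but cannot express "the top $N by some ordering":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")

# A partial index: the WHERE predicate selects *rows*; there is no way
# to write "only the top 10 by points" -- no ORDER BY or LIMIT allowed.
conn.execute(
    "CREATE INDEX idx_high_scores ON scores(points) WHERE points > 1000"
)

conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 50), ("b", 2000), ("c", 1500)],
)

# Queries whose WHERE clause implies the predicate can use the index.
rows = conn.execute(
    "SELECT player FROM scores WHERE points > 1000 ORDER BY points DESC"
).fetchall()
```

Postgres partial indexes (CREATE INDEX ... WHERE) work the same way: the
index only covers rows matching a fixed predicate.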
I was just thinking about the delete case and reached a similar conclusion.
The best solution I could think of was to start with slightly more than
the top $N, allowing a certain number of deletes before a re-sort
would be required. (The re-sort could theoretically be partial)
Say with 2*$N in the initial index, you would need $N deletes from the
top group without compensating inserts to cause a resort. That seems
ok from a hand-waving level of analysis.
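At the same hand-waving level, that slack scheme might look like this
(a toy sketch; the class name and structure are invented, and a real
engine would do this inside its B-Tree rather than with Python lists):

```python
class LimitedIndex:
    """Toy 'top-n with slack' index: retain 2*n entries so that roughly
    n deletions from the retained set can be absorbed before a full
    re-sort of the table is needed.
    """

    def __init__(self, table, n, key):
        self.table = table  # reference to the full row set
        self.n = n
        self.key = key
        self.rebuild()

    def rebuild(self):
        # Full sort; in a real engine this is the expensive table scan.
        self.entries = sorted(self.table, key=self.key,
                              reverse=True)[: 2 * self.n]

    def delete(self, row):
        self.table.remove(row)
        if row in self.entries:
            self.entries.remove(row)
        # Re-sort only once the slack is exhausted and the retained
        # set can no longer serve a top-n query.
        if len(self.entries) < self.n and len(self.table) >= self.n:
            self.rebuild()

    def insert(self, row):
        # Inserts are cheap: compare against the weakest retained entry.
        self.table.append(row)
        if (len(self.entries) < 2 * self.n
                or self.key(row) > self.key(self.entries[-1])):
            self.entries.append(row)
            self.entries.sort(key=self.key, reverse=True)
            del self.entries[2 * self.n:]

    def top(self):
        return self.entries[: self.n]
```

Compensating inserts refill the retained set, which is what pushes the
expected re-sort frequency down.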
A man who will criticize you openly carries no connivance.
-- Hagakure http://bereft.net/hagakure/