[CDBI] Re: Make CDBI go fast

Edward J. Sabol sabol at alderaan.gsfc.nasa.gov
Thu Feb 15 02:49:18 GMT 2007

Michael G Schwern wrote:
> * Make iterators fetch rows on demand. (Done: see rt.cpan.org 24959)
> In case y'all didn't know, CDBI pre-fetches all the rows out of a database
> when it searches and sticks that list into the iterator. It then creates
> objects on demand from that list. This is really inefficient if you do,
> say, CDBI->search(...)->first and search() does a SELECT which returns a
> lot of rows.
> I've put in a patch which makes iterators fetch from a statement handle on
> demand.

This sounds cool, but the change makes me nervous. Some DBMSs and/or DBDs
(Sybase, for example) don't like unfinished statement handles hanging around.
Could this be problematic in a mod_perl environment?

What about CDBI->search() in list context? I hope there would be no change in
implementation there. I can certainly see real-world scenarios where
pre-fetching would beat fetching/inflating on demand.
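
To make the trade-off concrete, here is a rough sketch in plain Perl. It is not CDBI's code or the patch from rt.cpan.org: make_handle, eager_iterator and lazy_iterator are hypothetical names, and the "handle" is just a closure standing in for a statement handle so that fetches can be counted.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a statement handle: returns one row per call,
# then undef. It counts fetches so the two strategies can be compared.
my $fetched = 0;

sub make_handle {
    my @rows = @_;
    return sub {
        return undef unless @rows;
        $fetched++;
        return shift @rows;
    };
}

# Pre-fetching: drain the handle up front, then iterate over the list.
sub eager_iterator {
    my ($handle) = @_;
    my @rows;
    while (defined(my $row = $handle->())) {
        push @rows, $row;
    }
    return sub { return shift @rows };
}

# On demand: pull one row from the handle each time the iterator is called.
sub lazy_iterator {
    my ($handle) = @_;
    return sub { return $handle->() };
}

my @data = map { +{ id => $_ } } 1 .. 1000;

$fetched = 0;
my $eager       = eager_iterator(make_handle(@data));
my $first_eager = $eager->();
my $eager_cost  = $fetched;

$fetched = 0;
my $lazy       = lazy_iterator(make_handle(@data));
my $first_lazy = $lazy->();
my $lazy_cost  = $fetched;

print "eager fetched $eager_cost rows, lazy fetched $lazy_cost\n";
```

Note that after the single call the lazy version still holds a handle with 999 rows pending, which is exactly the situation that worries me with Sybase.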

> * Inflate columns into objects on demand.
> CDBI inflates columns as soon as they are fleshed out. If the
> inflated column is never accessed then it's a plain waste. If a class
> has relationships it can result in a cascade of inflations and
> object loads.

This would be nice!
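
For what it's worth, this is roughly how I picture on-demand inflation working. It is only a sketch under my own assumptions: My::Row, _inflate_date and the accessors are made up for illustration and are not how CDBI stores or inflates columns.

```perl
use strict;
use warnings;

package My::Row;    # hypothetical class, not CDBI's implementation

my $inflations = 0; # counts how often the inflator actually runs

sub new {
    my ($class, %raw) = @_;
    return bless { raw => {%raw}, inflated => {} }, $class;
}

# Hypothetical inflator: turns 'YYYY-MM-DD' into a small hash.
sub _inflate_date {
    my ($value) = @_;
    $inflations++;
    my ($y, $m, $d) = split /-/, $value;
    return { year => $y, month => $m, day => $d };
}

# Inflate only when the accessor is first called, then reuse the result.
sub released {
    my ($self) = @_;
    $self->{inflated}{released} = _inflate_date($self->{raw}{released})
        unless exists $self->{inflated}{released};
    return $self->{inflated}{released};
}

# Plain columns never touch the inflator.
sub title { return $_[0]{raw}{title} }

sub inflations { return $inflations }

package main;

my @rows   = map { My::Row->new(title => "cd $_", released => '2007-02-15') } 1 .. 100;
my @titles = map { $_->title } @rows;       # no inflation needed
my $year   = $rows[0]->released->{year};    # inflates one column on one row

print "inflations=", My::Row::inflations(), " year=$year\n";
```

A hundred rows are loaded but only the one column that is actually read gets inflated.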

> * Speed up id() and primary_column() (probably by caching
> primary_column)
> In profiling I've found CDBI spends too much time calculating an
> object's id. A lot of this is determining the primary columns and
> then calculating an id from it. This all can probably be cached in
> the object.

A small, subtle change that should improve performance nicely. I'm
looking forward to this!
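
I assume the caching would look something like the sketch below. My::Record and _find_primary_column are hypothetical stand-ins for whatever CDBI does internally; the only point is that the slow lookup runs once per object.

```perl
use strict;
use warnings;

package My::Record;   # hypothetical class, not part of CDBI

my $lookups = 0;      # counts how often the slow path runs

sub new {
    my ($class, %args) = @_;
    return bless { columns => $args{columns}, data => $args{data} }, $class;
}

# Slow path: scan the column metadata for the primary key.
sub _find_primary_column {
    my ($self) = @_;
    $lookups++;
    my ($pk) = grep { $_->{primary} } @{ $self->{columns} };
    return $pk->{name};
}

# Cached path: compute once per object, then reuse the stored name.
sub primary_column {
    my ($self) = @_;
    $self->{_primary_column} = $self->_find_primary_column
        unless exists $self->{_primary_column};
    return $self->{_primary_column};
}

sub id {
    my ($self) = @_;
    return $self->{data}{ $self->primary_column };
}

sub lookups { return $lookups }

package main;

my $rec = My::Record->new(
    columns => [ { name => 'id', primary => 1 }, { name => 'title' } ],
    data    => { id => 42, title => 'example' },
);

my @ids = map { $rec->id } 1 .. 5;
print "id=$ids[0], slow lookups=", My::Record::lookups(), "\n";
```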

> * Allow searches to pre-flesh out column groups (so it takes 1 query
> instead of N)
> Right now, a search will only fetch the essential columns in its
> query. Then if you use a non-essential column it must do another
> query for each object. For lots of objects with several groups this
> can get expensive. It would be handy if there was a syntax like:
>     my @objects = CDBI->search( col => 42, { flesh_out => \@groups } );
> Which, in one query, will select the columns in the essential group
> and any additional columns in the given @groups.

Honestly, this doesn't sound all that useful, but I don't see the harm, I
guess. Personally, I would just put these other columns in the Essential
group in the first place.

What about retrieve()? Similar support for 'flesh_out' there as well?
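
If I understand the proposal, the SQL side boils down to something like this sketch. The %groups table, the select_for_groups helper and the "cd" table are all invented for illustration; none of this is an existing CDBI interface.

```perl
use strict;
use warnings;

# Hypothetical column groups for an imaginary "cd" table.
my %groups = (
    Essential => [qw(id title)],
    Details   => [qw(year label)],
    Notes     => [qw(review)],
);

# Build one SELECT covering Essential plus any requested groups,
# skipping duplicate columns while keeping their order stable.
sub select_for_groups {
    my ($table, @extra) = @_;
    my (%seen, @cols);
    for my $group ('Essential', @extra) {
        my $members = $groups{$group}
            or die "Unknown column group: $group";
        push @cols, grep { !$seen{$_}++ } @$members;
    }
    return sprintf 'SELECT %s FROM %s WHERE artist = ?',
        join(', ', @cols), $table;
}

my $sql = select_for_groups('cd', 'Details');
print "$sql\n";
```

The same column list could presumably serve retrieve() just as well, with the WHERE clause on the primary key instead.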

> * Somehow make joins go fast
> * Construct a hierarchy of CDBI objects from a SELECT

These features would be very welcome, I'm sure, but I fear they will prove
difficult or hack-ish to implement and maintain (cf. CDBI::Sweet).

> * Allow CDBI to work without a table
> This is not so much about putting a CDBI class around a non-SQL
> database, though that would be handy, but to create objects from
> customized queries with calculated rows. Particularly when a GROUP
> BY is involved.

I'm not sure I understand what you are proposing here.

> * A bulk insert method
> * A bulk delete method
> Calling ->insert over and over again is inefficient. Having to load
> an object only to delete it is even worse. Bulk insert and delete
> methods would be handy.

For some reason, I just don't like the sound of these methods. I would
suggest going the CDBI::Plugin route with them instead. Not everything needs
to be in the core module.

Hope this feedback is useful.

