[CDBI] Make CDBI go fast

Michael G Schwern schwern at gmail.com
Thu Feb 15 01:50:20 GMT 2007

I've been handed a project to make Class::DBI go fast.  CDBI was never meant to be particularly fast, so this should be interesting.  It's nice to get to work on CDBI again.  I hope I can get some ideas/feedback from folks on the list.

Amongst the performance issues are...

* Make iterators fetch rows on demand. (Done: see rt.cpan.org 24959)

In case y'all didn't know, CDBI pre-fetches all the rows out of a database when it searches and sticks that list into the iterator.  It then creates objects on demand from that list.  This is really inefficient if you do, say, CDBI->search(...)->first and search() does a SELECT which returns a lot of rows.

I've put in a patch which makes iterators fetch from a statement handle on demand.
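
To illustrate the on-demand idea, here is a minimal sketch.  It is not the patch from the RT ticket.  FakeSth and LazyIterator are hypothetical names; FakeSth only stands in for a DBI statement handle so the example runs without a database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DBI statement handle: hands out one row per
# call and counts how many rows have actually been fetched.
package FakeSth;

sub new {
    my ($class, @rows) = @_;
    return bless { rows => [@rows], fetched => 0 }, $class;
}

sub fetchrow_hashref {
    my $self = shift;
    my $row = shift @{ $self->{rows} } or return;
    $self->{fetched}++;
    return $row;
}

# Hypothetical iterator: it keeps the handle, not a list of rows, and
# pulls a row only when the caller asks for one.
package LazyIterator;

sub new {
    my ($class, $sth) = @_;
    return bless { sth => $sth }, $class;
}

sub next {
    my $self = shift;
    return $self->{sth}->fetchrow_hashref;
}

sub first { return $_[0]->next }

package main;

my $sth  = FakeSth->new(map { +{ id => $_ } } 1 .. 1000);
my $iter = LazyIterator->new($sth);
my $row  = $iter->first;

printf "first id: %d, rows fetched: %d\n", $row->{id}, $sth->{fetched};
```

With 1000 rows available, asking for the first one touches the handle once instead of slurping the whole result set.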

* Inflate columns into objects on demand.

CDBI inflates columns as soon as they are fleshed out.  If the inflated column is never accessed then it's a plain waste.  If a class has relationships it can result in a cascade of inflations and object loads.
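
A minimal sketch of inflating on first access, assuming a made-up LazyRow class with hypothetical has_a_lazy() and get() methods (none of these are CDBI's API).  The raw value is kept as-is and the inflator only runs when the column is actually read.

```perl
use strict;
use warnings;

package LazyRow;

my %INFLATE;          # column name => code ref that builds the object
our $INFLATIONS = 0;  # counter, only here to make the effect visible

# Register an inflator for a column (hypothetical method name).
sub has_a_lazy {
    my ($class, $col, $code) = @_;
    $INFLATE{$col} = $code;
    return;
}

sub new {
    my ($class, %raw) = @_;
    return bless { raw => {%raw}, inflated => {} }, $class;
}

sub get {
    my ($self, $col) = @_;
    my $inflator = $INFLATE{$col} or return $self->{raw}{$col};

    # Inflate on first access only, then reuse the cached object.
    unless (exists $self->{inflated}{$col}) {
        $INFLATIONS++;
        $self->{inflated}{$col} = $inflator->($self->{raw}{$col});
    }
    return $self->{inflated}{$col};
}

package main;

LazyRow->has_a_lazy(created => sub { return +{ epoch => $_[0] } });

my $row = LazyRow->new(id => 1, created => 1171504220);

my $id = $row->get('id');            # plain column, nothing to inflate
print "inflations after reading id: $LazyRow::INFLATIONS\n";

my $created = $row->get('created');  # inflated here, on first access
$row->get('created');                # served from the per-object cache
print "inflations after reading created twice: $LazyRow::INFLATIONS\n";
```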

* Reduce the amount of object stringifications inside CDBI.

Operator overloading is sluggish in Perl compared to regular method calls.  Avoid using it as much as possible inside CDBI.

* Speed up id() and primary_column() (probably by caching primary_column)

In profiling I've found CDBI spends too much time calculating an object's id.  A lot of this is determining the primary columns and then calculating an id from them.  This can probably all be cached in the object.
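
One way the caching could look, as a sketch only.  The Cached class, _find_primary_columns(), the __id slot and the '/' separator for multi-column keys are all assumptions for illustration, not how CDBI stores things.

```perl
use strict;
use warnings;

package Cached;

our $LOOKUPS = 0;   # counts the "expensive" lookups, for illustration
my %COLUMNS = (Cached => { Primary => ['id'] });

# Stand-in for the slow path that works out the primary columns.
sub _find_primary_columns {
    my $class = shift;
    $LOOKUPS++;
    return @{ $COLUMNS{$class}{Primary} };
}

# Cache the primary columns per class after the first lookup.
my %PRIMARY_CACHE;
sub primary_columns {
    my $class = ref($_[0]) || $_[0];
    $PRIMARY_CACHE{$class} ||= [ $class->_find_primary_columns ];
    return @{ $PRIMARY_CACHE{$class} };
}

sub new {
    my ($class, %data) = @_;
    return bless { %data }, $class;
}

# Cache the computed id in the object itself.
sub id {
    my $self = shift;
    $self->{__id} = join '/', @{$self}{ $self->primary_columns }
        unless defined $self->{__id};
    return $self->{__id};
}

package main;

my @objs = map { Cached->new(id => $_) } 1 .. 3;
my @ids  = map { $_->id } @objs, @objs;   # ask every object twice

print "ids: @ids\n";
print "primary column lookups: $Cached::LOOKUPS\n";
```

A real version would have to drop the cached id if a primary key column is ever changed on the object.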

* Allow searches to pre-flesh out column groups (so it takes 1 query instead of N)

Right now, a search will only fetch the essential columns in its query.  Then if you use a non-essential column it must do another query for each object.  For lots of objects with several groups this can get expensive.  It would be handy if there was a syntax like:

    my @objects = CDBI->search( col => 42, { flesh_out => \@groups } );

This would, in one query, select the columns in the essential group and any additional columns in the given @groups.
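
A sketch of how the column list for such a query might be assembled.  The %groups table, the "film" table and the select_sql() helper are invented for the example; only the flesh_out option name comes from the proposal above.

```perl
use strict;
use warnings;

# Hypothetical column groups for a "film" table.
my %groups = (
    Essential => [qw(id title)],
    Credits   => [qw(director producer)],
    Stats     => [qw(rating votes)],
);

# Build a single SELECT covering the Essential group plus any groups
# named in flesh_out, without repeating a column.
sub select_sql {
    my ($table, $where_col, $opts) = @_;
    my %seen;
    my @cols = grep { !$seen{$_}++ }
               map  { @{ $groups{$_} || die "Unknown column group: $_\n" } }
               'Essential', @{ $opts->{flesh_out} || [] };
    return sprintf 'SELECT %s FROM %s WHERE %s = ?',
        join(', ', @cols), $table, $where_col;
}

my $sql = select_sql('film', 'rating', { flesh_out => ['Credits'] });
print "$sql\n";
```

Table and column names are interpolated directly here because they would come from the class's own column declarations, not from user input; the search value stays a placeholder.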

* Somehow make joins go fast

The existing system of relationships doesn't take advantage of joins.

* Construct a hierarchy of CDBI objects from a SELECT

Use the results of one custom query to create several objects.  In one query which returns columns from several tables you could create several objects per row.  This is probably related to making joins efficient.
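
A sketch of splitting one joined row into several objects.  It assumes the query aliases every column as "table.column" so each value can be routed to its class; that convention, the objects_from_row() helper and the Film/Director classes are all made up for the example.

```perl
use strict;
use warnings;

# Split one joined row (a hash ref keyed by "table.column") into one
# blessed object per table.
sub objects_from_row {
    my ($row, %class_for) = @_;

    my %data;
    for my $key (keys %$row) {
        my ($table, $col) = split /\./, $key, 2;
        $data{$table}{$col} = $row->{$key};
    }

    my %objects;
    for my $table (keys %data) {
        my $class = $class_for{$table}
            or die "No class registered for table $table\n";
        $objects{$table} = bless $data{$table}, $class;
    }
    return \%objects;
}

# One row as it might come back from a join of film and director.
my $row = {
    'film.id'       => 7,
    'film.title'    => 'Brazil',
    'director.id'   => 3,
    'director.name' => 'Terry Gilliam',
};

my $objs = objects_from_row($row, film => 'Film', director => 'Director');

printf "%s: %s\n", ref $objs->{film},     $objs->{film}{title};
printf "%s: %s\n", ref $objs->{director}, $objs->{director}{name};
```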

* Allow CDBI to work without a table

This is not so much about putting a CDBI class around a non-SQL database, though that would be handy, but to create objects from customized queries with calculated rows.  Particularly when a GROUP BY is involved.

* A bulk insert method
* A bulk delete method

Calling ->insert over and over again is inefficient.  Having to load an object only to delete it is even worse.  Bulk insert and delete methods would be handy.

For insert the syntax could be as simple as...

    Class->bulk_insert({ foo => 42 }, { foo => 23 }, { foo => 99 });

I suggest a different method name to allow additional options to be passed to insert() at some point.  Delete would be much the same.
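
Underneath, a bulk insert could boil down to one statement with many value tuples.  The sketch below only builds the SQL and the bind list; bulk_insert_sql() and the "thing" table are hypothetical, and it assumes a database that accepts multi-row VALUES, which not every one does.

```perl
use strict;
use warnings;

# Build one multi-row INSERT plus the flat list of bind values.
sub bulk_insert_sql {
    my ($table, @rows) = @_;
    die "No rows given\n" unless @rows;

    # Take the column list from the first row; every row must match it.
    my @cols  = sort keys %{ $rows[0] };
    my $tuple = '(' . join(', ', ('?') x @cols) . ')';

    my @binds;
    for my $row (@rows) {
        die "Rows must share the same columns\n"
            unless join(',', sort keys %$row) eq join(',', @cols);
        push @binds, @{$row}{@cols};
    }

    my $sql = sprintf 'INSERT INTO %s (%s) VALUES %s',
        $table, join(', ', @cols), join(', ', ($tuple) x @rows);
    return ($sql, @binds);
}

my ($sql, @binds) = bulk_insert_sql(
    'thing', { foo => 42 }, { foo => 23 }, { foo => 99 },
);

print "$sql\n";
print "binds: @binds\n";
```

Where multi-row VALUES is not available, preparing the single-row INSERT once and executing it for each row would still save the per-object overhead.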
