[CDBI] Make CDBI go fast

Michael G Schwern schwern at gmail.com
Thu Feb 15 17:30:41 GMT 2007

Perrin Harkins wrote:
>> Its a large, normalized schema on the order of 100+ tables.  Its
>> replicating existing business logic which cannot easily be changed so
>> its a bit crazy in places.  Some tables have in the order of a
>> million+ rows.
> That shouldn't be a problem, except when you execute a query that
> matches 500,000 rows and DBD::mysql tries to load them all into
> memory.  That's when you need to set "mysql_use_result".

The large datasets become a problem in two cases...

1) Creating large reports and thus having to create lots of CDBI objects

2) Searches which return large numbers of rows and must be ordered.

An example of #2 would be:

    SELECT name, address, zip, phone, last_contact
    FROM   customers
    WHERE  name LIKE '%Smith%';
    ORDER BY name asc, last_contact desc
    LIMIT 11
    OFFSET 0

For an example of "search for everyone named Smith and show me the first 10 ordered by name and the last time we had contact with them".  This sort of search can easily return 100,000 rows which then must be sorted (which MySQL does with a temp table and a filesort) in order to know what the first 11 are (10 + 1 so we know if there's another page's worth).

In reality its much worse because searches often involve REGEXP (in order to do a proper word search rather than substring) which is much slower than LIKE and/or MySQL's full text search engine (which isn't that great) and lots of joins and the fields to sort by are user selected and difficult to index.  MySQL doesn't use indexes to do ORDER BY when it has both ASC and DESC anyway.  The search code is very hairy to attempt to optimize all this.

So that's not a fetch issue, profiling shows the time is spent executing.  I mention it because maybe someone might know a better way.

>> Performance is generally good with most pages coming back in less than
>> 4 seconds but some can take hundreds.
> I doubt that the kind of changes you've talked about so far will make
> the difference between 10 seconds and hundreds of seconds.  That kind
> of change usually happens at the SQL/schema level.  It should be
> pretty easy to find out though, by profiling.

I'm hunting things down, but really the slow query log isn't showing much.  When it is the database its usually because MySQL had to do a filesort.  I'm examining the server I/O logs to see if the disks are getting bogged down.

You'd be surprised how long it takes to create 100,000 CDBI objects, inflate their columns, use their accessors, flesh out their groups....

There is, also, plenty of application code to blame.  Its a big app and I have a big stack of performance issues to look into.  CDBI is being looked at hard to see if we can get any site-wide speed ups from it.

> One thing that I noticed when looking at how our CDBI app performs is
> the way the database handles are always asked for with a method call,
> even in places where passing them would be really easy.  Many tiny
> methods in CDBI need a handle for something, and call db_Main, causing
> a $dbh ping.  I haven't checked how these add up in the profiler
> though.

Hasn't shown up in profiling.

More information about the ClassDBI mailing list