[CDBI] How well does Class::DBI scale?

Rick Welykochy rick at praxis.com.au
Sun Jan 22 04:30:19 GMT 2006

Thanks to all on the list who responded to my inquiries about
improving speed and efficiency in an app built around CDBI.

Rick Welykochy wrote:

> The problem is that for a relatively simple report of, say, 20 rows from
> the top-most table (hierarchically speaking), upwards of 4000 (or more)
> individual SQL statements are fired off to the DB server (PostgreSQL in
> this case). This is because the top-most table refers to relatively
> trivial great-grandchild tables, often for simple things like the name
> of something whose primary key is a (serial) ID.
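To make the pattern concrete, here is a minimal sketch of how this "N+1 queries" shape arises in Class::DBI. The table, class, and column names are illustrative assumptions, not taken from the real app; the point is that every has_a() hop below can cost one SELECT per row.

```perl
# Hypothetical schema sketch (not the original app's classes).
package My::Region;
use base 'Class::DBI';
__PACKAGE__->table('region');
__PACKAGE__->columns(All => qw/id name/);

package My::Customer;
use base 'Class::DBI';
__PACKAGE__->table('customer');
__PACKAGE__->columns(All => qw/id name region_id/);
__PACKAGE__->has_a(region_id => 'My::Region');

package My::Invoice;
use base 'Class::DBI';
__PACKAGE__->table('invoice');
__PACKAGE__->columns(All => qw/id customer_id total/);
__PACKAGE__->has_a(customer_id => 'My::Customer');

# The report loop: 20 invoices can easily fan out into hundreds of
# SELECTs, because each inflated related row is fetched individually.
for my $inv (My::Invoice->retrieve_all) {
    print $inv->customer_id->name, "\t",              # SELECT on customer
          $inv->customer_id->region_id->name, "\n";   # SELECT on region
}
```

(Connection setup via set_db() is omitted for brevity; this won't run without a live database.)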

To summarise:

1. I hand-optimised some queries, then manually created the required
    CDBI data objects using the construct() method. This is a very time-
    consuming process, and requires lots of testing and fiddle-farting
    around with rather large SQL statements. The results are worth it if
    you have the time to spend. Some of the optimisations resulted in
    one SQL statement creating hundreds of CDBI objects - a great improvement
    in speed and responsiveness. But this approach kinda defeats the purpose
    of CDBI, don't you think?
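For readers who haven't tried this, here is a rough sketch of the hand-optimisation approach, under assumed class and column names: one joined query replaces the per-row SELECTs, and each fetched row is turned into a ready-made CDBI object with construct().

```perl
# Sketch only: class/column names are assumptions, not the real app's.
package My::Invoice;

# Ima::DBI's set_sql() (inherited via Class::DBI) registers a named,
# hand-tuned query; here a single join fetches everything the report needs.
__PACKAGE__->set_sql(report => q{
    SELECT i.id, i.total
    FROM   invoice i
    JOIN   customer c ON c.id = i.customer_id
});

sub report_rows {
    my $class = shift;
    my $sth   = $class->sql_report;
    $sth->execute;
    my @objs;
    while (my $row = $sth->fetchrow_hashref) {
        # construct() builds an object from data already in hand,
        # so no fresh SELECT is fired for this row.
        push @objs, $class->construct({
            id    => $row->{id},
            total => $row->{total},
        });
    }
    return @objs;
}
```

(Class::DBI also offers sth_to_objects() for this, which does the fetch-and-construct loop for you.)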

2. I audited the use of Essential columns in the app. Practically non-existent.
    Since I was getting tired of step 1 above, I diverted my attention
    for a day and added all columns that were not bytea (or BLOB) to the
    Essential column lists for all 50 or so tables. The speed improvement
    was anywhere from 1.5x to 2.5x (with correspondingly less SQL
    being fired at the PostgreSQL server). I'm happy with that, and it
    took less than a day's work and required very little testing, since the
    application code was not touched.
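The change itself is a one-liner per class. A sketch, with assumed table and column names: every non-BLOB column goes in the Essential group, so the first SELECT for a row fetches them all, while the big columns stay lazy-loaded.

```perl
# Sketch of step 2 (names are illustrative, not from the real app).
package My::Customer;
use base 'Class::DBI';
__PACKAGE__->table('customer');
__PACKAGE__->columns(Primary   => 'id');
# All cheap columns come back in the initial SELECT:
__PACKAGE__->columns(Essential => qw/id name region_id created/);
# bytea/BLOB columns stay out of Essential and are only fetched on demand:
__PACKAGE__->columns(Others    => qw/photo/);
```

With only Primary declared (the default when you use columns(All => ...)), Class::DBI lazy-loads the other column groups, which is exactly what triggers the second trip to the database.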

My client is happy with the results and I can now move on to writing
more application code.

My conclusion and advice: in these days of large memories, megafast CPUs
and heaps of disk space, err on the side of too many columns in Essential.
Does it really matter if your query loads 5 columns or 25? Nope, I don't
think so. It is far more efficient (programmer-wise) to simply load
all columns that ain't really, really big (i.e. ignore blobs) for all rows
from all tables. This guarantees that only rarely will CDBI have to return
to the database a second time to get more data for a row.

Like many optimisations, I could spend months more getting another
speed increase of 2x or even 10x on certain pages, but
it is not worth the effort.


Rick Welykochy || Praxis Services

Web (hosting), security and high-performance computing are the
three areas where Linux has more strength.
      -- Bob Muglia, senior VP in charge of Windows Server development
