Fetch-on-demand iterators (was Re: [CDBI] Make CDBI go fast)

Tim Bunce Tim.Bunce at pobox.com
Thu Feb 15 19:52:23 GMT 2007


On Thu, Feb 15, 2007 at 11:40:02AM -0500, Michael G Schwern wrote:
> Here's a lump reply to the concerns about having iterators fetch on demand.
> 
> * Make iterators fetch rows on demand. (Done: see rt.cpan.org 24959)
> 
> > Keep in mind that the behavior of the DBD classes when fetching rows
> > from a handle varies.  With MySQL, it will fetch all the rows into
> > memory as soon as you execute the statement, unless you explicitly
> > tell it not to.  With PostgreSQL, I think you have to use cursors to
> > prevent it from loading all the rows when you execute.

It's worth distinguishing between db client memory and perl memory.
The fetch-all-by-default behaviour of the mysql client is *far* more
memory efficient than the fetch-all-by-default behaviour of CDBI.

> Whatever the backend implementation, it sure sped things up.  The performance of search()->first when the search returns a million rows went from about 2 seconds to almost negligable.  Remember, even if the database is doing the same amount of work it still has to transfer it to your process and then run it through DBI and then bind it all into a list of hashes.
> 
> > I'd rather use the memory then do extra trips to the database
> 
> Do not be fooled!  There really is no "give me everything" bulk row loading optimization in DBI.  selectall/fetchall is just doing $sth->fetch in a loop!

True. What selectall/fetchall saves you is crossing the DBI dispatch
threshold for every row. In practice, though, that cost is small and
the cost of allocating the extra memory to store the row is often larger.

See third chunk of http://search.cpan.org/src/TIMB/DBI_AdvancedTalk_2004/sld017.htm

> This new iterator code does not do any extra trips to the database.  It should result in pretty much the same performance as before even when you do run through the whole iterator (its currently a bit slower.  I'm working on that).

It's possibly slower because you're using fetchrow_hashref.

If you weren't then it might even be faster than the old code
because you're avoiding the need to allocate storage for the rows.

Tim.



More information about the ClassDBI mailing list