[egenix-users] cursor.rowcount abnormalities and performance question

Steve Holden sholden at holdenweb.com
Wed Jul 30 15:48:39 CEST 2003


> -----Original Message-----
> From: egenix-users-bounces at lists.egenix.com
> [mailto:egenix-users-bounces at lists.egenix.com]On Behalf Of
> stuff at mailzilla.net
> Sent: Wednesday, July 30, 2003 2:43 PM
> To: egenix-users at lists.egenix.com
> Subject: [egenix-users] cursor.rowcount abnormalities and performance
> question
>
[questions on setting of rowcount deferred to Marc-Andre]

> I generally use rowcount to determine whether I need to retrieve
> records using fetchone inside a while loop (no need to execute the
> loop if rowcount is 0).  It's a little cleaner if I can reliably use
> rowcount, but if the value doesn't sync with the actual result set
> then I can safely do something like this:
>
> while 1:
>    row = c.fetchone()
>    if not row: break
>    processTheRecord(row)
>
> If rowcount can be relied upon, I could do this, which is much
> cleaner:
>
> for i in range(c.rowcount):
>    row = c.fetchone

	row = c.fetchone()

>    processTheRecord(row)
>
>
> Performance question - Since I need to perform some manipulation on
> each record, I believe it is more efficient to perform a while loop
> and fetchone rather than getting everything via fetchall and then
> iterating over that result (since theoretically, fetchall would be
> iterating to get each record, so technically speaking that would
> require iterating twice).  If this assumption is false, please let
> me know.  Perhaps there are some internal ODBC/mxODBC optimizations
> that allow fetchall to actually perform much faster than several
> fetchone's.  My concern is that if fetchall iterates through the
> result set to produce the list of rows, and then I am iterating
> through the fetched list, then I can accomplish the same goal with
> one less iteration via a series of fetchone's.  Also, I imagine it
> would be safer to process each record individually than fetching the
> entire result set at once in terms of system resource utilization
> (i.e. if the result set were millions of records, there would need
> to be data structs allocated to hold them).
>
In fact, by using fetchone() you are likely to be less efficient,
though I don't have specific knowledge of the buffering employed by
SQL Server.
I have certainly corresponded with other users who have started using
fetchall() on my advice and seen large performance improvements.

There's also a middle way, using fetchmany() to retrieve chunks of
several rows at a time. This avoids the necessity to buffer all query
output on the client side, which is the main inefficiency you'd be
likely to induce with indiscriminate use of fetchall(). But unless
*very* large datasets are involved, fetchall() is an easy way out:

	data = curs.fetchall()
	for row in data:
		processTheRecord(row)
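For what it's worth, here is a sketch of the fetchmany() middle way.
Since I can't assume everyone has mxODBC to hand, the example uses the
standard sqlite3 module as a stand-in (both implement the DB-API 2.0
cursor interface, so the loop shape is the same); the table, data and
chunk size are invented purely for illustration:

```python
import sqlite3

# sqlite3 stands in for mxODBC here; both follow DB-API 2.0.
# Table and data are made up for the example.
conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE items (id INTEGER, name TEXT)")
curs.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, "item%d" % i) for i in range(10000)])
conn.commit()

def processTheRecord(row):
    pass  # stand-in for the real per-record work

n_processed = 0
curs.execute("SELECT id, name FROM items")
while 1:
    rows = curs.fetchmany(1000)   # at most 1000 rows buffered at once
    if not rows:
        break
    for row in rows:
        processTheRecord(row)
        n_processed += 1
conn.close()
```

The chunk size (1000 here) is the knob: larger chunks mean fewer round
trips, smaller chunks mean less memory held on the client at any one
time.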

> I'm curious to see if my reluctance to use fetchall is well-founded or
> overly paranoid.
>
> Thanks for any info,
>
Well, timings and memory usage measurements are probably going to be
your best guides, but I'd be prepared to bet that for record sets of
less than five thousand rows (and rows less than a kilobyte in size) the
convenience of fetchall() won't bring any noticeable penalty.
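A rough timing harness along these lines would settle it for your own
workload (again using sqlite3 as a DB-API stand-in, with a made-up
5000-row table echoing the figure above; real numbers will of course
depend on your driver and server):

```python
import sqlite3
import time

# Made-up 5000-row table; sqlite3 stands in for mxODBC (DB-API 2.0).
conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE t (n INTEGER)")
curs.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5000)])
conn.commit()

# Time a fetchone() loop.
curs.execute("SELECT n FROM t")
start = time.perf_counter()
count_one = 0
while 1:
    row = curs.fetchone()
    if not row:
        break
    count_one += 1
one_time = time.perf_counter() - start

# Time fetchall() plus an iteration over the result.
curs.execute("SELECT n FROM t")
start = time.perf_counter()
count_all = 0
for row in curs.fetchall():
    count_all += 1
all_time = time.perf_counter() - start

print("fetchone loop: %.4fs, fetchall: %.4fs" % (one_time, all_time))
conn.close()
```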

regards
--
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/





