ColdFusion Muse

Funny Coding Errors - Take 2

I was reminded of this very humorous incident recently while chatting with some friends. A very good friend of mine who was a novice web programmer was just beginning to stretch his wings using Coldfuion and SQL. The site he was working on (which shall remain nameless) used an Interbase DB server. My friend Bob (let's call him Bob) was interested in tracking page views for some news stories and articles he was writing. He created a new table with a few columns and wrote to it with each request - logging the page id, news story and IP address. It seemed to work splendidly... at least for a while....

A few weeks later Bob called me frantically. His news pages were slowing to a crawl. it could take as long as 200 seconds to display a story! What was wrong? I told him I would look at it. I had the sys admin turn on debugging briefly and I examined the debug code. It took 2 seconds to see that there was a single query taking up most of the execution time all by itself . Upon further examination I found this code:

<cfquery name="getLog" datasource="myDsn">
   SELECT    *
   FROM   logTable
   ORDER BY log_id asc
</cfquery>

<Cfset logIdList = valueList(getLog.log_id)>

<Cfset newLogId = listlast(logIdList) + 1>

<cfquery name="writeLog" datasource="myDsn">
   INSERT INTO logTable
      (log_id, page_id, story_id, Ip)
   VALUES
      (#newLogid#,#page_id#,#story_id#,#ip#)
</cfquery>

Those of you who see whats wrong may already be chortling a bit. Bob was using an int for a primary key - "log_id". Since there was presumably no identity field or autonumber field in Interbase, Bob was forced to "figure out" what the next ID in the sequence should be. He needed to get the last "log_id" that was added to the table, add one to it and use that for the log_id on his insert. His solution utilized what he knew - Coldfusion list functions. He selected all the data from the log table and ordered it by the ID from lowest to highest. He made a list of the ids using the valuelist function. He knew from experience that the last item on that list would be the maximum number in the table. He added 1 to it and Viola! - he had his new key.

Now this actually worked for Bob. The site performed pretty well for at least a week. But over time it got progressively slower and slower. Why? Because there were a lot of rows building up in that table. By the time he called me there were 214,000 rows being selected every time the news page was called. In fact, I was surprised that the page ever completed at all. It seems that Interbase is capable of passing large datasets to CF and CF is capable of building a list of 214,000 elements - who knew! The fix? Something like this (I don't remember if the interbase syntax was different).

<cfquery name="getLog" datasource="myDsn">
   SELECT    max(log_id) + 1 AS newId
   FROM   logTable
   ORDER BY log_id asc
</cfquery>


<cfquery name="writeLog" datasource="myDsn">
   INSERT INTO logTable
      (log_id, page_id, story_id, Ip)
   VALUES
      (#getLog.newId#,#page_id#,#story_id#,#ip#)
</cfquery>

Bob was just happy to have his page go from 190 seconds down to 200 milliseconds, But it brings up a point. If you are looking to expand your skills as a developer, you will be well-served to learn all you can about SQL. What's the number 1 performance problem on CF sites (or on any web site)? It's interaction with the database. It's shocking how often I run into code from otherwise good developers where they do things like - select all the rows in a table to get the "recordcount" attribute, or loop through rows calculating an average or adding things together. Let the database work for you! Learn about aggregate functions, grouping, unions, subqueries, stored procedures, defaults, functions, UDFs etc. If you are using a sophisticated DB system like MS SQL server, Oracle, DB2, (and many others), take the time to learn how to leverage those platforms. Such products are among the most finely tuned and optimized pieces of business software ever created. Why not let that work for you!

Comments
Mike Rankin's Gravatar I don't know anything about interbase, but mssql has a set of handy identity functions that can be used as well. If you have multiple tables that get sequential inserts, you can write stored procedures that utilize the scope_identity() function to synchronize your key fields. The proc can even return the next id for use in your cf code. The best thing is it's thread-safe
# Posted By Mike Rankin | 6/10/05 2:42 PM
mark kruger's Gravatar Great extra tip - thanks!
# Posted By mark kruger | 6/10/05 2:48 PM
Daniel Roberts's Gravatar Something unrelated to SQL or novice programmers, etc but related to execution time. Building up a CSV file was taking a long time. This solution was published just when I needed it. Java StringBuffer concatenation ended up reducing the time for building up a large file from minutes to seconds.

http://www.findarticles.com/p/articles/mi_m0MLU/is...
# Posted By Daniel Roberts | 6/10/05 3:20 PM
Mark Kruger's Gravatar Great tip - thanks Daniel.
# Posted By Mark Kruger | 6/11/05 9:22 AM
Tim Lucas's Gravatar You should probably be wrapping these two in a transaction, or put the query directly into the insert statement. You could have a race condition resulting in trying to insert two records with the same id.

If your DBMS supports it:
&lt;cfquery name="writeLog" datasource="myDsn"&gt;
&nbsp;&nbsp;INSERT INTO logTable
&nbsp;&nbsp;&nbsp;(log_id, page_id, story_id, Ip)
&nbsp;&nbsp;VALUES
&nbsp;(
&nbsp;&nbsp;(SELECT max(log_id) + 1 FROM logTable),
&nbsp;&nbsp;#page_id#,
&nbsp;&nbsp;#story_id#,#ip#
&nbsp;)
&lt;/cfquery&gt;
# Posted By Tim Lucas | 6/11/05 12:24 PM
Tim Lucas's Gravatar I never know what to enter...

Here's what it was meant to be:
<cfquery name="writeLog" datasource="myDsn">
INSERT INTO logTable (log_id, page_id, story_id, Ip)
VALUES
(
(SELECT max(log_id) + 1 FROM logTable),
#page_id#,
#story_id#,
#ip#
)
</cfquery>
# Posted By Tim Lucas | 6/11/05 12:30 PM
mark kruger's Gravatar I'm not sure the target DB (interbase) supports sub queries - and as I said in the post, it was a few years ago. exactly HOW I solved it was not really my point. Using cftransaction or some other mechanism to support key integrity and avoid deadlocks is a given - It just wasn't the point of the post.
# Posted By mark kruger | 6/11/05 5:46 PM
Tim Lucas's Gravatar Yep no worries... it was more for the benefit of those less enlightened.

Thanks for the good post btw.

I've had similar experiences in my consulting where they've moved from Access to SQL Server but haven't spent adequate time to grok the differences and advantages of the new platform.

As most CFMX apps are so DB centric, I think it really does pay to know your DBMS like the back of your hand.
# Posted By Tim Lucas | 6/11/05 7:37 PM
Daryl Banttari's Gravatar Actually, wrapping it in a transaction isn't enough. You're still left with the race condition where two pages running simultaneously get the same ID.

I just posted a method that works in a DBMS-independent and thread-safe manner here:

http://www.webapper.net/index.cfm/2007/3/1/A-Datab...
# Posted By Daryl Banttari | 3/1/07 12:29 PM
ike's Gravatar Just saw this linked from your most recent blog. Was a little surprised to see that you didn't mention the race condition... should be a named cflock around that query or at the very least an isolated transaction. And I've never used Interbase, but unless I needed the log id for something other than logging the hit, I'd probably have tried to write it all in a single cfquery pair using the select statement as a subquery. I'd expect that to perform even better, although it might not be noticeable to a human observer. I definitely agree that people should learn all they can about SQL - the DataFaucet ORM I've been working on for a few years actually supports unions for example, although it's not often that my schema needs one.
# Posted By ike | 8/13/08 8:44 AM
ike's Gravatar Oh... 'guess I should have read the other comments first... :) in too much of a hurry I guess...
# Posted By ike | 8/13/08 8:46 AM



Blog provided and hosted by CF Webtools. Blog Sofware by Ray Camden.