ColdFusion Muse

Why You should worry about your execution plan

No I'm not talking about dead man walking. I'm talking about your database execution plan. I want to give fair warning to all of you Microsoft haters out there (and you know who you are) that I'm going to use lingo from Microsoft SQL Server 2000. It's a ubiquitous and full-featured database with good documentation regarding this subject. Much of what is said here applies to other databases as well (no doubt using different lingo). So please, feel free to post comments of how Oracle or MySQL or PostgreSQL or Interbase or your flat file - all have a great way of doing this. But please don't post about how your favorite DB is so great and Microsoft is the spawn of Satan. That's not helpful and it's makes me want to poke out my eye with an ice pick! Whew! Now that that's out of the way.

The database execution plan is the series of steps that a database takes to deliver a particular query or task. These steps are cached on the DB server. When you run a query it looks in the cache for a plan that matches. If it finds one, it uses it. If not, it creates a new one. Why is this important? Because the more often your DB Server finds a matching plan in the cache, the better it performs. In fact, it can run significantly faster when it is not tasked with constantly building execution plans from scratch. Here's the rub, much of the query code written in Coldfusion requires the RDBMS to compile a new execution plan. Here's why.

The Execution Context

When you run a query the database determines the execution context. The execution context is used to hold the data necessary to run the query. Think of it as an array of parameters. Consider this example.
<cfquery name="get" datasource="#datadsn#">
   SELECT * FROM users
   WHERE    username = 'BOB'
   AND    password = 'BOBSPASSWORD'
   AND      active = 1
</cfquery>
The execution context here is:
paramater 1 = Bob
parameter 2 = BobsPassword
parameter 3 = 1
You can see there are really 2 parts to the query that the DB needs to run the query. There is the context - the parameters to use in the query, and there is the execution code. This is comprised of the keywords and syntax. The DB server parses the execution code, inserting the context variables in the right spot, and then builds a query plan.

The query plan

The query plan is the steps necessary to run the query. In it's simplest form:

<cfquery name="get" datasource="#datadsn#">
   SELECT * FROM users
   WHERE    username = [context param 1]
   AND    password = [context param 2]
   AND      active     = [context param 3]
</cfquery>
To use the plan, the server only needs the context - the rest is already pre-compiled. So, if could pass in an array of three parameters with a query that matches this one I could get the plan to match and hit the cache. Cool!

Coldfusion Gotcha

You probably knew there was a catch didn't you. You see there is one more piece of data that the DB server needs to make a hit on the cache. It needs to know the "type" of the data in the context. Why? For 2 reasons. 1) Databases support conversion. For example, if you pass in 2.43 to a field that holds an INT you will end up with 2. The DB will know what to do with it. If you search "username = 1" (no single quotes) - the db may know enough to treat it as a string. 2) Databases have to "sort out" the list of key words from the list of parameters. If you do not specify a "type" for a parameter you force the DB to parse through the whole string and sort out the execution code from the context.

Let me illustrate it with some code. If you write a query like the one above, here's what the SQL server will see.

<cfquery name="get" datasource="#datadsn#">
   SELECT * FROM users
   WHERE    username = [some unknown type of variable]
   AND    password = [some unknown type of variable]
   AND      active     = [some unknown type of variable]
</cfquery>
The DB server will then have to take what it does know (the column names) and "look up" the type of the column. So it would first look and see that username and password are both strings, and treat parameters 1 and 2 as strings. Then it would look up the column "active" and examine parameter 4 to "see" if it tested correctly as an INT. Then it would compile an execution plan based on what it had discovered. You are not "Telling" the DB enough to be efficient.

The Solution - CFQUERYPARAM

The solution is CFQUERYPARAM. With it you provide a "typed execution context". You tell the DB enough to look up the plan straightway. It doesn't even need to look for the column name types and match them with the context. Why? Because if it finds a cached execution plan it means that this query has been run successfully before, and the bindings ensure that the context data is typed correctly. It has a measure of guarantee that the plan will succeed. So, using the same query:

<cfquery name="get" datasource="#datadsn#">
   SELECT * FROM users
   WHERE    username = <cfqueryparam cfsqltype="CF_SQL_CHAR" value="BOB">
   AND    password = <cfqueryparam cfsqltype="CF_SQL_CHAR" value="BOBSPASSWORD">
   AND      active = <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="1">
</cfquery>
we "tell" the DB server enough that it can skip a few steps and hit the cached plan. Here's what the server sees.
<cfquery name="get" datasource="#datadsn#">
   SELECT * FROM users
   WHERE    username = [a String variable]
   AND    password = [a String variable]
   AND      active     = [an INT variable]
</cfquery>

There are other reasons to use data binding (SQL Injection attack prevention is chief among them), but the performance benefit can be quite dramatic. I have seen as much as a 40% decrease in query execution time - just by adding cfqueryparam. Here's another tip, add the shell for CF_SQL_CHAR and CF_SQL_INTEGER to your IDE as a short cut (snippet). It will save you a lot of typing. These are by far the 2 most common types.

One more tip. On MS SQL server, fully qualifying the objects in the query can further increase your chances of hitting the cache. So "SELECT * FROM dbo.users" has a better chance than just "SELECT * FROM users". SQL 2000 is far superior to SQL 7 in this regard - but even it can benefit from fully qualified objects.

Comments
Brian Rinaldi's Gravatar Thanks for posting this. It is often hard to explain the benefits of using cfqueryparam, but you do a very good job of it. I have shared this with my coworkers as I have found my explanations lacking in the past in making the argument for using cfqueryparam.
# Posted By Brian Rinaldi | 6/28/05 9:52 AM
Ryan's Gravatar If you are only going to use a few columns from a table, using SELECT * FROM users is not as efficient as Select colA, colB From users.

By replacing the wildcard, * with the column names you can speed up the results. Not only is it fast for the database to determine which columns to return but in the case of wide tables (with long varchar columns) if you only need to return a few columns you are cutting back considerably on the quantity of data being processed and returned to ColdFusion.

i.e. a table with 15 columns, first name, last name, address, job title, and maybe a big long varchar 255. If you only need their primary key value and first name, select mykey, firstname from table will save on hundreds of characters per row.
# Posted By Ryan | 6/28/05 11:32 AM
mkruger's Gravatar Excellent point - You shouldn't be returning back a huge record set if you only intend on using a few columns... and specifying column names is always good form (in my book). I was using "*" as an example. In fact, most of the samples you find will use an "*" - which may explain the ubiquity of the practice (ha).
# Posted By mkruger | 6/28/05 12:05 PM
HKS's Gravatar Would having cfqueryparam in the where clause make a difference if the value is to be hardcoded?
# Posted By HKS | 7/4/05 10:56 PM
mkruger's Gravatar Yes - it would make a difference. The DB server is agnostic of your code choices. It doesn't "know" that you used "where id = 34" instead of "where id = #somevar#". It only knows that without the proper bind data it has to "figure out" what kind of data type is "34" - see what I mean?
# Posted By mkruger | 7/5/05 11:41 AM
HKS's Gravatar Yup. Makes perfect sense. Thanks for that.
# Posted By HKS | 7/6/05 8:16 AM
HKS's Gravatar Yup. Makes perfect sense. Thanks for that.
# Posted By HKS | 7/6/05 8:43 AM
Adedeji Olowe's Gravatar Unless you are developing general freebie applications, it is always faster and safer to leverage on the database by embedding your logic and some of your complex queries into view, packages, stored procedures and user defined functions.

Not only does you application run far faster than client side (application server side) codes, they are also a lot more secured. Also, in an enterprise environment, it is far easier for disparate frameworks (J2EE and .NET for example) to share business logics on the DB than some expensive duck-in-a-pond integration middleware.

Why does the DB code fly? Because they are already precompiled and they are just waiting to run. Embedded logics are usually strongly typed so it is a lot more secured.

The only grouse developers have is the fact that it impossible to migrate packages, stored procedures and functions from say Oracle to MS SQL Server. But for enterprise applications, specific databases are usually pre-certified which are hardly changed.
# Posted By Adedeji Olowe | 4/27/06 1:46 AM
mkruger's Gravatar Adedeji,

Thanks for you input. Your comments represent an excellent viewpoint and they are helpful to the discussion.

-Mark
# Posted By mkruger | 4/27/06 8:53 AM



Blog provided and hosted by CF Webtools. Blog Sofware by Ray Camden.