I heard an excellent presentation by CF giant Charlie Arehart yesterday. It was one of the "unconference" sessions title CfMythbusters. Later that day I was priveledged to share the mic with Charlie and talk bout CF Troubleshooting. Anyway, while discussing cfqueryparam Charlie said something that made me sit up a bit and say "huh?". It has long been the contention of myself and others that the use of Cfqueryparam benefits both security and performance when used against an RDBMS like SQL server or Oracle. While this is a generalization it usually holds true. Charlie, however, illustrated to me a case where cfqueryparam might be detrimental to performance and he was so convincing I thought I would share it with you.
NOTE: Check out the comments for some cavaets and opposing viewpoints. Also note that the tip on constants may not hold water. See this discussion on Brad Wood's blog for more insight on that item
First a quick primer to refresh your memory. The reason that cfqueryparam is often faster is that it allows an RDBMS to take advantage of pre-compiled execution plans. Consider this query.
The first time it sees it the DB server will pick out the best way to execute this query and save the query pattern allong with the execution plan. The DB Server might say to itself:
DB, everytime you see a query that matches "select fname, lname from users where ID IN (*an array of ids*)" make sure and use the xyz index.
Once that plan is in place the DB listener simply looks for the same pattern query. When it sees "select fname, lname from users where id in (*an array of ids*)" it says "Aha!" (or possiblye Eureka! if you are using the Greek character set) I know what to do with this one. So cfqueryparam does usually result in performance gains because it allows the DB server to figure out way in advance the best way to execute your queries and then use that method (the "cached" execution plan) without the need to figure it out again and again. Seems very simple right? What could go wrong?
Even given the facts above there are times when the use of cfqueryparam does not improve performance? At least one reason why has to do with that idea mentioned earlier of the first time. Think about it for a minute and it will become clear. If the efficiency of the execution plan is determined by the first time the query pattern is established, then that first query is important. It becomes the basis for any subsequent queries that match the pattern. But as you might have guessed, the most efficient way to run a query does sometimes depend on the variables you pass in to it. Take the example above. If the first time it is run the "idlist" is sequential as in 1,2,3,4 - then hypothetically the system might say to itself, "DB, the id is the primary key, clustered and unique - let's use a table scan" (again, hypothetically - this is not a discussion of DB execution plan algorithms).
Now consider what would happen if a subsequent query is a non-sequential list as in (55,18,20038,72,1). Even though an index might be a better choice for this second query, the cached execution plan will stick with the first plan. Do you see the problem? Subsequent queries that might benefit from a different plan are going to be shoehorned into whatever the DB came up with to begin with. The DB is going to sit in it's rocking chair like an old geezer and say, "I a-been a-doing it this way for 10 hours and I ain-a-gonna change fur you or anybody!"
I'm still wrapping my head around this issue, but here are a couple possibilities. First, make sure your queries use the same "type" of data in the same way. For example, I've seen function that are set up for search also be used to retrieve individual records. Consider the likely groups of params that will be passed into the query and build your queries to service that particular type or group of params. Don't be so married to code reuse that you build enormous queries that do everything for you in one query when 2 or 3 might actually be more efficient (do you really need to join the same table 4 times?). Finally there is the Calvinist method. Add index hints to predetermine the execution plan for the DB engine. I'm looking forward to some suggestions from the savvy CF/SQL readers in the audience.
NOTE: I used to have a tip here about making sure that static variables (i.e. "active = 1" where also paramed to insure the use of the plan cache. Now it turns out I was mistaken about that. See the comments below for some more definitive information. The comment from Christoropher Secord explains a way to test when your cache is updated or not (very useful - thanks Chris).
Hopefully this post gives you one more way to analyze your queries and a new insight into cfquery. As always, if you have constructive input I welcome and look forward to your comments.
http://www.codersrevolution.com/index.cfm/2008/7/2...
But I really don't know how often this sort of thing happens honestly... My gut says it's not very often.
Doesn't really matter in my case because DataFaucet automatically adds cfqueryparam tags for all my filters. ;) I'm just playing devil's advocate. But I'd have to put together a test case to know for certain whether that's accurate.
Everything I've read about query optimizers leads me to believe that, given the way CF does parameterized queries, no optimizer could generate 2 different plans for the same parameterized query. That is, in the case of a parameterized query with type information, the optimizer is going to pretty much ignore the data in the actual parameters and pay attention only to the columns the parameters are for. In fact, you can have the optimizer show you the execution plan for a parameterized query without ever setting the values for the parameters. I can't see a scenario where changing the value of the parameters would make the optimizer switch to a different plan. Honestly though, I'd love to see a specific example of how I've got this wrong.
Second, I'm also pretty sure you are wrong about unbound types causing the plan to not cache. Anecdotal testing as I write this leads me to believe that my plans are getting cached just fine, even with them. Again, I'd love to see some specific examples of it happening.
http://www.codersrevolution.com/index.cfm/2008/7/2...
Removing CfQueryParam made a huge difference in memory use (as did creating smaller SQL, buffering SQL to every 50 records and removing any use of Coldfusion components). I did quite a lot of testing and decided that when doing that many queries for trustable data that's being processed by simpler queries such as INSERT's, UPDATE's and small SELECT's, that it's best to avoid CfQueryParam as much as possible.
Anything that's not trustable or on a smaller scale I definitely use the tag. :)
Thanks for the great comments. Sorry about the hack script causing you problems. When writing about SQL it really does put a hamper on things.
The comments on cardinality got me thinking. I need to run some tests and follow up with you. The trick is finding a really large dataset to work with I think.
Regarding active = 1. The 1 is not going to be bound is it? If that is the case isn't the SQL server going to have to evaluate the variable for type so it knows what to do with it?
I need to find a way to test this through the drivers (maybe tracing would help). I can analyze in query analyzer and see the execution plans, but typing into query analyzer means everything is a constant so I'm seeing the plan after compiling.
I can view the cache hits in perfmon, but not on a per DB basis (I don't think). So I'm struggling with finding a way to test wether I'm hitting the cache or not without cfqueryparam.
Can you explain how you could examine the execution plan without actually setting the values?
Also, what about these scenarios.
1) A new index is added after the execution plan is established.
2) The data has grown very large from the time the execution plan was established.
3) An FK or other constraint was added.
Wouldn't all of these conditions also effect this - or would the RDBMS automatically "know" to recompile?
I don't completely disagree with your comment, but I can see instances where the execution plan first established would not suffice for later queries.... no?
I'm not surprised by the results of a 60k record import. CF is not the best tool for this task, and yes, it requires more memory to use cfqueryparam because the SQL passed from the driver now includes an array of variables and is more of an object instead of just a string - so I could see how this would be the case for such a task. What you have really done is a division of labor. Your web server has some breathing room and your DB works harder - and maybe this is good and accomplishes what you want.
However, I still hold that in most cases cfqueryparam will result in better performance - and my experience bears me out. What database server were you using?
I suppose execution time might tell me if I could completely isolate the environment - but many many things can effect query execution time. Regarding the sp_executeSQL ... I thought SPs had a separate compilation method and caching. I gather you are saying that, while the SP exucation plan is one thing - when used to execute dynamic SQL it actually affects the "other" cache (the query plan cache) - yes?
re: just checking the time -- I do most of my work on my notebook these days and so I test these things locally where I'm the only user and although there are other things that could affect it, the environment isn't as open as it would be on even a company dev a server. So I'll run it, restart the cf/sql services and run it again just to be sure. I know it's not the most precise method, but it works for me. :)
---------------------
Recompiling Execution Plans
Certain changes in a database can cause an execution plan to be either inefficient or invalid, given the new state of the database. SQL Server detects the changes that invalidate an execution plan, and marks the plan as invalid. A new plan must then be recompiled for the next connection that executes the query. The conditions that cause a plan to be invalidated include:
- Any structural changes made to a table or view referenced by the query (ALTER TABLE and ALTER VIEW).
- New distribution statistics generated either explicitly from a statement such as UPDATE STATISTICS or automatically.
- Dropping an index used by the execution plan.
- An explicit call to sp_recompile.
- Large numbers of changes to keys (generated by INSERT or DELETE statements from other users that modify a table referenced by the query).
- For tables with triggers, if the number of rows in the inserted or deleted tables grows significantly.
---------------------
I can also speak to SQL Server 7 and DB2.
For SQL Server 7 it was a known bug that if you altered the underlying table you had to go in and alter the text inside your CFQuery or CF would try to keep using an invalid statement handle (cached QEP). I believe this was worked-around in CF, but I believe it was also fixed between either SQL Server 2000 or 2005.
For DB2/400 .... woof. Query caching will be the death of me. DB2/400 is much, much, much more sensitive about these things. The optimizer is also significantly dumber than SQL Server's, so it'll do things like keep trying to reuse a QEP when it really really shouldn't. Occasionally it gets so confused that you have to actually go in and delete the QEP cache. Conversely, woe unto you if you delete the QEP cache and don't restart anything that was touching the database at the time (including the CF service).
As for how to see the plan without entering the variables, check out two things in SQL Server. You can do it manually (SET SHOWPLAN_TEXT ON), or you can do it automagically in the Query Analyzer by highlighting a block fo text and hitting Ctrl-L, which is a shortcut for Query - Display Estimated Execution Plan. Put a tracer on and let CF do its thing and you'll see how it walks through sp_prepare and that family of procedures. You can then go in and try one on your own, and get it to show you a plan without ever entering a single value. (I'd copy and paste it here, but I understand your comment filter is tricky.)
I'm using SQL Server 2000. Managed to get things running great when I grouped all the SQL into a stored procedure and buffered the calls into an array for executing every 50 together in one CFQuery tag with no cfqueryparams. Monitoring memory and performance this seemed by far the best from the Coldfusion side of things.
I totally agree with using CfQueryParam for everything else though :)
@ike
Haven't done much with BCP before, I'll take a look at it but from what I can tell it might not do enough for the job.
Awesome awesome information - thank you so much for posting. That is really helpful. I never say that "estimated execution plan" choice before. I always set it to on in the menu and the ran an actual query. Feel free to enter your comment too ... I have altered how the "comments" field works.
David,
It sounds like you have your process well in hand. I would only add one tip. For reading in a text file you should consider creating a DTS package. You can still call it from CF if you like. There are lots of possibilities including dynamic file names and even automatic FTP retrieval. We do a DTS package for option quotes that consumes a 500k+ record file in about 12 seconds using the SQL DTS task scheduler. The fact is that a file DB type driver is much better at reading and parsing than anything you can do in CF (at least that's my experience).
I'm really unsure about this assertion that the values being passed by the <cfqueryparam> tag can influence how the execution plan is created. Certainly *looking* at the execution plans being generated by SQL 2005 does not bear this out (I haven't checked Oracle yet).
If one has a hard-coded SQL statement which has WHERE x IN (1,2,3,4), then I could see how the execution plan might go "right, well it's those exact rows I need to fetch, and they're the first four rows (how it divines this, I don't know), so I'll just do a table scan". Looking at the plan generated from this sort of thing, no such inferences are made about the proximity to the "start" of the table, but still: I could see how in theory it could work that way.
However if one's passing in parameters for the list elements, eg:
WHERE x IN (?,?,?,?), then the DB is not going to consider what the data could be (that would just be dumb), it's going to base its plan on the schema of the objects involved ("x has got a unique index on it, I'll use that").
However, in considering all this, one interesting thing I've discovered - which didn't occur to me before - is that if the length of the list in the <cfqueryparam> varies, a new execution plan is generated for it, because there's a different number of parameters being passed in: each element in the list is a separate parameter; it's not simply one parameter (I knew this, I just didn't see the ramifications of it); this means the SQL string being passed in is different, and the reuse of an execution plan is keyed on the SQL string (or, I imagine, a hash of it).
--
Adam
That isn't true, at least not with Microsoft SQL server. The handles that point to the plans in SQL server are stored in the table, sys.dm_exec_cached_plans. Count the rows in that table, then execute the example query you have in your blog (a query with a constant, like "active=1" and a cfqueryparam). You'll see that the row count increases by 1. The query has been cached.
Now change the value of the param. This will tell us if SQL server actually *uses* the plan. You'll see that the row count doesn't increase this time. However, if you change the constant to "active=2" then you'll see the row count increase.
Clearly, SQL server will cache this query and make use of the cache.