I heard an excellent presentation by CF giant Charlie Arehart yesterday. It was one of the "unconference" sessions, titled CFMythbusters. Later that day I was privileged to share the mic with Charlie and talk about CF troubleshooting. Anyway, while discussing cfqueryparam, Charlie said something that made me sit up a bit and say "huh?". It has long been the contention of myself and others that the use of cfqueryparam benefits both security and performance when used against an RDBMS like SQL Server or Oracle. While this is a generalization, it usually holds true. Charlie, however, illustrated a case where cfqueryparam might be detrimental to performance, and he was so convincing I thought I would share it with you.
NOTE: Check out the comments for some caveats and opposing viewpoints. Also note that the tip on constants may not hold water. See this discussion on Brad Wood's blog for more insight on that item.
First, a quick primer to refresh your memory. The reason that cfqueryparam is often faster is that it allows an RDBMS to take advantage of pre-compiled execution plans. Consider this query.
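Here's a sketch of the kind of query being discussed - a parameterized IN list against the users table from the example below (the datasource and query names are hypothetical):

```cfml
<!--- Sketch: a parameterized IN list. The DB sees the same query pattern
      no matter which IDs are in idList. (Datasource name is hypothetical.) --->
<cfquery name="getUsers" datasource="myDSN">
    SELECT fname, lname
    FROM   users
    WHERE  id IN (
        <cfqueryparam value="#idList#" cfsqltype="cf_sql_integer" list="true" />
    )
</cfquery>
```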
Once that plan is in place, the DB listener simply looks for the same query pattern. When it sees "select fname, lname from users where id in (*an array of ids*)" it says "Aha!" (or possibly "Eureka!" if you are using the Greek character set). "I know what to do with this one." So cfqueryparam does usually result in performance gains because it allows the DB server to figure out, well in advance, the best way to execute your queries and then use that method (the "cached" execution plan) without the need to figure it out again and again. Seems very simple, right? What could go wrong?
Even given the facts above, there are times when the use of cfqueryparam does not improve performance. At least one reason has to do with that idea mentioned earlier of the first time. Think about it for a minute and it will become clear. If the efficiency of the execution plan is determined the first time the query pattern is established, then that first query is important. It becomes the basis for any subsequent queries that match the pattern. But as you might have guessed, the most efficient way to run a query sometimes depends on the variables you pass into it. Take the example above. If the first time it is run the "idlist" is sequential, as in 1,2,3,4, then hypothetically the system might say to itself, "DB, the id is the primary key, clustered and unique - let's use a table scan" (again, hypothetically - this is not a discussion of DB execution plan algorithms).
Now consider what would happen if a subsequent query passes a non-sequential list, as in (55,18,20038,72,1). Even though an index might be a better choice for this second query, the cached execution plan will stick with the first plan. Do you see the problem? Subsequent queries that might benefit from a different plan are going to be shoehorned into whatever the DB came up with to begin with. The DB is going to sit in its rocking chair like an old geezer and say, "I a-been a-doing it this way for 10 hours and I ain-a-gonna change fur you or anybody!"
I'm still wrapping my head around this issue, but here are a couple of possibilities. First, make sure your queries use the same "type" of data in the same way. For example, I've seen functions that are set up for search also be used to retrieve individual records. Consider the likely groups of params that will be passed into the query and build your queries to service that particular type or group of params. Don't be so married to code reuse that you build enormous queries that do everything for you in one query when 2 or 3 might actually be more efficient (do you really need to join the same table 4 times?). Finally, there is the Calvinist method: add index hints to predetermine the execution plan for the DB engine. I'm looking forward to some suggestions from the savvy CF/SQL readers in the audience.
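As a sketch of that last idea, assuming SQL Server and a hypothetical index named ix_users_id, an index hint (or a forced recompile) takes the plan choice out of the cache's hands:

```sql
-- Hypothetical: force the optimizer to use a named index instead of
-- whatever the first-cached plan chose (ix_users_id is an assumed name).
SELECT fname, lname
FROM   users WITH (INDEX(ix_users_id))
WHERE  id IN (55, 18, 20038, 72, 1)

-- Or, on SQL Server 2005, skip plan reuse entirely for this statement:
SELECT fname, lname
FROM   users
WHERE  id IN (55, 18, 20038, 72, 1)
OPTION (RECOMPILE)
```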
NOTE: I used to have a tip here about making sure that static variables (i.e. "active = 1") were also paramed to ensure the use of the plan cache. It turns out I was mistaken about that. See the comments below for some more definitive information. The comment from Christopher Secord explains a way to test whether your cache is updated or not (very useful - thanks, Chris).
Hopefully this post gives you one more way to analyze your queries and a new insight into cfquery. As always, if you have constructive input I welcome and look forward to your comments.
http://www.codersrevolution.com/index.cfm/2008/7/2...
But I really don't know how often this sort of thing happens honestly... My gut says it's not very often.
Doesn't really matter in my case because DataFaucet automatically adds cfqueryparam tags for all my filters. ;) I'm just playing devil's advocate. But I'd have to put together a test case to know for certain whether that's accurate.
Everything I've read about query optimizers leads me to believe that, given the way CF does parameterized queries, no optimizer could generate 2 different plans for the same parameterized query. That is, in the case of a parameterized query with type information, the optimizer is going to pretty much ignore the data in the actual parameters and pay attention only to the columns the parameters are for. In fact, you can have the optimizer show you the execution plan for a parameterized query without ever setting the values for the parameters. I can't see a scenario where changing the value of the parameters would make the optimizer switch to a different plan. Honestly though, I'd love to see a specific example of how I've got this wrong.
Second, I'm also pretty sure you are wrong about unbound types causing the plan to not cache. Anecdotal testing as I write this leads me to believe that my plans are getting cached just fine, even with them. Again, I'd love to see some specific examples of it happening.
http://www.codersrevolution.com/index.cfm/2008/7/2...
Removing cfqueryparam made a huge difference in memory use (as did creating smaller SQL, buffering SQL to every 50 records, and removing any use of ColdFusion components). I did quite a lot of testing and decided that when doing that many queries on trusted data processed by simpler queries such as INSERTs, UPDATEs, and small SELECTs, it's best to avoid cfqueryparam as much as possible.
For anything that's not trusted, or on a smaller scale, I definitely use the tag. :)
Thanks for the great comments. Sorry about the hack script causing you problems. When writing about SQL it really does put a hamper on things.
The comments on cardinality got me thinking. I need to run some tests and follow up with you. The trick is finding a really large dataset to work with I think.
Regarding active = 1: the 1 is not going to be bound, is it? If that is the case, isn't the SQL server going to have to evaluate the variable for type so it knows what to do with it?
I need to find a way to test this through the drivers (maybe tracing would help). I can analyze in Query Analyzer and see the execution plans, but typing into Query Analyzer means everything is a constant, so I'm seeing the plan after compiling.
I can view the cache hits in perfmon, but not on a per-DB basis (I don't think). So I'm struggling to find a way to test whether I'm hitting the cache or not without cfqueryparam.
Can you explain how you could examine the execution plan without actually setting the values?
Also, what about these scenarios?
1) A new index is added after the execution plan is established.
2) The data has grown very large from the time the execution plan was established.
3) An FK or other constraint was added.
Wouldn't all of these conditions also affect this - or would the RDBMS automatically "know" to recompile?
I don't completely disagree with your comment, but I can see instances where the execution plan first established would not suffice for later queries.... no?
I'm not surprised by the results of a 60k record import. CF is not the best tool for this task, and yes, it requires more memory to use cfqueryparam because the SQL passed from the driver now includes an array of variables and is more of an object than just a string - so I could see how this would be the case for such a task. What you have really done is a division of labor: your web server has some breathing room and your DB works harder - and maybe this is good and accomplishes what you want.
However, I still hold that in most cases cfqueryparam will result in better performance - and my experience bears me out. What database server were you using?
I suppose execution time might tell me, if I could completely isolate the environment - but many, many things can affect query execution time. Regarding sp_executeSQL ... I thought SPs had a separate compilation method and caching. I gather you are saying that, while the SP execution plan is one thing, when used to execute dynamic SQL it actually affects the "other" cache (the query plan cache) - yes?
re: just checking the time -- I do most of my work on my notebook these days, so I test these things locally, where I'm the only user. Although there are other things that could affect it, the environment isn't as open as it would be on even a company dev server. So I'll run it, restart the CF/SQL services, and run it again just to be sure. I know it's not the most precise method, but it works for me. :)
---------------------
Recompiling Execution Plans
Certain changes in a database can cause an execution plan to be either inefficient or invalid, given the new state of the database. SQL Server detects the changes that invalidate an execution plan, and marks the plan as invalid. A new plan must then be recompiled for the next connection that executes the query. The conditions that cause a plan to be invalidated include:
- Any structural changes made to a table or view referenced by the query (ALTER TABLE and ALTER VIEW).
- New distribution statistics generated either explicitly from a statement such as UPDATE STATISTICS or automatically.
- Dropping an index used by the execution plan.
- An explicit call to sp_recompile.
- Large numbers of changes to keys (generated by INSERT or DELETE statements from other users that modify a table referenced by the query).
- For tables with triggers, if the number of rows in the inserted or deleted tables grows significantly.
---------------------
I can also speak to SQL Server 7 and DB2.
For SQL Server 7 it was a known bug that if you altered the underlying table, you had to go in and alter the text inside your cfquery, or CF would try to keep using an invalid statement handle (cached QEP). I believe this was worked around in CF, but I believe it was also fixed in either SQL Server 2000 or 2005.
For DB2/400 .... woof. Query caching will be the death of me. DB2/400 is much, much, much more sensitive about these things. The optimizer is also significantly dumber than SQL Server's, so it'll do things like keep trying to reuse a QEP when it really really shouldn't. Occasionally it gets so confused that you have to actually go in and delete the QEP cache. Conversely, woe unto you if you delete the QEP cache and don't restart anything that was touching the database at the time (including the CF service).
As for how to see the plan without entering the variables, check out two things in SQL Server. You can do it manually (SET SHOWPLAN_TEXT ON), or you can do it automagically in the Query Analyzer by highlighting a block of text and hitting Ctrl-L, which is a shortcut for Query > Display Estimated Execution Plan. Put a tracer on and let CF do its thing and you'll see how it walks through sp_prepare and that family of procedures. You can then go in and try one on your own, and get it to show you a plan without ever entering a single value. (I'd copy and paste it here, but I understand your comment filter is tricky.)
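A sketch of the manual route, using the users example from the post - with SHOWPLAN_TEXT on, the batch is not executed, so the estimated plan comes back without any value ever being assigned to the parameter:

```sql
SET SHOWPLAN_TEXT ON
GO
-- Not executed: SQL Server returns the estimated plan instead.
DECLARE @id int
SELECT fname, lname FROM users WHERE id = @id
GO
SET SHOWPLAN_TEXT OFF
GO
```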
I'm using SQL Server 2000. Managed to get things running great when I grouped all the SQL into a stored procedure and buffered the calls into an array for executing every 50 together in one CFQuery tag with no cfqueryparams. Monitoring memory and performance this seemed by far the best from the Coldfusion side of things.
I totally agree with using CfQueryParam for everything else though :)
@ike
Haven't done much with BCP before, I'll take a look at it but from what I can tell it might not do enough for the job.
Awesome, awesome information - thank you so much for posting. That is really helpful. I never saw that "estimated execution plan" choice before. I always set it to on in the menu and then ran an actual query. Feel free to enter your comment too ... I have altered how the "comments" field works.
David,
It sounds like you have your process well in hand. I would only add one tip. For reading in a text file you should consider creating a DTS package. You can still call it from CF if you like. There are lots of possibilities including dynamic file names and even automatic FTP retrieval. We do a DTS package for option quotes that consumes a 500k+ record file in about 12 seconds using the SQL DTS task scheduler. The fact is that a file DB type driver is much better at reading and parsing than anything you can do in CF (at least that's my experience).
I'm really unsure about this assertion that the values being passed by the <cfqueryparam> tag can influence how the execution plan is created. Certainly *looking* at the execution plans being generated by SQL 2005 does not bear this out (I haven't checked Oracle yet).
If one has a hard-coded SQL statement which has WHERE x IN (1,2,3,4), then I could see how the execution plan might go "right, well it's those exact rows I need to fetch, and they're the first four rows (how it divines this, I don't know), so I'll just do a table scan". Looking at the plan generated from this sort of thing, no such inferences are made about the proximity to the "start" of the table, but still: I could see how in theory it could work that way.
However if one's passing in parameters for the list elements, eg:
WHERE x IN (?,?,?,?), then the DB is not going to consider what the data could be (that would just be dumb), it's going to base its plan on the schema of the objects involved ("x has got a unique index on it, I'll use that").
However, in considering all this, one interesting thing I've discovered - which didn't occur to me before - is that if the length of the list in the <cfqueryparam> varies, a new execution plan is generated for it, because a different number of parameters is being passed in. Each element in the list is a separate parameter; it's not simply one parameter (I knew this, I just didn't see the ramifications of it). This means the SQL string being passed in is different, and the reuse of an execution plan is keyed on the SQL string (or, I imagine, a hash of it).
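To illustrate (parameter markers shown as the driver would send them): the same cfquery yields a different SQL string as the list length changes, so each length gets its own cache entry:

```sql
-- Three IDs in the list: one SQL string, one cached plan...
SELECT fname, lname FROM users WHERE id IN (@P1, @P2, @P3)

-- ...four IDs: a different SQL string, hence a separate cached plan.
SELECT fname, lname FROM users WHERE id IN (@P1, @P2, @P3, @P4)
```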
--
Adam
That isn't true, at least not with Microsoft SQL Server. The handles that point to the plans in SQL Server are stored in the sys.dm_exec_cached_plans view. Count the rows in that view, then execute the example query you have in your blog (a query with a constant, like "active = 1", and a cfqueryparam). You'll see that the row count increases by 1. The query has been cached.
Now change the value of the param. This will tell us if SQL server actually *uses* the plan. You'll see that the row count doesn't increase this time. However, if you change the constant to "active=2" then you'll see the row count increase.
Clearly, SQL server will cache this query and make use of the cache.
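A sketch of that test in T-SQL (sys.dm_exec_cached_plans is a SQL Server 2005 view; the second statement is an optional way to see the cached SQL text itself):

```sql
-- Count cached plans, run the cfquery, then count again:
-- an unchanged count after varying only the cfqueryparam value means
-- the plan was reused; changing the constant compiles a new plan.
SELECT COUNT(*) FROM sys.dm_exec_cached_plans

-- Optionally, inspect the cached statements and their reuse counts:
SELECT cp.usecounts, cp.objtype, st.text
FROM   sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC
```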
Regarding the post David made below... the severe memory use is due to some bug in CF/MSSQL. This usage of memory and crashing of the server will happen to any of you using CFQUERYPARAM on high-traffic sites IF you have the ColdFusion Administrator datasource setting:
Max Pooled Statements
set to 1000 (which I think is the default)... I am not sure what the effect of other settings is, but if you set this to ZERO you will no longer have the memory leak crash your server... I had a server that was running great... over time we started adding CFQUERYPARAM to all our SQL, and then our server slowly became less and less stable, ultimately requiring 4 to 5 restarts of CF a day (and RAM usage going to 900+ MB)...
We set this to zero and the server has not been restarted since and CF hovers at 625mb of RAM... it has made a world of difference on every CF server we have set this to...
Someone who knows better than I do may say that setting that to zero means CF cannot actually cache any execution plans... that is fine by me until this bug is fixed... :)
This happens in CF8 and maybe CF7 (but haven't verified)...
A bigger deal should be made about this in the CF community... :)
Patrick
----- david's original post -----
I've run into a different type of performance problem with cfqueryparam. I was looping over a CSV file with 60,000+ records and recording information from each one into a database. cfqueryparam was eating up memory pretty fast, until there was nothing left and ColdFusion shuddered to a halt.
I have run such tests... MSSQL execution plans are definitely affected by the usage of cfqueryparam.
-mark
@Patrick: You said, "Someone who knows better than I do may say that setting that to zero means CF cannot actually cache any execution plans".
From my understanding, that would be incorrect. Whether or not you allow pooled statements, SQL Server will continue to use a cached execution plan and will not recompile. (I just tested this and watched the results in SQL Profiler.)
As Ike said, pooled statements != cached execution plans. Execution plans are created and cached at the database level. You can see all your database's cached plans, along with some stats, in the master.dbo.syscacheobjects table.
Pooled statements are something that J2EE apps (among others) do, in which they prepare a SQL statement using sp_prepexec (or possibly sp_prepare), which returns an integer "handle" that can be used to re-execute that same prepared SQL at a later date via sp_execute on the same database connection. When you set your "Max Pooled Statements" to 0, ColdFusion un-prepares the statement every time it is called, with sp_unprepare. Frankly, at this point I'm not sure why CF doesn't just use sp_executesql instead, which is for a single-use parameterized query.
Even though the DB is still holding on to (and using) the same execution plan, the Application is not bothering to use the resources to keep track of all the prepared statements it has used. It simply discards them immediately.
Other than the fact that a call to a prepared statement does NOT pass the entire SQL over the wire, but rather the integer handle that points to it, I'm not 100% positive what specifically is supposed to be faster about the prepared statement, since compilation and caching of the execution plan still only happens the first time you call it.
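Roughly what that handle round-trip looks like in a Profiler trace (the handle value and statement here are illustrative, not captured output):

```sql
-- First call: prepare and execute in one shot; @p1 receives the handle.
DECLARE @p1 int
EXEC sp_prepexec @p1 OUTPUT,
     N'@active int',
     N'SELECT fname, lname FROM users WHERE active = @active',
     @active = 1
-- Later calls on the same connection send only the handle, not the SQL:
EXEC sp_execute 1, 1
-- With Max Pooled Statements = 0, CF discards the handle immediately:
EXEC sp_unprepare 1
```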
Also, a random note on the topic: the "Max Pooled Statements" setting is only useful when you have the "Maintain Connections" box checked. If you don't maintain connections, ColdFusion logs in and out of the database with every request. Before logging out, it must unprepare any prepared statements it had lying around.
And one last thing (sorry, this got so long)... "the severe memory use is due to some bug in CF/MSSQL"
I have never seen evidence that this is a bug. In my opinion, the reasoning for implementing queryparamed cfqueries as prepared statements was this assumption: most applications have a limited number of SQL statements that they will use, and generally most apps will wish to reuse the same statements over and over. Therefore, it is worth the overhead to keep track of up to a default of 1000 such statements per connection.
These sorts of assumptions wouldn't hold true if you had some sort of import process running thousands upon thousands of DIFFERENT statements (which is what started this sub-discussion). I think this is one of those things where CF makes it INSANELY simple to do parameterized prepared statements, but with a few assumptions that apply to 98% of all use cases.
Hmmm, perhaps this warrants a full blog entry instead of filling up Mark's comments. :)
Ah... I see. Well that would be interesting to test to be sure. I would think that connection pooling would definitely impact performance - especially on a busy SQL server... eliminating the overhead of actually making the connection, finding the right IP address, permissions etc. Right? To be honest I never turn it off :)
-Mark
My understanding is that unchecking the "Maintain Connections" checkbox in your data source settings disables connection pooling (since each database connection is disconnected at the end of the page request) and, as a side effect, also renders statement pooling useless, since there are no connections left open and prepared statements are connection-specific.
http://stackoverflow.com/questions/855066/is-there...
I'm not so sure. I read the post and took a look at the example. I have a great deal of code like this running successfully (large import queries from a loop and using cfqueryparam) - so I suspect there is something else going on.
-Mark
Run my test if you have a chance. I'm interested in hearing of your results.
Some did mention in later comments that there are ways to prove this; you don't need to take our word for it. I'll point especially to a talk I did at cf.objective 2007 (http://www.carehart.org/presentations/#dbplancachi...) where I not only offer demonstrable code examples but, more important, offer useful SQL Server monitoring statements that can show clearly if/when this is happening, and I show things for SQL Server 2005 and 2000. I also mention some related monitoring for other DBMSs, as indeed the issue is definitely not limited to SQL Server.
I'll say also that some comments have hinted at another useful point: there can be differences in specific SQL used that can affect how likely this issue will happen. In my talk, besides all that I try to summarize, I also point to several substantial resources from others (engineers related to the DBMS's themselves and docs) which demonstrate and explain this in far more detail.
And let's clarify that it's definitely not a CF-specific problem, either, as I explain in the talk. CFQUERYPARAM just creates a prepared statement (or parameterized query) and other languages support these as well.
You may wonder, then, why this isn't more well known and broadcast. I honestly don't know. Part of it is that coders often don't pay attention to what's going on "behind the curtain". Or they accept as fact assertions made by the majority. In this case, I'm not asking anyone to "believe me". I offer clear evidence that you can test for yourself.
Hope that's helpful. Thanks, Mark, for all your awesome entries here.