ColdFusion Muse

Good Developers Practice Safe Query Caching

First let me say that query caching on a CF server is not a panacea. There are many ways to improve performance and there are many techniques for caching. All techniques have trade-offs. Still, there are instances where caching will save you time and money - and those are both things in short supply. More to the point, query caching is so easy to implement that it can be done for an entire site or application in a relatively short time - as long as you follow some simple rules and take some precautions (please people - use "safe caching").

How do I Cache?

First, you must make sure that caching is enabled in the Coldfusion Administrator. Go into CF Administrator and click on the Caching link in the side menu. Make sure there is a number entered for "maximum cached queries" in the box provided. How many you wish to cache depends on your server resources, site activity, database etc. See comments below under the section explaining the FIFO buffer.

Now it's time to fiddle with the CFQUERY tag. There are actually 2 attributes to cfquery that deal with caching. They are cachedAfter and cachedWithin. CachedAfter takes a date as a parameter. It will cache the query AFTER the date that is specified. So, if I added the cachedAfter tag with tomorrows date in it, the query would run "live" until tomorrow and run from the cache from then on. To be honest I've never had the occasion to use cachedafter. It would be more useful if it took a time value as well as a date.

The second caching attribute is the one that I find useful. It takes the results of "createTimespan()" as its argument. Create timespan allows you to create "length-of-time" variable down to the second. It takes a list of 4 integers for Days, hours, minutes and seconds. So, for example, a time span of 2 hours and 13 minutes would be #createTimespan(0,2,13,0)#. If I wanted to cache a query for 2 hours and 13 minutes I would doe something like this:

<cfquery name="NightOnTown" datasource="dsn" cachedwithin="#createTimespan(0,2,13,0)#">
   SELECT   Hose, Slip, Garters, shoes, fashion_sense
   FROM      myCloset
   WHERE   username = 'bubbaLouie'      
</cfquery>

How does it work?

This is where some folks who are new to query caching get nervous. If I'm caching the query "NightOnTown" and username "samIam" shows up, will he see BubbaLouie's shoes? The answer is no. Bubbalouie's pumps will remain in the closet (along with his garters hopefully). To pull from the cache, more than just the name of the query must match. Here's the list:

  • Same query "Name"
  • Exact same SQL statement - "where username='bubbaLouie'" and "where username = 'samIam'" are 2 different statements, ergo 2 different queries in the cache - even if they are both "named" NightOnTown.
  • Same Datasource - for those of you who fail to assume and stumbled onto that thought.
  • Same Username and password - This is interesting to note. If you have a site with a shared datasource but multiple db usernames you may not get the benefit from caching that you think you should.
  • Same DBTYPE

How does it Help?

Recently I was working on a "sick server" and ran across a query that was being run with every page request. The purpose of the query was to build a drop down list of choices. This busy site was receiving over 200 views per minute. That means (for those of you taking notes) that the database is being tapped for what is largely the same information 200 times per minute or 12000 times an hour. I suggested caching the query. The client was concerned because the choices in the drop down list changed frequently - several times an hour. So I suggested caching the query for just three minutes.

Three minutes? How can caching for so short a time make any difference? Well in this case the number of db hits went from 12,000 per hour to 20 per hour for that single query. So you see, even a small amount of caching can make a difference. Minimizing the number of calls to the database has an exponential effect on your server. Remember, database activity is virtually always the single most costly process on your web server. Getting to know your data, database schema, database platform and advanced SQL techniques will probably do more for your site than advanced CF programming (although you should strive for both). When the number of calls to the DB goes down, there are more threads in the pool and the JDBC manager doesn't have to work so hard. Resources are freed up for other things. Requests are less likely to be queued and you diminish the possibility of the snowball effect on your resources. In other words, it's a good thing.

What are the rules?

Now don't go off and add caching willy nilly. Here are some things to keep in mind.

CFQUERYPARM and Caching are Mutually exclusive on ColdFusion 7 and below - you can't have both on the same server. Why? Because data binding is based on "typing" the data for the DB server. The SQL Statement becomes "where username = [variable of type varchar]" instead of "where username = 'bubbaLouie'". Since the "statement" part of the query is generic, it would cause bubbaLouie and samIam to seem like they were the same query. One more important note, if you can't use CFQUERYPARAM and caching together, then you must remember to validate user entered variables. Otherwise you will be exposed to SQL injection attacks. NOTE: Starting with ColdFusion 8 Queries using cfqueryparam can now be cached. Apparently Adobe figured it out so now you need not worry about that issue.

Know your Data Patterns - To use caching you should be aware of how often the data you are caching changes as the result of updates or inserts. You should also be aware of how significant the lag time might be to a user. If the information changes quite frequently, but a lag of 5 or 10 minutes doesn't make any difference to those who need the data, then you can cache safely for 5 or 10 minutes. If the users cannot tolerate any lag time, then you may have to abandon caching even if the data doesn't change often.

Beware the Delete - One special category of changes is the "delete". If data that you intend on caching is frequently deleted then you may need to implement a "cache refresh" technique (I'll blog about that sometime). Consider what might happen if data is being displayed on your site that has already been deleted. If a user has to interact with that data you will throw the site open to errors in your database code where an expected record is not found, or (worse) you will end up creating orphan records or badly synchronized data that will not be tenable.

FIFO is Not Always Your Friend - 2 things control the cache. One is the time span of hours, minutes or even days that you choose to cache the query, but the other is the "max cached queries" setting that we referenced above. Consider the scenario where a developer has chosen to cache the users login information. Let's suppose the max cache limit is 500 queries. When the 501rst user logs in, his query is cached and the "oldest" query is expired from the cache. That means someone's query who is already logged in will be re-run "live" (not pulled from the cache), and at that point his or her query will be cached forcing out another query - and so on... (I can hear Elton John singing "..it's the circle of life....").

If the query you are caching is run thousands of times with different parameters it may not be useful to cache it. You could turn your FIFO (first in first out) cache buffer into a wildly spinning revolving door. Queries that may be larger and more useful to cache might get pushed out so often as to make caching them equally useless. You could set the number into the thousands, but that might just overload the resources of your web server. As you can see, it's important to know that level and activity on your server.

Happy Safe Caching!

Related Blog Entries

Comments
Ryan Guill's Gravatar Its funny you wrote this, I was just having a discussion about query caching with a fellow developer this morning.

Personally, im not crazy about caching using the cachedwithin attribute of cfquery. Mostly because of the fact that I can't use cfqueryparam. But it also just seems too loose. I prefer to create an application variable to hold the recordset, if the recordset doesnt exist or a url parameter exists telling it to reload it runs the query and stores it, if not, it just ignores the query. But thats just me.
# Posted By Ryan Guill | 10/13/05 11:01 AM
Rob Brooks-Bilson's Gravatar I think this is a really good overview of query caching. Well done.

One point I'd like to make concerns your FIFO example. I think you do a good job of showing how query caching should NOT be used in the query caching of a user login. I think it would benefit your readers if you explain what an alternative solution in that case would be such as putting that information in the session scope, where it would be cached independently from query caching. When I design applications, this is one of those decisions that's a constant tradeoff. Where should the caching take place. Generate static content, query cache, cache in session scope, application scope, etc.
# Posted By Rob Brooks-Bilson | 10/13/05 11:06 AM
Doug's Gravatar I was hoping for more alternatives to query caching in general, as well, but it's a good overview.

Anyone know if there's a way through the Java classes to determine how many cached queries (out of the total possible) are in memory?
# Posted By Doug | 10/13/05 12:06 PM
Rick Osborne's Gravatar As for an example of cachedAfter ...

We have a financial report that the execs look at that changes daily. Financial state of the company, inventory, etc. We rebuild the info in the mornings, at 2am or some ridiculously early time, and for the rest of the day the data stays the same. We throw cachedAfter into the queries so that while the first person who runs the report will sit there and spin for 20 minutes while the various queries run, anyone after that for the rest of the day gets it instantly.
# Posted By Rick Osborne | 10/13/05 12:17 PM
Mark's Gravatar Ryan - I'm with you in most cases. However, when it's a matter of "lookup lists" that rarely change I implement a simple scheme. I set a variable "tmtocache = createtimespan(0,1,0,0);" Then I use it in the cachedWithin attribute. When I want to "refresh" the cache I just set it to 0. For lookup lists where the caching can be "dumb" or "global" it works fine and keeps me from having to track time values etc.

But yes... as I said before, it is NOT a panacea. However, if you ever have a client with a sick server you can light him up with a little caching. There's a great WOW factor when you take his server from slow as molassas to faster than it was when he first built the site in just an hour or so of work (ha).
# Posted By Mark | 10/13/05 12:24 PM
Mark's Gravatar Rob - I see your point about the users. I agree completely that the login stuff should be in the session scope. I could have picked a better example. I have, however, seen code where the username and password are checked on every page by running a query against a cookie (ha).... so it's not completely out of my experience.
# Posted By Mark | 10/13/05 12:25 PM
Mark's Gravatar Doug - if you do get some code that is able to do that, let me know. It could be very useful for diagnostics eh?
# Posted By Mark | 10/13/05 12:26 PM
Mark's Gravatar Rick - great example of cachedafter! Thanks.
# Posted By Mark | 10/13/05 12:27 PM
Frank Wheatley's Gravatar We currently use cfstoredproc and ref cursors in Oracle to return queries.
<cfstoredproc procedure=".................>
<cfprocparam type="In" cfsqltype="CF_SQL_NUMERIC"...........>
<cfprocparam type="In" cfsqltype="CF_SQL_VARCHAR"...........>
<cfprocresult name="..........>
</cfstoredproc>

Is there any way to cache these queries. I've been throwing around the idea of scoping some queries that are fairly static with the session scope.
# Posted By Frank Wheatley | 10/13/05 12:52 PM
Mark's Gravatar Frank - you can cache this query if your move away from the cfprocparam syntax. See this blog entrie on databinding without using cf tags.

http://mkruger.cfwebtools.com/index.cfm?mode=alias...

And the subsequent entry on how to do that for oracle:

http://mkruger.cfwebtools.com/index.cfm?mode=alias...

Basically you would have

<Cfquery name="blah" datasource="blah">
DECLARE
.... bind variables to types
BEGIN
run stored proc

EXCEPTION
.... commit or rollback stuff

END
</CFQUERY>

You would be able to use this syntax with the Cachedwithin or after parameters.
# Posted By Mark | 10/13/05 4:23 PM
William's Gravatar I use a CFC to house all my SQL queries. For all the SELECT queries I pass a boolean parameter to the method telling it whether or not to "flush" the cache (aka set cachedwithin to 0 like Mark said above). Then in my methods that do INSERTs, DELETEs, and UPDATEs, I simply have them call & flush the affected cached SELECT queries' methods after they are run. Then I am ensured that my lists are always up-to-date. Having all my SQL in the CFC makes the code easy to maintain.
# Posted By William | 10/13/05 7:06 PM
Trond Ulseth's Gravatar William, that sounds like a great idea. Could you be bothered to show some code?
# Posted By Trond Ulseth | 10/14/05 4:21 AM
Frank Wheatley's Gravatar Thanks Mark-
I've used syntax like that in our test site. The problem is the webserver has no rights to tables in production. The DBA set this up for security.
# Posted By Frank Wheatley | 10/14/05 8:32 AM
Mark's Gravatar Maybe I'm missing something. Why would access to the tables be necessary? Is the problem that you are returning multiple results and MUST use cfstoredproc? If so they you are probably going to have to cache in the application or session scope. Sorry if I Misunderstood.
# Posted By Mark | 10/14/05 9:12 AM
Mark's Gravatar Will - nice approach. I especially like having the CFC handle it all internally.
# Posted By Mark | 10/14/05 9:13 AM
hamlet's Gravatar William - that's a brilliant idea.
Ryan- I like you idea as well.

hmmmm, I wonder which I approach I should take?
# Posted By hamlet | 12/31/09 12:39 PM
zac's Gravatar thanks, this was very useful for me. did you spell create wrong on purpose to keep us all on out toes?!

i have a question. if i dont have access to the cfadmin. how can i check to see if my queries have in fact been cached?

thanks
zac
# Posted By zac | 5/11/10 12:13 PM
Mark Kruger's Gravatar @Zac,

Dude... lay off the spelling will ya? As for determining if it's cached, if you are on CF 8 try the "result" attribute (result="rs") and dump out the attribute.
# Posted By Mark Kruger | 5/11/10 12:17 PM
zac's Gravatar a ha, yes, thanks for that.

zac
# Posted By zac | 5/11/10 12:19 PM
Kamil's Gravatar @Will - very smart approach. It was always not being able to "predict" correctly the cachedWithin interval that caused difficulty in implementing a cache, but your technique obviates that need.
# Posted By Kamil | 9/20/10 11:33 AM
David Cheong's Gravatar is there any way that I want to query to run "live" everyday at 12pm (because our data refresh everyday at this time and I not sure when will be the first user to view that report).

Initially i plan to use cacheddafter 12PM everyday and catchedwithin 23 hours. But CF not allow to use both attributes together
# Posted By David Cheong | 4/26/12 1:17 AM
Mark Kruger's Gravatar If I understand the question (and I'm not sure I do) you can set cachedwithin="0" to refresh the cache at any time you need to. But given what you are saying I think I would put this query into the application scope and refresh it carefully on a scheduled basis. If your data refresh is an import routine you could even "trigger" the application scope refresh as needed so the data would always be as current as possible.
# Posted By Mark Kruger | 4/26/12 9:52 AM
David's Gravatar I think i would put the begin tag in if else condition to refresh the query when user trigger it in a particular "refreshing time"

another question is if a particular query is access by user in a very frequent basic, said every minutes. and I set the cachedwithin = 5 mins. is that mean that the query will be cache forever? since the cachedwithin attribute being trigger every single mins
# Posted By David | 4/26/12 12:22 PM
Mark Kruger's Gravatar @david,

No.. caching for 5 minutes means the query will be rerun after 5 minutes. I would also caution you about the FIFO buffer - your query is not "guaranteed" to be cached for a given amount of time.

-Mark
# Posted By Mark Kruger | 4/26/12 12:26 PM
bibin's Gravatar If i need to get updated data from DB then there is no use for Caching right?
# Posted By bibin | 9/12/13 10:15 AM
Mark Kruger's Gravatar @biblin,

The reason to cache is for performance. If you need updated data all the time then you shouldn't cache. Your caching scheme should match your need for the latest data. So the first question to ask is "how often is my data updated" and the second question is "how soon do I need it".
# Posted By Mark Kruger | 9/12/13 10:19 AM
David's Gravatar Is there any way or any tools that we can monitor and find out which query is more recommended to cache?

I try to monitor via the CF8 server monitoring tools, but still not that optimize
# Posted By David | 9/13/13 2:33 AM
Mark Kruger's Gravatar @David,

This is a case of using your noodle :) Here's what I do. I trace the DB on Prod during a busy time of day (or possibly during a load test). Then I look for repetetive queries in my trace. Sometimes obvious ones come to light right away. Other times I have to do more digging.

I also take a look at server monitor (or seeFusion or Fusion Reactor) and watch for slow pages ... examining the code in this pages can give me additional clues.

Finally, watch your DB server monitor to see if there are excessive locks or blocks that arise under load. They can help identify tables that need less pressure.

Hope this helps you.
# Posted By Mark Kruger | 9/13/13 10:26 AM
David McGuigan's Gravatar So when the first query past your threshold fires and "the oldest one" is removed from the cache... is "the oldest one" the one that was first executed period or the one that's gone the longest without being pulled from the cache? Seems like the 2nd option would be much more effective and should be ER'd if that's not how it works.
# Posted By David McGuigan | 12/18/13 2:18 PM



Blog provided and hosted by CF Webtools. Blog Sofware by Ray Camden.