ColdFusion Muse

Sessions and Cookies and Bots (oh my)

Would you like to know how to create your own memory leak using nothing but the design of the ColdFusion server? Here's one way. Let's say you have a site that sells products from Narnia. It has a root folder that displays your products and prices. You've done a great job of creating friendly links for browsing your Narnia products. You have stuffed Aslan lions both friendly and fearsome, White Witch figurines, fauns, naiads, dryads, a toy lamppost and even a wardrobe for sale. Let's say (for the sake of argument) that you have 50 links to Narnia products just on your home page. If a user chooses to buy one of your products, he or she clicks on "add to cart". At this point the user is taken into the "/shop/" folder to the page at "www.Nnarniaproducts.com/shop/cart.cfm". So far so good. This is how many online stores are organized and it's just peachy. But let's look under the hood, shall we?


Sharing the Application.cfm (or Application.cfc) page

Where you place your cfapplication tag and how you configure its attributes is very important. For example, it might make sense to put an Application.cfm page in the root because you have variables that are instantiated and shared between the shopping cart and the main browse pages. So it might make sense to simply include them in one big application with a single "cfapplication" tag like so...

<cfapplication
    name="narnia_products"
    sessionmanagement="yes"
    setclientcookies="yes"....>
You will note that we have enabled session management. Now the truth is, we don't need session management until the user puts something in his or her shopping cart, but since we are sharing the application.cfm page we go ahead and enable it here. When the user adds that big stuffed Aslan to his shopping cart and we need a session variable, we will be able to create one with no problem.

What happens when we do it this way? When a user first arrives on the home page 2 variables - CFID and CFTOKEN - are created. They are placed in memory under the "application" and a cookie is sent to the browser. The browser's responsibility is to return the cookie with each additional request. Subsequent requests "see" the cfid and cftoken, and match them with the data stored in memory. If there are any session variables, they are stored in memory using the CFID and CFTOKEN as a key to figure out which session variables belong to which user. The application knows to set a new CFID and CFTOKEN when it examines the request and finds no existing CFID and CFTOKEN.

These 2 variables are the basis for the "session". The server hangs on to these variables in memory until the session times out or the server restarts. You can easily test this by outputting the CFID and CFTOKEN on a test page. Output them several times and you will see they remain the same. Then delete all cookies and refresh the page. The values will change. Consider that if you crafted requests devoid of any cookies or URL variables (when URL variables are used to carry the session) you would, in effect, be creating a new session with each request.

<cfoutput>
    #cookie.cfid#
    <br>
    #cookie.cftoken#
</cfoutput>

Bots and the Train they Came In On

Actually, this is exactly how a bot works. An indexing bot retrieves your home page. The CF server sets a CFID and CFTOKEN in memory and sends back the cookie header with the response. The bot takes stock of your home page and parses out all the links. It then goes out and retrieves each link and indexes the content - exactly what you want it to do. But because a bot, by design, ignores cookie headers, it does not send back the CFID and CFTOKEN with each subsequent request. In effect, every request of the bot is creating a new session. In the case of our Narnia product site - an unneeded session. If you have thousands of products and you want your site to be heavily indexed this can create a resource issue for you. An aggressive bot on a large site can create hundreds and even thousands of phantom sessions that take up space in memory until they expire.
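One rough way to watch phantom sessions pile up is to count the sessions the server is holding. The sketch below relies on coldfusion.runtime.SessionTracker, an internal and undocumented class that may be missing or blocked on some servers, so treat it as a diagnostic aid under that assumption rather than a supported API. The application name is the one from the example above.

```cfml
<!--- Diagnostic sketch only. SessionTracker is an internal, undocumented class,
      so the whole thing is wrapped in cftry in case it is unavailable. --->
<cftry>
    <cfset tracker = CreateObject("java", "coldfusion.runtime.SessionTracker")>
    <!--- Sessions currently held for one named application --->
    <cfset appSessions = tracker.getSessionCollection("narnia_products")>
    <cfoutput>
        Sessions on this server: #tracker.getSessionCount()#<br>
        Sessions for narnia_products: #StructCount(appSessions)#
    </cfoutput>
    <cfcatch type="any">
        <cfoutput>Session tracking is not available here: #cfcatch.message#</cfoutput>
    </cfcatch>
</cftry>
```

Run it from a browser, let a crawler (or a script that discards cookies) request a few dozen pages, then run it again. If the explanation above holds, the count should climb by roughly one per cookieless request.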

Solutions

Be careful how you configure sessions and the Application.cfm page. Using a single Application.cfm or Application.cfc page might not be the best solution for you. You might want to configure 2 of them - one with and one without sessions. In most cases, a "session aware" shopping cart can have its own cfapplication tag while a "public facing" area can have a separate one. You can still use a shared "include" file to populate shared variables and functions - giving you a reasonable level of modularity. Using a robots.txt file to exclude the shopping cart area is also a good idea.
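As a minimal sketch of that two-file layout, assuming the browse pages in the root never touch the session scope: the include path /shared/settings.cfm and the 20 minute timeout are illustrative assumptions, not something taken from a real site.

```cfml
<!--- File 1: /Application.cfm (public browse pages, no sessions).
      With session management off and no client cookies, a crawler
      requesting these pages does not create a session. --->
<cfapplication
    name="narnia_products"
    sessionmanagement="no"
    setclientcookies="no">
<!--- Hypothetical shared include that sets common variables and functions --->
<cfinclude template="/shared/settings.cfm">


<!--- File 2: /shop/Application.cfm (shopping cart, sessions enabled).
      Same application name, so the application scope is shared with the root. --->
<cfapplication
    name="narnia_products"
    sessionmanagement="yes"
    sessiontimeout="#CreateTimeSpan(0, 0, 20, 0)#"
    setclientcookies="yes">
<cfinclude template="/shared/settings.cfm">
```

With this arrangement a session is only created once a visitor reaches a page under /shop/. The trade-off is that any root page that references the session scope will throw an error, so the shared include has to stay session-free.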

Please note that while I'm explaining this using the CFID and CFTOKEN, there is an alternate variable called "JSESSIONID" that can be used. You can enable JSESSIONID in the ColdFusion Administrator (the "Use J2EE session variables" setting). JSESSIONID is a single identifier generated by the underlying J2EE server, and it takes the place of both the CFID and CFTOKEN variables. In most cases I recommend that you use JSESSIONID - especially for a new project. However, there are many legacy ColdFusion applications out there that depend on the existence of the CFID and CFTOKEN variables - so you have to take it on a case by case basis.

Comments
Calling this a memory leak is very misleading. It doesn't cause memory to leak at all - the sessions all expire and memory is recovered correctly. The real issue is that a bot / spider can hit the site fast enough to create more session data than can be held in memory, causing an out of memory error.

A common workaround for this is to check CGI.HTTP_USER_AGENT and use a different cfapplication tag that doesn't enable sessions (if your site can survive without them) or sets an extremely short timeout value (like one second).
# Posted By Sean Corfield | 11/29/05 3:26 PM
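The workaround in the comment above might look something like this in a shared Application.cfm. It is only a sketch: the list of user agent fragments is a short illustrative guess rather than a complete bot list, and the 20 minute timeout for regular visitors is an assumed value.

```cfml
<!--- Rough bot check: look for common crawler fragments in the user agent.
      The pattern is an illustrative assumption, not an exhaustive list. --->
<cfset isBot = REFindNoCase("bot|crawl|spider|slurp", cgi.http_user_agent) GT 0>

<cfif isBot>
    <!--- Bots still get a session, but it expires after one second --->
    <cfapplication
        name="narnia_products"
        sessionmanagement="yes"
        sessiontimeout="#CreateTimeSpan(0, 0, 0, 1)#"
        setclientcookies="no">
<cfelse>
    <!--- Regular visitors get a normal session --->
    <cfapplication
        name="narnia_products"
        sessionmanagement="yes"
        sessiontimeout="#CreateTimeSpan(0, 0, 20, 0)#"
        setclientcookies="yes">
</cfif>
```

Both branches use the same application name, so the application scope is shared; only the session settings differ per request. A user agent string is easy to fake or omit, so this trims the load from well-behaved crawlers rather than guaranteeing anything.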
Sean, you are right of course - and thanks for the nice workaround. I was trying to use "memory leak" as a sort of convention to talk about the issue (i.e. "how to create your 'own' memory leak using the 'design' of coldfusion server"). I know that CF is behaving as expected. Thanks for the input!
# Posted By mkruger | 11/29/05 4:26 PM
This is a really good point. In fact, this could cause some significant stability problems if you use session variables extensively. The only downside is that removing the sessions from the non shopping cart areas makes it harder to track your users and doesn't bode well if you have integrated shopping into your entire site (or have some other need for sessions).
# Posted By Ryan | 11/29/05 8:22 PM
Ryan - yeah. I liked Sean's comment about using the user_agent to fiddle with the session timeout. That makes sense to me.
# Posted By mkruger | 11/30/05 7:44 AM
Mark, great podcast and blog entry. I'll offer, in addition, that the impact of turning on client variables (clientmanagement="yes", which many do without thought) is not only similar but could be worse in some ways.

The impact of sessions on memory can indeed be bad, but at least it's temporary (because sessions are lost on restart).

In the case of client variables, which are stored in the registry or a database, the impact is more persistent (they typically don't "time out" for a month or more). And many don't realize that even if they never use client variables in their code, every page hit does cause creation of several client variables (if enabled in cfapplication), such as last page visited, the time, and the hitcount.

Hope that helps someone.

As for what you've shared, this is something that I think a lot of folks don't know about, and I hope you'll consider creating a CFDJ article on the topic. I'm sure it would help many. Again, thanks, and good luck with the podcasts. I've enjoyed them.
# Posted By Charlie Arehart | 1/23/06 7:59 AM
Charlie - great point on client variables, thanks for the input (and the kind words).
# Posted By mkruger | 1/23/06 8:45 AM
I've been looking for information on the topic of sessions v bots, and have become deeply confused by some of the suggestions I've seen offered, including those represented here. If you set an application variable, that value persists for the lifetime of the application, right? I usually set my application timeout to 1 day. If you timeout an application every second, doesn't that defeat the purpose of the whole scope? Furthermore, isn't sessiontimeout an attribute of the application, and therefore is set once until the application times out? Or is that something you can set on a per-session basis? If so, that seems like the only logical solution. I'm genuinely curious about the answers to these questions, and hope I don't sound like I'm being an ass about it.
# Posted By Dave Anderson | 5/22/07 10:53 PM
@Dave, tho' it may be somewhat confusing, the session timeout value (and other attributes of cfapplication) are set on *every request*. A ColdFusion application only has a name when you execute the cfapplication tag (or this.name = ... in Application.cfc) and you can change the name of the application any time during a request, i.e., associate the request with different applications.

Looking at it another way, an application is created and exists for a certain period of time and each request can attach to any application.

Does that make sense?
# Posted By Sean Corfield | 5/22/07 11:02 PM
Thanks for replying, Sean. Does it make sense? Yes and no. Examples I've seen that use bot detection have used the same application name with different attribute values, eg session mgmt / no session mgmt, which seems to me like you'd be recreating the whole application scope with every request that switches from bot to no-bot or vice-versa. If, however, you detect a bot and call the application 'myApp4Bots', and give the application the name 'myApp' for non-bots, then yeah -- that would make sense.

Now, my (hopefully) final question on the topic: Is the user agent string really the best (or least-worst) way of detecting bots? I can't think of a better solution, even though I resist it, which is possibly irrational.
# Posted By Dave Anderson | 5/23/07 10:17 AM
@Dave, no, that's the whole point of my comment - you are not "recreating the whole application scope with every request". The application scope is created just once per application name. Subsequent requests merely "attach" to that application scope. Hence you can set session timeout *per-request* and have it timeout bots very quickly (or set sessionmanagement="false" and have *no* session - for those requests). Do you understand that explanation?
# Posted By Sean Corfield | 5/24/07 1:59 AM
Indeed it does. Thanks!
# Posted By Dave Anderson | 5/24/07 10:20 AM
I wonder if this solution will help against the regular "Timed out trying to establish connection" errors that I get every couple of days. Those errors only come up when there is a search engine bot name in the CGI.HTTP_USER_AGENT variable.
# Posted By David Levin | 8/6/07 11:23 AM
Hi Mark and all, this might be a bit off-topic..

I have an issue - my session variable is immortal.
No matter how short the value I put for the default timeout of my session variable, the session still does not expire.

Here is the cfapplication code written in application.cfm
<cfapplication name="myAppName"
clientmanagement="no"
sessionmanagement="yes"
setclientcookies="yes"
setdomaincookies="no">

Here is the current setup on my cf admin:

Use J2EE session variables [checked]
Enable Application Variables [checked]
Enable Session Variables [checked]

Maximum Timeout
Application Variables : 2 days, 0 hours, 0 minutes, 0 seconds
Session Variables : 0 days, 0 hours, 1 minutes, 0 seconds

Default Timeout
Application Variables : 2 days, 0 hours, 0 minutes, 0 seconds
Session Variables : 0 days, 0 hours, 1 minutes, 0 seconds

At one point I set 0 days, 0 hours, 0 minutes, 0 seconds for the default timeout of session variables. This is supposed to kill the current session and not allow a new session to be created, right? Anyway, even with this there is no difference.

so far since this issue persists (yup, it was not like this before), the session can only be killed with this code
<cfset structClear(session)>

so i hope u guys can share the thoughts and shed some lights...
thanks in advance :)

P/s: guys, my team mate just told me that the session expired already .. finally! But still, why could it not expire before this? I mean it should expire according to the setup in cfadmin, right?
She said she closed and reopened IE and the session had expired... weird, and we are definitely not going to tell our customers that after some time they have to close their browsers.
# Posted By md | 11/23/07 2:29 AM
@md, I have a few thoughts (though yes, this seems pretty off-topic).

First, what's telling you that the session is not expiring? I suppose you're visiting a page whose session you think should be expired, and perhaps you're expecting to have to log in, and are not having to.

A better test would be to isolate things so that you run just a simple set of 2 pages, one that sets a session variable and one that outputs it. If you run the first, then the second, are you really SURE that the second shows the session var existing beyond your timeout? I'd be surprised.

So then it could instead be a problem in your code. For instance, you may be setting the session variable (that you're testing for) in the application.cfm or some file included from that. You could also have some code that runs on the client to ping the server regularly for that user to keep their session alive. I know it's a stretch, but confirm with the first test above, and if it works as expected, then you have to suspect something in your code.

If you're running CF8, you can even use the Server Monitor to watch your sessions (see how many are active at a point in time, how many have been created over time).

And whether running that or an earlier release, you could use a tool like FusionReactor or SeeFusion (or the CF8 monitor) to observe what requests are coming in to your CF server. As I said above, maybe there are requests coming from the client that you're not expecting.

Finally, you could also use a CFML debugger (FusionDebug for 6/7/8 or the CF8 debugger) to walk through the code line by line which may identify your problem.

There are also underlying features in 6, 7, or 8 (some documented, some undocumented) that can report what the session timeout really is for a given session. You may find that it's not the value that you think.

I'll say that with any of the above, if you feel you'd like help with it, I am an independent consultant and can be hired for as little as an hour to help with such problems [charlie (at) carehart.org]. I'm confident we'd resolve it. There's always an explanation for problems. :-) But hopefully you can resolve it with the info above.

Finally, as for your setting the max timeout to 1 minute, that may be useful for testing, but certainly don't leave it there permanently! :-)
# Posted By Charlie Arehart | 11/23/07 4:39 PM
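The isolated two-page test suggested in the comment above could be set up roughly as follows, in an empty folder of its own. The file names, the application name and the one minute timeout are all assumptions chosen for the test.

```cfml
<!--- File 1: Application.cfm (test folder only) --->
<cfapplication
    name="sessionTimeoutTest"
    sessionmanagement="yes"
    sessiontimeout="#CreateTimeSpan(0, 0, 1, 0)#">


<!--- File 2: set.cfm - stores the current time in the session --->
<cfset session.testValue = Now()>
<cfoutput>Stored #TimeFormat(session.testValue, "HH:mm:ss")# in the session.</cfoutput>


<!--- File 3: show.cfm - reports whether the value is still there --->
<cfif StructKeyExists(session, "testValue")>
    <cfoutput>Session is still alive. Value was set at #TimeFormat(session.testValue, "HH:mm:ss")#.</cfoutput>
<cfelse>
    <cfoutput>No session value found. The session has expired or was never set.</cfoutput>
</cfif>
```

Request set.cfm, wait longer than the timeout without touching the site, then request show.cfm in the same browser. Keep in mind that every request to show.cfm inside the timeout window extends the session, so refreshing it repeatedly will keep the value alive.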



Blog provided and hosted by CF Webtools. Blog Software by Ray Camden.