Finding a cause for something as generic as a CPU spike can be a frustrating task. In my experience it is best
to start with "known" issues and get them out of the way. First, ensure that the OS is up to date with all
patches (if using w2k - sp4). Likewise the RDBMS server should be up to date - both the OS and the application
(sql2000 SP 2 for example). Next, examine the CF server and make sure that it is up to date as well. Be careful -
make sure that all patches are installed and tested for compatibility on the dev server first. I never install a
patch on the first day - or without doing it on the dev server first. Patches get rolled back too. The current CF 5
version is 5.0 - but there is an extensive list of hot fixes that may or may not be germane to your installation.
Here's a link to Macromedia's Patch List.
I always start with issues that are "known bugs and workarounds" - because they are easy to check or discover and
have known fixes.
Known Issues
- The dreaded CFMAIL bug - This bug occurs (listen carefully now - this is tricky) when a file handle for
a mail file is created, but the process that writes the file never completes. It is most common when you are sending
out individual mail files on a busy server. Some error that occurs procedurally after the call to cfmail interrupts
the process and keeps the file from being written - but the file handle and the entry in the file directory still exist. This results in a
"zero length" cfmail file. It's easy to check. Go to the %cf root%\mail\spool directory and see if a file exists there
with a zero byte size. Take the steps below to fix it.
- Verity Issues - If you are using Verity and you don't have an adequate maintenance plan for the collection(s),
you may experience performance issues. Make sure you are reindexing the collections periodically. If you have issues you
can take the following escalating steps.
  - Reindex, purge, or delete and re-add collections.
  - Move to CFMX - CFMX actually has some Verity improvements.
  - Move to the Verity K2 Server.
- Sequential File operations - If you have an application that makes extensive use of file operations - particularly
reading, writing and/or appending a sequence of files in a single operation - you may experience a server spike. Remember
that some operations in CF use the file system without being cffile, for example cfhttp and cfftp. A
request that manages several files in sequence will naturally run longer than a typical http request. That means you already
have a thread that you can expect to last longer than normal. When you use cffile you create multiple file handles (pointers)
and send requests to the disk system. Since the disk cannot fulfill every request simultaneously, it queues the requests as well.
That means a single request for multiple file operations can spike the processor. If any of those threads hang, you are now
in a situation where the clock cycles are concentrated on a thread that cannot be released. How to solve this?
  - Use named CFLOCKs around file operations. This restricts the process to a single thread, so the operations must execute sequentially.
  - Consider a different solution - CF is probably not the best choice for batching large
    groups of files for data import/export and the like.
- Log files (and cfam) - CF can be configured to do a lot of logging. It's important to remember that CF
must append to the log file with each logging operation. The larger the log file, the more difficult this may be. Two steps
will help.
  - Purge the log files periodically.
  - Disable and/or patch the CFAM service - This is the "management repository service" that runs with the JRun VM. I
    have not found it to be a terribly useful service in CF 5 unless you are using Enterprise in a cluster and
    using the deployment and archiving service.
- Client Variables in registry - If the application in question makes extensive use of client variables
and you have not altered the default handling of client variables, they are being stored in the registry. Check the
registry size and see if it is unduly large. Here are some tips to solve this problem.
  - Raise the maximum ceiling of the registry to accommodate the variables.
  - Possibly run the "clean" tool to remove client variables. Macromedia has one, but Edge web hosting has a
    much faster and cleaner removal tool. (Edge web)
  - Possibly move client variables to a database instead. Actually, if you are going to make extensive use of client
    variables this is the preferred solution.
- Locking - Yes, it's true. Failure to lock session and application variables is a very likely suspect. Each session and application write should have an exclusive lock and each read should have a read-only lock (this second step is mitigated in CFMX).
  - Search the code base for session and/or application variables and ensure they are locked.
  - Set the flag for "enforce strict validation" in the CF administrator and check the log for locking-related errors. NOTE:
    do this on a dev server. If you do it on a live server it may bring down your app until you fix the errors.
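
Searching the code base for unlocked session and application variables can be partly automated. Here is a rough sketch in Python (my choice for the example - nothing in a CF setup requires it) that flags lines referencing `session.` or `application.` outside a `<cflock>` block. The function names are made up for illustration, and the line-by-line matching is only a heuristic: it will miss unusual formatting and may flag things that are fine, so treat the output as a list of places to look at by hand.

```python
import re
from pathlib import Path

# Heuristic patterns; none of this comes from ColdFusion itself.
SCOPE_REF = re.compile(r"\b(session|application)\.\w+", re.IGNORECASE)
LOCK_OPEN = re.compile(r"<cflock\b", re.IGNORECASE)
LOCK_CLOSE = re.compile(r"</cflock\s*>", re.IGNORECASE)


def find_unlocked_refs(source):
    """Return (line_number, line_text) pairs for session/application
    references that appear outside any <cflock> ... </cflock> block."""
    hits = []
    depth = 0  # how many cflock blocks are open at the start of the line
    for number, line in enumerate(source.splitlines(), start=1):
        opens = len(LOCK_OPEN.findall(line))
        closes = len(LOCK_CLOSE.findall(line))
        # A reference on the same line as an opening tag counts as locked.
        inside = depth > 0 or opens > 0
        if SCOPE_REF.search(line) and not inside:
            hits.append((number, line.strip()))
        depth = max(depth + opens - closes, 0)
    return hits


def scan_tree(root):
    """Run find_unlocked_refs over every .cfm file under root.
    Returns a dict mapping file path to its list of hits."""
    report = {}
    for path in sorted(Path(root).rglob("*.cfm")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_unlocked_refs(text)
        if hits:
            report[str(path)] = hits
    return report
```

Point `scan_tree` at a copy of your templates on the dev server and review each hit. Anything it reports is a candidate for a cflock, not proof of a bug.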
Of course there are also external causes of server performance problems - a topic for another blog.
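
One last practical note on the CFMAIL check described earlier: looking for zero-byte files in the spool directory is easy to script. The following is a minimal sketch in Python (again my own choice, and the function name is hypothetical). You pass in the path to your own spool directory; the script only reports files and does not delete or repair anything.

```python
from pathlib import Path


def find_zero_length_files(spool_dir):
    """Return a sorted list of regular files in spool_dir that are 0 bytes long."""
    spool = Path(spool_dir)
    return sorted(
        p for p in spool.iterdir()
        if p.is_file() and p.stat().st_size == 0
    )
```

Call it with the full path of the mail spool directory on your server and print the result. An empty list means no zero-length mail files were found at that moment.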