ColdFusion Muse

Looking for Bottlenecks? Watch Out for Cfdirectory

I have a client with a file-intensive application. It allows users to upload images and manage galleries. It's very slick and uses a Flash uploader to handle multiple file uploads. He was having performance problems with the uploader. The Flash uploader is a nifty way to upload a wheelbarrow full of files in a single operation. You can even check things like file size and type in advance instead of waiting for the whole file to arrive on the server.

What we began to notice is that some requests took longer than others, a lot longer. I, being the expert troubleshooter that I am, naturally thought it was file size. I assumed that requests handling a 2 MB upload naturally took longer than requests handling 200 KB files. When we looked closer, however, it turned out that was not the case. A much more sinister culprit was lurking.

The Problem

It turns out the file handler does a number of things. It uses the "upload" action of cffile to store the file in a temporary location, then checks the user's main file directory to see whether a file with that name already exists. If one does, it uses a short routine to create a new filename. To see where the time was going, I created a little "times" structure that I emailed to myself on every successful request, like so:

<cfset times = structNew()/>
<cfset tm = getTickCount()/>

    <!--- ... file upload code --->

<cfset times.fileupload = getTickCount() - tm/>
<cfset tm = getTickCount()/>

    <!--- ... cfdirectory code to get list of files --->

<cfset times.dirList = getTickCount() - tm/>
<cfset tm = getTickCount()/>

    <!--- ... code to move, rename, resize... --->

<cfset times.renamefile = getTickCount() - tm/>

...you get the idea. What I discovered was that the call to CFDIRECTORY was at least 10 times as expensive as any other operation, and sometimes more than 40 times as expensive. Two things contributed to this bottleneck.
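For readers outside ColdFusion, the same per-step timing pattern can be sketched in Java, the platform ColdFusion runs on. This is a minimal sketch, not the code from the post: the class name StepTimer and the step names are illustrative, and it uses System.nanoTime(), a monotonic clock intended for measuring elapsed time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal per-step timer mirroring the getTickCount() pattern above:
// it records elapsed milliseconds for each named step, in the order the steps ran.
public class StepTimer {
    private final Map<String, Long> times = new LinkedHashMap<>();
    private long mark = System.nanoTime();

    /** Records the time since the previous mark under `step`, then resets the mark. */
    public void lap(String step) {
        long now = System.nanoTime();
        times.put(step, (now - mark) / 1_000_000); // nanoseconds -> milliseconds
        mark = now;
    }

    public Map<String, Long> times() {
        return times;
    }

    public static void main(String[] args) throws InterruptedException {
        StepTimer timer = new StepTimer();
        Thread.sleep(20);  // stand-in for the upload step (illustrative only)
        timer.lap("fileUpload");
        Thread.sleep(50);  // stand-in for the directory listing (illustrative only)
        timer.lap("dirList");
        Thread.sleep(10);  // stand-in for move/rename/resize (illustrative only)
        timer.lap("renameFile");
        System.out.println(timer.times()); // e.g. one entry per step, in insertion order
    }
}
```

Keeping the results in a LinkedHashMap preserves step order, so the report reads top to bottom like the request itself.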

When Size Matters

Think about cfdirectory the way you think about cfquery. If your directory has 10,000 images in it, then you are creating a recordset of 10,000 rows. In our case, a directory list for a user with 13,000+ images was taking 60 seconds to return. The other issue is that this site stores the files on a SAN and uses UNC paths to manipulate them. Going through a UNC path (a network share) will generally be slower than accessing a local drive.
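To make the cost concrete, here is a minimal Java sketch using java.io.File of two ways to ask whether a name is already taken; the class and method names are hypothetical. Listing builds an array with one entry per file, so its cost grows with the directory. A single existence probe does not, although each probe is still one round trip to the file system (a network round trip over a UNC path), so a loop of many probes can still add up.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Contrasts two ways to answer "does this directory already contain a file named X?"
public class ListingVsProbe {
    /** Lists every entry, then scans for the name: work grows with the number of files. */
    static boolean existsByListing(File dir, String name) {
        String[] names = dir.list(); // materializes one String per entry
        if (names == null) return false; // not a directory, or an I/O error occurred
        for (String n : names) {
            // Note: equals() is case-sensitive; Windows shares compare names case-insensitively.
            if (n.equals(name)) return true;
        }
        return false;
    }

    /** Asks the file system about a single path: one lookup, no listing. */
    static boolean existsByProbe(File dir, String name) {
        return new File(dir, name).exists();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("gallery");
        for (int i = 0; i < 1000; i++) {
            Files.createFile(dir.resolve("img" + i + ".jpg"));
        }
        File d = dir.toFile();
        System.out.println(existsByListing(d, "img500.jpg"));
        System.out.println(existsByProbe(d, "img500.jpg"));
        System.out.println(existsByProbe(d, "missing.jpg"));

        // Clean up the temporary files and directory.
        File[] entries = d.listFiles();
        if (entries != null) for (File f : entries) f.delete();
        d.delete();
    }
}
```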

The Solution

In our case, the solution was to stop checking the directory and check the database instead. Since each file name was already stored in the database under its user, it made sense to simply check whether that user had already uploaded an image with that name. Another possible solution would be to further segment the directory structure, for example by creating a directory for each date. In any case, when troubleshooting, don't overlook directories as a possible bottleneck.
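Both ideas can be sketched in Java under some stated assumptions. The set of names passed to uniqueName stands in for the names a database query would return for that user (hypothetical; the post does not show its query or schema). The "_1", "_2" suffix scheme is illustrative, not the client's actual renaming routine, and the year/month/day layout is just one way to segment directories by date.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class UniqueNames {
    /**
     * Returns `requested` if it is free, otherwise "base_1.ext", "base_2.ext", ...
     * `taken` is a hypothetical stand-in for the user's file names loaded from the database.
     * Names are compared case-insensitively (an assumption matching Windows file shares).
     */
    static String uniqueName(String requested, Set<String> taken) {
        Set<String> lower = new HashSet<>();
        for (String t : taken) lower.add(t.toLowerCase(Locale.ROOT));
        if (!lower.contains(requested.toLowerCase(Locale.ROOT))) return requested;

        int dot = requested.lastIndexOf('.');
        String base = dot > 0 ? requested.substring(0, dot) : requested;
        String ext = dot > 0 ? requested.substring(dot) : "";
        // Terminates because `taken` is finite: some suffix must be free.
        for (int i = 1; ; i++) {
            String candidate = base + "_" + i + ext;
            if (!lower.contains(candidate.toLowerCase(Locale.ROOT))) return candidate;
        }
    }

    /** Builds a date-based subdirectory such as "2006/10/26" so no single directory grows huge. */
    static String dateSegment(LocalDate date) {
        return String.format("%04d/%02d/%02d",
                date.getYear(), date.getMonthValue(), date.getDayOfMonth());
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("beach.jpg", "beach_1.jpg"));
        System.out.println(uniqueName("beach.jpg", existing));          // beach_2.jpg
        System.out.println(uniqueName("sunset.jpg", existing));         // sunset.jpg
        System.out.println(dateSegment(LocalDate.of(2006, 10, 26)));    // 2006/10/26
    }
}
```

Because the existing names are fetched once and checked in memory, trying several candidate names costs nothing extra on the file server.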

7 Comments

  • Posted by Jorrit Janszen | 10/26/06 5:49 AM
    I don't quite see why you are using cfdirectory to check whether a file exists in that directory. Can't you use fileExists() instead? Or just use filter="thefilename.ext" in the cfdirectory tag?
  • Posted by Daniel | 10/26/06 6:25 AM
    Use Java instead. I had a problem listing a directory with a lot of files, so I did something like this instead of using cfdirectory. It works fine, it's fast and transparent, and you get two arrays: one with files and one with folders.

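    <cfscript>
    // this.path holds the directory to list
    dirPath = trim(this.path);
    aFiles = arrayNew(1);
    aFolders = arrayNew(1);

    // listFiles() returns java.io.File objects, so each entry can be asked
    // whether it is a directory instead of guessing from a "." in its name
    if (directoryExists(dirPath)) {
       dirEntries = createObject("java", "java.io.File").init(dirPath).listFiles();
       for (i = 1; i lte arrayLen(dirEntries); i = i + 1) {
          entry = dirEntries[i];
          if (entry.isDirectory()) {
             arrayAppend(aFolders, entry.getName());
          } else {
             arrayAppend(aFiles, entry.getName());
          }
       }
    }
    </cfscript>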
  • Posted by mkruger | 10/26/06 7:28 AM
    Jorrit, the original code had a series of nested fileExists() calls that performed worse than the current code. As for the filter, I would still have to request a directory list multiple times, once for each proposed file name... yes? Or can you think of a way around that?

    Daniel, love it! I'll check out the Java snippet. It looks like it might be a good choice for a CFLib UDF as a cfdirectory replacement. Thanks!
  • Posted by Daniel | 10/26/06 2:05 PM
    I just sent you a message with some investigation, with my tests attached.

    The idea of using Java is good, but if it's not inside a function you end up with spaghetti code in the middle of the page. Wrapped in a function, the time is approximately the same.
    The tests were done with 3000 files; a bigger number might give different results.

    Run your own tests and send feedback ;)
  • Posted by Axel | 11/13/06 8:20 PM
    Hey Mark,

    I was reading Ben Forta's blog, and this post caught my eye. I thought you would really like the undocumented info... not sure if you saw it or not.

    http://www.forta.com/blog/index.cfm/2006/11/1/impr...

    Axel
  • Posted by Oyun | 10/9/09 2:38 PM
    Does the DB size matter here?
  • Posted by agustufus | 3/24/11 10:38 AM
    I love when these older posts help out. Thanks.