ColdFusion Muse

Java Based Directory List

This is a follow-up to my previous post on cfdirectory as a bottleneck. A helpful Muse reader of that post named Daniel Gracia sent me some Java code that builds a directory list. The code itself is a call to the core java.io.File class. It takes a directory path and returns an array of files and folders mixed (with the folders identified with a period). His claim was that it performed faster than cfdirectory. His claim is 100% true, but there are some nuances to it. I ran a few tests and here is what I found.

The Code

Here is my test script:

<cfsetting requesttimeout="200">
<cfscript>

currentDirectory = expandpath("../folder_With_15000_Images_In_It/");

function dirList(path) {
    return createObject("java","java.io.File").init(Trim(path)).list();
}

ftime = gettickcount();
    mylist = dirList(currentDirectory);
ftime = gettickcount() - ftime;

nfTime = gettickcount();
    nfList = createObject("java","java.io.File").init(Trim(currentDirectory)).list();
nfTime = getTickcount() - nfTime;
</cfscript>
<!--- Use CFDirectory as a baseline for comparison --->
<cfset tm = getTickcount()/>
    <cfdirectory action="list" directory="#currentDirectory#" name="test">
<cfset dirTime = getTickcount() - tm/>
<cfoutput>
<h4>#ftime# milliseconds for Java using a function</h4>
<h4>#nfTime# milliseconds for Java outside of a function</h4>
<h4>#dirTime# milliseconds for cfdirectory</h4></cfoutput>
The results were no contest. The Java code consistently ran in under 200 milliseconds, while the cfdirectory call took 4 seconds, at least twenty times as long. Those numbers come from a test using local storage (meaning a physically attached disk, not a UNC path). In this first test, wrapping the call in a function appeared to be about 50% more expensive than calling it directly. In other words, if the direct call took 100 milliseconds, the function call took around 150 milliseconds.
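
For readers who want to see what the CFML one-liner is doing underneath, here is a minimal plain-Java sketch of the same call. The class name DirListSketch and its method names are hypothetical and exist only for illustration; the one library call that matters is java.io.File.list(), which returns bare names (no paths) and returns null, rather than throwing, when the path is not a readable directory. The timing helper is a rough stopwatch, not a rigorous benchmark.

```java
import java.io.File;

final class DirListSketch {

    // Returns the entry names in a directory, or an empty array when the
    // path is not a directory or cannot be read (File.list() returns null then).
    static String[] dirList(String path) {
        String[] names = new File(path.trim()).list();
        return names != null ? names : new String[0];
    }

    // Elapsed milliseconds for a single dirList call on the given path.
    static long timeDirListMillis(String path) {
        long start = System.nanoTime();
        dirList(path);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Assumption: the directory to list is passed as the first argument.
        String path = args.length > 0 ? args[0] : ".";
        System.out.println(dirList(path).length + " entries in " + path);
        System.out.println(timeDirListMillis(path) + " ms to list it again");
    }
}
```

The null check is the main thing the CFML version leaves out: a mistyped or unreachable path will not raise an error at the list() call itself.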

UNC Paths

To test it further I moved it to a different server and used a UNC path.

<cfscript>
    currentDirectory = "\\server\share\folder_With_15000_Images_In_It\";
</cfscript>
As you might expect, running this against a UNC path takes appreciably longer. After all, it has to use the network redirector to locate the resource and make the list call. I was surprised to find that the results using a UNC path were dramatically in favor of the Java method. In my tests, the function-wrapped Java call for the same list of 15000 images over a UNC path took around 3.5 seconds on average. The direct call took around half a second (500ms), a dramatic reduction.

That got me thinking. Why did that second call take so much less time (fully 1/7th of the time of the function call)? Then it hit me: I was requesting the same directory list twice in a row, once from within the function and immediately afterward outside it. The function wasn't having such a tough time; it was just first in line. Once the directory list had been built by the java.io.File class, it was likely cached or otherwise available for the second retrieval. I reversed the order and called the direct method first. The result? They were almost dead even at 500 milliseconds. This indicates to me that the function wrapper has, at most, a negligible impact.
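
The ordering trap described above is easy to fall into, so here is a hedged sketch of one way to compare two listing approaches more fairly: do one untimed warm-up call, alternate which candidate runs first, and compare medians rather than single runs. The class ListingBenchmarkSketch and all of its method names are hypothetical, and this illustrates the idea rather than serving as a proper benchmarking harness.

```java
import java.io.File;
import java.util.Arrays;
import java.util.function.Supplier;

final class ListingBenchmarkSketch {

    // Times each candidate `rounds` times, alternating which one goes first,
    // and returns the median elapsed nanoseconds as {medianA, medianB}.
    static long[] compare(Supplier<?> a, Supplier<?> b, int rounds) {
        if (rounds < 1) {
            throw new IllegalArgumentException("rounds must be at least 1");
        }
        a.get(); // untimed warm-up so neither candidate pays the cold-cache cost
        long[] timesA = new long[rounds];
        long[] timesB = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            if (i % 2 == 0) {
                timesA[i] = time(a);
                timesB[i] = time(b);
            } else {
                timesB[i] = time(b);
                timesA[i] = time(a);
            }
        }
        return new long[] { median(timesA), median(timesB) };
    }

    // Elapsed nanoseconds for one call of the task.
    static long time(Supplier<?> task) {
        long start = System.nanoTime();
        task.get();
        return System.nanoTime() - start;
    }

    // Middle element of the sorted values (upper middle for even counts).
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // Hypothetical stand-in for the function-wrapped call in the post.
    static String[] dirList(String path) {
        return new File(path.trim()).list();
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : ".";
        long[] medians = compare(
            () -> dirList(path),          // wrapped in a method
            () -> new File(path).list(),  // called directly
            9);
        System.out.println("wrapped: " + medians[0] / 1_000_000.0 + " ms");
        System.out.println("direct:  " + medians[1] / 1_000_000.0 + " ms");
    }
}
```

With the warm-up in place, both candidates see the same cached state, so whatever difference is left is closer to the true cost of the wrapper.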

As for using cfdirectory through a UNC path, my only advice is don't do it! Cfdirectory took 59 to 63 seconds on average to return the data. In case you are guessing that CF spends most of that time preparing the ColdFusion query object that cfdirectory returns, let me remind you that a cfdirectory call for the same directory of 15000 files took only 4 seconds against local storage. The problem is somewhere else: how the data is chunked or verified or something.

Conclusions

My conclusion is this: if you are dealing with large file structures, especially across the network, and you don't need the other things that cfdirectory provides (like dateLastModified), use the pure Java solution and call it directly. What could go wrong?
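
If you do need some of what cfdirectory provides, the sketch below shows roughly what that looks like in plain Java using listFiles() instead of list(). The class DirDetailsSketch, the Entry type and its field names are all hypothetical. Two assumptions are worth stating loudly: the per-file calls (isDirectory(), length(), lastModified()) each ask the file system about one file, so this version should be expected to give back some of the speed advantage, particularly over a network path; and neither list() nor listFiles() guarantees any ordering, so the sketch sorts by name explicitly.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

final class DirDetailsSketch {

    // One row per directory entry; the field names are illustrative only.
    static final class Entry {
        final String name;
        final boolean directory;
        final long size;
        final Date lastModified;

        Entry(String name, boolean directory, long size, Date lastModified) {
            this.name = name;
            this.directory = directory;
            this.size = size;
            this.lastModified = lastModified;
        }
    }

    // Lists a directory with type, size and modification date per entry,
    // sorted by name (case-insensitive). Returns an empty list when the
    // path is not a directory or cannot be read.
    static List<Entry> listWithDetails(String path) {
        List<Entry> entries = new ArrayList<>();
        File[] files = new File(path).listFiles();
        if (files == null) {
            return entries;
        }
        for (File f : files) {
            boolean isDir = f.isDirectory();
            // Report 0 for directories, since File.length() is unspecified for them.
            long size = isDir ? 0L : f.length();
            entries.add(new Entry(f.getName(), isDir, size, new Date(f.lastModified())));
        }
        entries.sort(Comparator.comparing((Entry e) -> e.name, String.CASE_INSENSITIVE_ORDER));
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : listWithDetails(args.length > 0 ? args[0] : ".")) {
            System.out.println((e.directory ? "Dir   " : "File  ")
                + e.name + "  " + e.size + "  " + e.lastModified);
        }
    }
}
```

If only a handful of files need their dates, a cheaper pattern is to take the names-only list first and then look up details for just those files.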

7 Comments

  • Posted by Devin | 10/26/06 8:53 PM
    Interesting find! I'm in the same situation. I built a site for my parents' photography business. I allow them to upload photos through ftp. The site uses cfdirectory to get a list of all the images, and then converts the structure to xml to be used for a flash image gallery that uses xml as its database of galleries and images. I've noticed that with thousands of images the gallery takes forever to load. I have yet to start looking for the bottleneck, but have just assumed it was the process of converting the cfdirectory query to xml or loading an xml file of that size into the flash slideshow widget (and those might very well be part of the problem). Now I'm interested in knowing how much of the time is devoted just to the cfdirectory call!
  • Posted by Daniel | 10/27/06 2:16 AM
    Nice! I didn't test with the UNC path and that makes sense with the problems I had. 2 minutes with cfdirectory, 4-5 seconds with the java call. :D
  • Posted by PaulH | 10/27/06 2:31 AM
    if you use listFiles() instead & are prepared to jump thru a few more hoops you can get back pretty much the same info cfdirectory does, though i'm not sure the speed will be better.
  • Posted by Tim Elsner | 10/27/06 12:55 PM
    You say "with the folders identified with a period". However, when I run the code, neither files nor folders have a period. Am I missing something?
  • Posted by Tom Jordahl | 11/1/06 11:25 AM
    Try checking the performance of cfdirectory with the ListInfo="name" attribute. I think you will find that the performance is comparable to File.listFiles() API, which is what the code is doing.

    The slower 'full' listInfo performance is because we get all the information about a file and place it in the query. This requires performing a stat() system call on each file, which is expensive, especially over any sort of network. The OS will cache file info collected via stat(), which is why the second calls will run reasonably fast.

    Hope that helps.
  • Posted by mkruger | 11/1/06 12:16 PM
    Tom, Thanks for that tip. I'll keep it in mind.
  • Posted by Craig M. Rosenblum | 10/1/09 2:08 PM
    Does this automatically sort by name asc?

    Or is there some method i should be calling to force that?