ColdFusion Muse

Muse Review: Exploring CouchDB With Matt Woodward

On Saturday I sat in on ColdFusion genius Matt Woodward's session on practical CouchDB. I have experience with both Memcached and MongoDB, so I thought I was prepared for the general sense of what you could do with CouchDB (which I had never explored). I assumed it was just another "NoSQL" database. But Matt demonstrated some things that were new to me, and I am intrigued enough to experiment with them - hopefully engendering a few more "CouchDB" blog posts. Here are a couple of pros and cons gleaned from the presentation.

The Pros

For one thing, it seems (from the demo) that CouchDB's simple HTTP interface for getting, putting and updating records is a natural fit for ColdFusion and CF programmers. The storage engine uses JSON (and really... who isn't using JSON?), which makes it easy to work with fairly complex data types. For example, getting content looks like this:

<!---MAK: get some user--->
<cfhttp url="#somedomain#:5984/dbname/keyOrGuid" method="GET" result="getUser"/>
<!---MAK: turn him into a CF data object --->
<cfset getUser = deserializeJSON(getUser.filecontent)>

Other actions use the standard HTTP verbs (PUT, DELETE, POST, etc.). HTTP status codes allow for traditional error handling. This seems conventional and easy to grasp. If a "document" (an individual record) is not found, you get a 404 error. Various views and filters are possible by altering the URL or passing params in as a part of the request. It's a straightforward model that "fits" what a typical ColdFusion programmer already knows how to do - unlike MongoDB or Memcached, which (while still easy) require some Java construction.
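
To make the verb and status code model concrete, here is a minimal sketch of writing a document with PUT and then branching on the status of a GET. It is not from Matt's demo: the host, the "users" database, the document ID and the field names are all hypothetical, and it assumes a local CouchDB that already has that database and accepts unauthenticated requests.

```cfml
<!--- Sketch only: couchUrl, the "users" database and docId are hypothetical --->
<cfset couchUrl = "http://localhost:5984/users">
<cfset docId = "mark">

<!--- Build the document; bracket notation keeps the key case intact in the JSON --->
<cfset newUser = {}>
<cfset newUser["name"] = "Mark">
<cfset newUser["role"] = "blogger">

<!--- Create the document by PUTting its JSON to the document's own URL --->
<cfhttp url="#couchUrl#/#docId#" method="PUT" result="putResult">
    <cfhttpparam type="header" name="Content-Type" value="application/json">
    <cfhttpparam type="body" value="#serializeJSON(newUser)#">
</cfhttp>

<!--- Read it back; statusCode looks like "200 OK", so val() extracts the number --->
<cfhttp url="#couchUrl#/#docId#" method="GET" result="getResult"/>
<cfif val(getResult.statusCode) EQ 200>
    <cfset user = deserializeJSON(getResult.fileContent)>
<cfelseif val(getResult.statusCode) EQ 404>
    <!--- No such document: fall back to an empty struct --->
    <cfset user = {}>
<cfelse>
    <cfthrow message="Unexpected CouchDB response: #getResult.statusCode#">
</cfif>
```

One thing to keep in mind: CouchDB versions every document, so a second PUT to the same URL has to include the current `_rev` value in the body or it comes back as a 409 conflict.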

The most intriguing thoughts bouncing around in the Muse's head have to do with performance. In its simplest form CouchDB is a listening HTTP port. Since it has clustering and replication under the hood (this was not demonstrated, but it was made to sound easy :) ), you could cluster your content on multiple VMs or iron behind a load balancer - making scaling with traditional load balancing a piece of cake and eliminating the need for the application to use a teamed IP, know about failover, handle primary/delegate relationships, etc. Indeed, scaling databases is a great deal more challenging than scaling web servers, so having this capability should be a real plus.
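
Replication was not part of the demo, so the following is only my sketch of what kicking it off might look like. CouchDB exposes a `_replicate` endpoint that takes a JSON body naming a source and a target. The two host names and the "users" database here are hypothetical, and the sketch assumes the target database already exists and that neither server requires authentication.

```cfml
<!--- Sketch only: both hosts and the "users" database are hypothetical --->
<cfset replication = {}>
<cfset replication["source"] = "users">
<cfset replication["target"] = "http://couch2.example.com:5984/users">

<!--- Ask the first node to push its "users" database to the second node --->
<cfhttp url="http://couch1.example.com:5984/_replicate" method="POST" result="repResult">
    <cfhttpparam type="header" name="Content-Type" value="application/json">
    <cfhttpparam type="body" value="#serializeJSON(replication)#">
</cfhttp>

<!--- The response body is JSON describing how the replication went --->
<cfset repStatus = deserializeJSON(repResult.fileContent)>
```

This triggers a one-time copy. Keeping several nodes in sync behind a load balancer would take more setup than this.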

The Cons

Not that I'm completely sold. I still worry about the overhead associated with HTTP requests. A cfhttp request is more expensive than a JDBC driver call and there's no "connection pooling". So the general connect and retrieve infrastructure is going to be more expensive than traditional DBs - at least that's my theory. In addition, you have to pull in the data as a string and deserialize it. Granted, JSON serialization/deserialization is lightweight, but it's still one more step.
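
A theory like that is easy enough to check. Here is a rough sketch of how one might time the full round trip (request plus deserialization) with getTickCount(). The document URL and the loop count of 100 are arbitrary assumptions, and a loop like this on a dev box says nothing about behavior under real concurrent load.

```cfml
<!--- Sketch only: docUrl and the iteration count are arbitrary assumptions --->
<cfset docUrl = "http://localhost:5984/users/mark">
<cfset iterations = 100>

<cfset started = getTickCount()>
<cfloop from="1" to="#iterations#" index="i">
    <!--- Each pass pays for the HTTP request and the JSON deserialization --->
    <cfhttp url="#docUrl#" method="GET" result="r"/>
    <cfset doc = deserializeJSON(r.fileContent)>
</cfloop>
<cfset elapsedMs = getTickCount() - started>

<cfoutput>Average per request: #numberFormat(elapsedMs / iterations, "0.00")# ms</cfoutput>
```

Running the same kind of loop around a comparable cfquery would give a rough baseline for the JDBC side.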

What would really be cool (listen up, Railo and Adobe folks) is to allow a CF admin to register CouchDB (or Mongo or whatever) as datasources or pseudo-datasources and provide a function or tag to get the data back already deserialized. That would be a true data enhancement that fits with the ease of the CF administrator. One of ColdFusion's strengths is this sort of "pre-configured" resource that is easy to access within the code. NoSQL DBs would be a natural extension of this, I think. But back to performance...

I'm often given a sort of glib response to granular performance concerns (like the overhead associated with CFHTTP) by developers who say "how often do you really need to worry about millisecond-level performance?" The truth is - quite often. One of my CF Webtools roles these days is to dive into under-performing systems and try to find ways to maximize output and speed. In many cases the goal is to get another month or two out of existing hardware or design while a new approach can be devised. In such cases everything is in play. Consequently, when designing a new system that is intended to receive a lot of request traffic, it's important to make such decisions right up front or pay the penalty later.

Still, for intermediate caching, shopping carts, sessions, aggregate portals and many types of content CouchDB would be a huge step up from the hodge-podge of approaches usually seen. I was impressed with its ease of use and responsiveness.

Thanks for attending my session and for the review, Mark!

I think the piece you're missing when discussing performance is that while in the abstract you may be correct that a database connection can be considered a less expensive operation than an HTTP call *assuming there's pooling and you're using an already open connection*, there's a ton of other variables involved, particularly with respect to the work that needs to be done on the database server side, that impact performance.

What I can tell you is that we have seen nothing but across-the-board performance improvements in our applications moving from SQL Server to CouchDB. Everyone's mileage on that point will of course vary, but CouchDB is blazing fast. The way the views work in Couch, the database has to do practically zero work to return data, so rather than worrying how long a query will take and tuning certain aspects of the SQL to the Nth degree, it's just a matter of how fast your network is and how much data you're pulling back at once.

Also of course once you're in HTTP land you can do all the normal HTTP "stuff" with Couch that you can with any webapp, including using caching, squid, etc. on your data layer which is massively cool.

Anyway, as you rightly point out there are things to consider, but the equation is a bit more complex than just worrying about performance at the connection level.

Glad that you're going to give it a shot and I'm happy to answer any questions you may have. As you could tell an hour was only enough time to scratch the surface of all the features Couch has.
# Posted By Matt Woodward | 5/22/12 12:21 PM
@Matt,

I knew you'd chime in and pick at my "cons" (ha). Thanks for the great comments - very helpful.
# Posted By Mark Kruger | 5/22/12 12:38 PM
Heh -- sorry. Great "pros" too! All great reasons for people to try Couch and it's definitely a natural, simple fit for CFML developers.

I'll post the sample apps (including the one I didn't have time to dig into) to my blog before long and write up a nice tutorial around all this stuff. And I'll let you know next time I'm in Nebraska and maybe I could dig deeper into things in person!
# Posted By Matt Woodward | 5/22/12 12:44 PM
Matt/Mark,

You guys should read and comment on this:
# Posted By Sami Hoda | 5/23/12 10:34 PM
Read it, and as he says himself, between CouchDB 1.2 and BigCouch the vast majority if not all of their issues would have been resolved. There are also a lot of other ways to address the things he brings up that aren't necessarily fixed with an upgrade.

If nothing else these are good discussion points but I don't see a lot of validity to a lot of the points since there are solutions to the problems, they simply chose to cut bait and go back to MySQL purely for familiarity's sake. Personally I think all they're doing is trading one set of problems for another potentially much bigger set. They'll just have a higher comfort level with their old sets of problems.
# Posted By Matt Woodward | 5/23/12 11:33 PM

Blog provided and hosted by CF Webtools. Blog Software by Ray Camden.