ColdFusion Muse

Duplicates and Sub Selects

Here's a sticky problem. How do you build a query that gives you distinct records from one table based on multiple records from another table and order by a date found in the second table. In my real world example, I have project tracks in "Items" and comments or notes in "events". I want a distinct list of "items" that have been updated in the past 10 days. That seems easy right? Well... not as easy as it seems on the surface.


Fun with th HAVING Clause

You probably already know about the GROUP BY clause in SQL - but have you ever tried "HAVING"? The group by clause is great for getting aggregate groups of information together. Let's say you want a count of how many times a keyword appears in a table. You have the words car, auto, van and bus all in a keywords table.


You Should Use an Alias for Those Sub Query Tables!

Here's a rule of thumb for you. When you are writing a query that includes a sub query always use an alias for the table and qualify the columns - or use the full column name if you are so inclined. If you don't you may very well end up with unexpected results. Consider this query:


Removing Duplicate Records in a Database

SQL provides many ways of grouping data. It also has many ways to select data. But if you've ever tried to remove duplicates from a database you might have needed 2 or 3 trips to the liquor store to figure it out. I've seen routines that match and compare and order and update and delete willy nilly - all just to find and fix duplicate rows in a database. I've never had to do this to any of my databases of course, because all my constraints and applications are perfect (gah!). Still, for those of you with imperfect databases I'm happy to report there is a magic bullet. Here's the big secret.


Getting the ID field after an Insert

About every 2 months or so someone asks about reliably returning the primary key record from an insert query. The problem they are trying to solve usually has to do with additional inserts into related tables. For example, if you are adding a new user and you want to set group permissions as well - but group permissions require inserting into another table. One way to do it is to do an insert, then do a second query that pulls back the "max(id)" and uses another qualifier - like an email address. This requires 2 connections to the database, but it is a very common method. If you are using SQL server and your primary key is an "identity" type field then you have another option. You can insert and get back the identity value in the same query. This is preferred because of SQL treats a single query statement as an implicit transaction - meaning you are assured of data integrity, and that you will return the right value. Here's the way it works.


A ListFind( ) function for SQL

First let me say that it is usually not a good idea to store lists in Database tables. Many novice web developers falls into the trap of treating the data in the database the way they treat it in their web application. They are used to using lists and list functions in logic loops to qualify users or statements, and they try to do the same thing in the Database. But SQL doesn't have any native list functionality. The equivalent of a "list" to a database is a table with a single row.

A while back I saw a security scheme implemented by storing a list of group ids in a character column of the users table. The code grabbed the groupIds column from the users table on login, then it selected from the "groups" table where group_id IN (1,2,4,8...). This actually works fine when dealing with individual users. But what if you wanted to grab all the users from a particular group? You end up with code that looks like "where groupIds LIKE '%,2,%' Or groupIds like '2,%' or groupIds LIKE '%,2' or groupIs = '2'. Why the 3 statements? Because you have to account for a group id of 20 and a case where 2 is alone or at the beginning or the end. There are other ways around this - none of them pretty (like storing beginning and ending comma's for example). Not to mention this sort of design makes DBA's pull out what little hair they have.

Now, having warned you against being "listy" in your database design, I'm going to show you one way of working with a list that might help in a case like that above. I know, I know, a little knowledge is a dangerous thing. Let's just assume you are forced to deal with a legacy schema that you have no power to change (ha).


Using UNION in your queries

Are you using Union queries yet? If not, you should get up to speed. A union query is an extremely useful method for returning records from different tables in the same recordset. You just have to remember that datatypes of the columns must match in the same order they are referenced. Here's an example:


Clustered indexes on a Non Primary-Key (MS SQL 2000)

Ever wonder why a table is only allowed one "clustered" index? It's because the clustered index is the actual sort order that of the table. When you implement a clustered index you are "rearranging" the rows so that they are actually ordered by that index. That makes this index faster. Choosing the right clustered index can have a major impact on the speed of searches in your table. Wait a minute - isn't the Primary Key always the clustered index? Well... yes, when you use enterprise manager to set a primary key you will find that it also designates that column or columns as the default clustered index. It doesn't have to be that way however.


Leveraging Your SQL - Update Using a Join

Most update queries are pretty straightforward. You already know the primary key or some other criteria for a single table and and the WHERE clause is just "WHERE pk_id = 4" or "WHERE area_code = 312". There are times, however, when it might be useful to update a table based on critria from 2 or more tables. For a purely hypothetical example, let's say I have a shipping amount in an "ordShip" table, a base amount in an "orders" table and I have a tax amount (as a decimal) in a tax table based on the state. Let's also assume I have the state in the "orders" table. I want to update the "grandTotal" amount in the "orders" table. The formula would be:

base Amount + (base Amount * tax rate) + shipping.
How would I go about it?


Why You should worry about your execution plan

No I'm not talking about dead man walking. I'm talking about your database execution plan. I want to give fair warning to all of you Microsoft haters out there (and you know who you are) that I'm going to use lingo from Microsoft SQL Server 2000. It's a ubiquitous and full-featured database with good documentation regarding this subject. Much of what is said here applies to other databases as well (no doubt using different lingo). So please, feel free to post comments of how Oracle or MySQL or PostgreSQL or Interbase or your flat file - all have a great way of doing this. But please don't post about how your favorite DB is so great and Microsoft is the spawn of Satan. That's not helpful and it's makes me want to poke out my eye with an ice pick! Whew! Now that that's out of the way.

The database execution plan is the series of steps that a database takes to deliver a particular query or task. These steps are cached on the DB server. When you run a query it looks in the cache for a plan that matches. If it finds one, it uses it. If not, it creates a new one. Why is this important? Because the more often your DB Server finds a matching plan in the cache, the better it performs. In fact, it can run significantly faster when it is not tasked with constantly building execution plans from scratch. Here's the rub, much of the query code written in Coldfusion requires the RDBMS to compile a new execution plan. Here's why.


Changing Database Schemas may require a restart of CFMX

There's a tricky nuance that you must take into account when you make schema changes in MS SQL. When we discovered the following behavior I looked to see if I could find it blogged or documented somewhere. Failing that, I thought someone else out there might benefit from hearing about this issue. You might run into problems if you are dealing with the following conditions:

  • CFMX using JDBC drivers to MS SQL
  • A view with a "select *" in it for one or more tables
  • The need to change the schema of a particular table referenced with the asterisk (*) within the view
Here's what happens (and how to fix it).


Using the WITH RECOMPILE option in a Stored Procedure

This is an excellent article by Arthur Fuller that was sent as a newsletter. If you've ever wondered why a stored procedure doesn't save quite as much time as you expected - or why equivelent query code can even be faster - this may be the answer.



(reprinted from a previous blog)

Here's the dilemma. Let's say you have a log table where each row is a record of some action taken by a user. Perhaps the user logs in, updates his profile, searches for products and makes purchase. The table has the following fields:

  • log_id (int) - a primary key for the table.
  • user_id (int) - a foreign key reference to the "users" table.
  • action_type (char) - a string indicating "login", "update", "search" or "purchase".
  • UpdateTime (smalldatetime) - indicates when the event occurred.
Your boss (we'll call him Ralph) comes to you and says "I'd like to know the last action that each user took on the site before they went away. Can you build me a report like that? Well, you know a couple of ways to do this. One is that you can pull in the whole record set ordered by user and updatetime - then track the last record for each user. Or you might pull in all the users ids and do query looking for the max log_id for that user. Both of these approaches will work, but none of them will pull in the data required in a single query. Never Fear, there is another way - it's tricky, but it works!


the TOP keyword in SQL

One of the most common things I see when looking at code I'm trying to optimize is a missunderstanding of how to effectively utilize SQL and a good database platform. Nothing illustrates this more effectively than the constant use of the "maxrow" attribute for a <cfquery> or a <cfoutput> statement. In many cases the TOP keyword should be used instead. Let me Explain...


Blog provided and hosted by CF Webtools. Blog Sofware by Ray Camden.