ColdFusion Muse

Fun with th HAVING Clause

You probably already know about the GROUP BY clause in SQL - but have you ever tried "HAVING"? The group by clause is great for getting aggregate groups of information together. Let's say you want a count of how many times a keyword appears in a table. You have the words car, auto, van and bus all in a keywords table.

sample data

id          keyword
1 auto
2 auto
3 auto
4 van
5 bus
6 truck
7 taxi
8 car
9 car

A count of the totals is easy using group by:
<cfquery name="getkeyword" datasource="#mydsn#">
    SELECT count(id) AS tot, keyword
    FROM     keywordTest
    GROUP    BY keyword
    ORDER BY tot desc
</cfquery>
query - Top 6 of 6 Rows
KEYWORDTOT
1 bus 1
2 taxi 1
3 truck 1
4 van 1
5 car 2
6 auto 3
But what if you get tired of looking at all the single search words? What if you wanted to get at only the words where there are more than 3 entries? That's where "HAVING" comes in.

GROUP By With HAVING

The HAVING clause works after the fact. In other words, it doesn't do it's thing until after the result set is built. That is how it works its magic. If we change the above query to:

<cfquery name="getkeyword" datasource="#mydsn#">
    SELECT count(id) AS tot, keyword
    FROM     keywordTest
    GROUP    BY keyword
    HAVING count(id) >
2
    ORDER BY tot desc
</cfquery>
We will end up with just 1 result - "auto" with a total of 3 entries. One word of caution. "HAVING" makes your database work. Remember it must first retrieve the results, then filter them, then order them. So it should be used with care.

Another Clever Use for HAVING

A client recently asked for help with a complicated query using keywords. In his case he had pictures tied to keywords with a one-to-many relationship. In other words, a single picture could match many keywords. When a user searched for "auto and taxi" the client wanted the result set to contain only items that had both auto and taxi attached as keywords. Having gave us the ability to do it. Here's the query (with the names changed to protect the innocent).

<cfquery datasource="#mydsn#" name="kw">
SELECT I.cID, I.picName, I.picNumber
FROM pics I INNER JOIN pic_keyword_rel R
ON i.picID = r.picID
INNER JOIN keyw K ON r.kwID = k.kwID
WHERE r.cID = #someval#
AND k.kword IN (<cfqueryparam value="auto,taxi" cfsqltype="cf_sql_varchar" list="yes">)
GROUP BY I.cID, I.picName, I.picNumber
HAVING count(i.picNumber) = <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#listlen('auto,taxi')#"/>
</cfquery>
Nifty eh? This query builds a group of pics and then filters out the ones that don't match all of the words in the list. I can think of no easier SQL way to do this (but I'm prepared to be enlightened :).

Comments
Phillip Senn's Gravatar I don't understand the
HAVING count(i.picNumber) = <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="auto,taxi"/>

Should it be
HAVING count(i.picNumber) = 2
?
# Posted By Phillip Senn | 10/23/06 12:24 PM
mkruger's Gravatar Whoops... You are right - I removed my "listlen" function when I was obscuring this code (trying not to reveal anything about my client).
# Posted By mkruger | 10/23/06 12:44 PM
Rick O's Gravatar Your version assumes good data and that the pic_keyword_rel table doesn't have any duplicate data. If there are two entries for 'auto' and no entries for 'taxi', you will still return a hit. (Not that we would ever have bogus data!)

As an alternative:

<cfquery datasource="#mydsn#" name="kw">
SELECT I.cID, I.picName, I.picNumber
FROM pics I INNER JOIN pic_keyword_rel R
ON i.picID = r.picID
INNER JOIN keyw K ON r.kwID = k.kwID
WHERE (r.cID = #someval#)
AND (k.kword IN (<cfqueryparam value="#kwList#" cfsqltype="cf_sql_varchar" list="yes">))
GROUP BY I.cID, I.picName, I.picNumber
HAVING ((0
<cfloop list="#kwList#" index="Keyword">
+ MAX(CASE WHEN k.kword = <cfqueryparam value="#Keyword#" cfsqltype="cf_sql_varchar"> THEN 1 ELSE 0 END)
</cfloop>)
= <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#listlen('auto,taxi')#"/>)
</cfquery>

Twisted and convoluted, sure.
# Posted By Rick O | 10/23/06 5:29 PM
Stephen Moretti's Gravatar Mark,

As a complete aside, you should be able to use :
HAVING tot > 2
rather than
HAVING Count(i.picNumber) > 2
because HAVING filter occurs after the aggregate has been done.

This certainly works in MySQL, not sure about other DBs, but I would be surprised if it doesn't. It should stop the COUNT(x) from being recalculated in the HAVING clause.
# Posted By Stephen Moretti | 10/24/06 4:04 AM
mkruger's Gravatar Rick,

I thought of the case - but it's so messy - and it's more expensive than my approach. As for your "good data" comment - you are right, but in this case the DB is actually more complex than I'm exposing here. A 1 to 1 relationship between a pic and a keyword is assured... although a pic can have more than one keyword associated, it can only have 1 relationship to any given keyword - so I'm safe.



Stephen, you mixed example 1 and example 2 - in example 2 there is no aggregate ... not derived column called "tot". But in the case of example 1 - yes... I could have put "HAVING tot > 2" instead of "count(id) > 2".

Thanks to both of you for the comments.
# Posted By mkruger | 10/24/06 7:32 AM



Blog provided and hosted by CF Webtools. Blog Sofware by Ray Camden.