Here's a sticky problem. How do you build a query that gives you distinct records from one table based on multiple records from another table and order by a date found in the second table. In my real world example, I have project tracks in "Items" and comments or notes in "events". I want a distinct list of "items" that have been updated in the past 10 days. That seems easy right? Well... not as easy as it seems on the surface.
Let's say in a project has been "commented on" ("events" table) several times during the 10 days. If I do something like:
Let's a look at the problem a bit closer. Items contains the master record for a particular track. For example, Items are related to individual bugs, feature requests, work orders, or change requests. For example, the Items table might look like:
ItemId DescriptionEvents contains comments or updates mapped to the itemid. So events might look lik this:
17 Wash Your Wife
22 Take out the dry cleaning
4 Pick up the dog
ItemId event DateIf we order by date we will end up with:
17 She Didn't like it 11-15-2006
17 Are you sure about this 11-12-2006
22 My Suite didn't like desert 11-14-2006
4 I should have bought a smaller dog 11-11-2006
ItemId event Date...so any "select distinct" join that includes the date will end up showing
17 She Didn't like it 11-15-2006
22 My Suite didn't like desert 11-14-2006
17 Are you sure about this 11-12-2006
4 I should have bought a smaller dog 11-11-2006
ItemId Description date
17 Wash Your Wife 11-15-2006
22 Take out the dry cleaning 11-14-2006
17 Wash Your Wife 11-12-2006
4 Pick up the dog 11-11-2006
What I want is to remove that third line. I already "know" that itemid 17 has been updated recently. I don't need to "know" that it was updated previously - even if it is inside my 10 day window. Sure, I can simply keep track of the itemIds in my loop and not display the ones that have already been handled. But surely there is another way? Here's the solution:
What's the secret? The subsect filters out all but the latest eventid - and that effects the date that is pulled in from the eventID table. This has the effect of removing that second level of items.
NOTE: Thanks to my good friend and SQL guru Mike Klostermeyer for this tip.
<code>
<Cfquery ...>
SELECT v.itemid, v.title, e.datecreated, e.eventid
FROM items v
INNER JOIN Events e
ON v.itemid = e.itemid
GROUP BY v.itemid, v.title, e.datecreated, e.eventid
ORDER BY e.datecreated desc
</CFQUERY>
</code>
should be faster.
I meant
<Cfquery ...>
SELECT v.itemid, v.title, e.eventid
FROM items v
INNER JOIN Events e
ON v.itemid = e.itemid
GROUP BY v.itemid, v.title, e.datecreated, e.eventid
ORDER BY e.datecreated desc
</CFQUERY>
<Cfquery ...>
SELECT v.itemid, v.title, e.eventid
FROM items v
INNER JOIN Events e
ON v.itemid = e.itemid
GROUP BY v.itemid, v.title, e.eventid
ORDER BY e.datecreated desc
</CFQUERY>
SELECT i.ItemID, i.Description, MAX(e.Created) AS LastUpdate
FROM items AS i
INNER JOIN events AS e ON (i.ItemID = e.ItemID)
GROUP BY i.ItemID, i.Description
HAVING MAX(e.Created) > DATEADD(DAY,-10,GETDATE())
ORDER BY LastUpdate DESC
Of course, that's got a bit of SQL Server-specific date magic, but you get the point. If you want the EventID that corresponds to that date, things get a little trickier:
SELECT i.ItemID, i.Description, f.EventID AS EventID, f.Created AS LastUpdate
FROM items AS i
INNER JOIN events AS e ON (i.ItemID = e.ItemID)
INNER JOIN events AS f ON (f.ItemID = i.ItemID)
GROUP BY i.ItemID, i.Description, f.EventID, f.Created
HAVING MAX(e.Created) > DATEADD(DAY,-10,GETDATE()) AND (f.Created = MAX(e.Created))
ORDER BY LastUpdate DESC
I know that looks heinous, with a join via the HAVING clause, but if you run it you'll see that it works.
Think of it this way: you need the Events table twice, once aggregated (to find the max date) and once unaggregated (to find the EventID that goes with that max date). The "e" alias is the aggregated version, hence why none of its columns show up in the GROUP BY clause. The "f" alias is the unaggregated version, so you can then tie back to the "e" table. You can't join in the ON clause like you normally would, because that happens before the aggregation, so you have to join in the HAVING clause, which happens after.
If your event dates really are just days and not full (unique) timestamps, you need to wrap an aggregating function like MIN or MAX around the f.EventID in the SELECT list and take it out of the GROUP BY list. This will pick just one of the Events from that same day and you won't get duplicate rows. You would still leave the f.Created in the GROUP BY, however.
Is that twisted, or what?
Whoa... that is twisted :) I think there is a future for you in writing the US tax code. That's an excellent and smart solution - and I bet it performs better than the sub select. Thanks for the input!
-Mark
Why not use derived tables? They're straightforward and likely outperform all of the queries above.
SELECT i.Itemid, i.Title, e.DateCreated AS LastUpdate
FROM (
SELECT ItemId, MAX(DateCreated) AS DateCreated
FROM Events
WHERE DateCreated > @tenDaysAgo
GROUP BY ItemId
)
AS e INNER JOIN Items i ON e.ItemID = i.ItemId
ORDER BY e.DateCreated desc
SELECT i.Itemid, i.Title, e.DateCreated, MAX(e.eventId) AS EventId
FROM (
SELECT ItemId, MAX(DateCreated) AS DateCreated
FROM Events
WHERE DateCreated > @tenDaysAgo
GROUP BY ItemId
)
AS mx
INNER JOIN Items i ON mx.ItemID = i.ItemId
INNER JOIN Events e ON mx.ItemID = e.ItemID
AND mx.DateCreated = e.DateCreated
GROUP BY i.Itemid, i.Title, e.DateCreated
ORDER BY e.DateCreated desc
> If you want the EventID that corresponds to that date,
> things get a little trickier:
You might check that query. It looks like it could return more than one row per itemId if two events have the same timestamp. Unlikely, but possible.