ColdFusion Muse

Top and the UNION Query

Mark Kruger August 25, 2006 9:10 AM MS SQL Server, Coldfusion & Databases Comments (7)

A "UNION" query can be very useful. Take sorting for example. If I want to select 2 types of data from the same table using different conditions and order by my conditions union is a great tool. See my previous post on Using UNION in your queries. If you also like using TOP you may run into an unexpected behavior when using UNION. Let's say you had 2 tables, portalUsers and intranetUsers and you wanted to get the most recent entries in each that were unapproved. If you try something like this:

<cfquery name="myquery" datasource="mydsn">
      SELECT    TOP 1 fname, lname,
            'Portal' AS sourceTable,
            dateAdded
      FROM    portalUsers
      WHERE   unApproved = 1
   UNION
      SELECT TOP 1 fname, lname,
            'Intranet' AS sourceTable,
            dateAdded
      FROM   intranetUsers
      WHERE   Approved = 0
   ORDER BY dateAdded      
</cfquery>
You might expect to get 2 records equal to the most recently added portal user and the most recently added intranet user - right? You could take the first record and you would be good to go. Actually, unless the "portalUsers" table has a clustered index ordered by dateAdded, this is not the case.

The reason is simple, the condition of top is met for both of the queries before the sort order is applied. If you think about it, this makes sense. To do it any other way would require the order by clause to be run three times - once for each query and then once for the returning result set. Instead, the UNION clause goes out and gets the data for query 1, then the data for query 2 and then sorts the data. The result is that you will not be able to depend on getting the records you want from this UNION query.

The Fix: Use a View

One fix would be to use a view. Simply create a view without the TOP key word - like so:

<cfquery name="myquery" datasource="mydsn">
CREATE VIEW vwUnApprovedUsers
AS
      SELECT    fname, lname,
            'Portal' AS sourceTable,
            dateAdded
      FROM    portalUsers
      WHERE   unApproved = 1
   UNION
      SELECT fname, lname,
            'Intranet' AS sourceTable,
            dateAdded
      FROM   intranetUsers
      WHERE   Approved = 0
   ORDER BY dateAdded      
</cfquery>
You could then select directly from this view using TOP 1 and get the most recent record.

The Fix: An "In-line" View?

Still another way is to use an "inline view" and select from a subselect using top 1. That would look something like this:

<cfquery name="myquery" datasource="mydsn">
   SELECT TOP 1 *
   FROM
   ( SELECT    fname, lname,
            'Portal' AS sourceTable,
            dateAdded
      FROM    portalUsers
      WHERE   unApproved = 1
   UNION
      SELECT fname, lname,
            'Intranet' AS sourceTable,
            dateAdded
      FROM   intranetUsers
      WHERE   Approved = 0
      ) AS inLineView
   ORDER BY dateAdded      
</cfquery>
Ahah! This creates a dataset and then allows you to pull out only the first record. Make sure and test for performance to see if one or the other of these methods works better. NOTE: The subselect alias is required ("AS inLineView"). If you do not alias the subselect the query will error out - but there will be little indication in the error why it's throwing an exception.

  • Share:

Related Blog Entries

7 Comments

  • Rick O's Gravatar
    Posted By
    Rick O | 8/25/06 2:37 PM
    You probably want to be doing a UNION ALL instead of a UNION. It eliminates the merging and sorting of a plain UNION, speeding it up by turning it into what amounts to query concatenation. (And since your sourceTable column will prevent cross-table duplicates, there's no reason not to.)
  • Rick O's Gravatar
    Posted By
    Rick O | 8/25/06 2:47 PM
    Also, as a third option, you could grab the top items in each table, union them, then grab the top of those. This means that you would be reducing the UNION ALL from two (possibly huge) tables to two one-row tables.

    I just ran a test on some fairly large tables here and saw a significant performance boost. The first way (huge UNION) returned a row in 0:48, while the second way (tiny UNION) returned a row in 0:29.

    SELECT TOP 1 fname, lname, sourceTable, dateAdded FROM (
    SELECT fname, lname, 'Portal' AS sourceTable, dateAdded FROM (
    SELECT TOP 1 fname, lname, sourceTable, dateAdded
    FROM portalUsers
    WHERE (unApproved = 1)
    ORDER BY dateAdded DESC
    ) AS a
    UNION ALL
    SELECT fname, lname, 'Intranet' AS sourceTable, dateAdded FROM (
    SELECT TOP 1 fname, lname, sourceTable, dateAdded
    FROM intranetUsers
    WHERE (unApproved = 1)
    ORDER BY dateAdded DESC
    ) AS b
    ) AS c
    ORDER BY dateAdded DESC

    It certainly adds to the query complexity, but for large-table situations, it may make a big difference.
  • mkruger's Gravatar
    Posted By
    mkruger | 8/25/06 2:53 PM
    Excellent - thanks Rick - great stuff...
  • ep's Gravatar
    Posted By
    ep | 6/12/08 4:42 PM
    Can't you jsut do...

    SELECT TOP 200 * FROM
    (
    SELECT field FROM table1 WHERE (field= 'ahaaaa')
    UNION
    SELECT field FROM table2 WHERE (field= 'ahaaaa')
    ) as result
  • Anny's Gravatar
    Posted By
    Anny | 1/16/09 7:05 AM
    Great Post ! Would be great if you share more posts on mssql
  • Amaran Athigaman's Gravatar
    Posted By
    Amaran Athigaman | 2/19/09 11:55 PM
    I tried this , its working too

    SELECT TOP 200 * FROM
    (
    SELECT field FROM table1 WHERE (field= 'ahaaaa')
    UNION
    SELECT field FROM table2 WHERE (field= 'ahaaaa')
    ) as result

    what does this actually do, does it create any Dataset or anything..will this affect performance??.. whats the difference between this and inLineView.?
  • David's Gravatar
    Posted By
    David | 12/21/09 11:42 AM
    Wow, information on this was hard to find. I was getting that same error you described "NOTE: The subselect alias is required ("AS inLineView"). If you do not alias the subselect the query will error out - but there will be little indication in the error why it's throwing an exception." - thanks for helping me out! No idea why an alias would solve that error, but hey, whatever works!