The Art of SQL Report Writing

Part 3 of 5

Building Queries

Before I begin this section, I would like to emphasize that the technique here is meant to get you a simple and methodical approach to getting the majority of your report complete. Reports can be quite complex and you need to use your head. The intention here is for you to spend significantly less time formulating a clean, easy to read and maintain report, not to write them all for you. If it were that easy then we wouldn't need a developer to create the report.

When you start building your query, begin by highlighting the tables on your ERD that the report will pull data from. This will give you the minimum list of tables you will need to include in your query. Make sure you include all columns that will be used, including columns used for calculated values (sums, counts, averages, case statements, etc.), columns used for sorting purposes and columns used for filtering (where and having clauses). We will call these our primary tables.

Next, trace the path along the foreign key relationships that join the primary tables. These paths may run across additional tables that were not highlighted in the first step above. Sometimes those additional tables can be short circuited and do not need to be included in the report. This is when you have a situation where two primary tables have a foreign key to a non-primary table and can be joined directly by this foreign key without including the non-primary table. Otherwise, also highlight these tables that lie in the paths between primary tables. We will call these our secondary tables.

Next, look at your ERD and determine the focal point of your report. The focal point is the center of attention when pulling out data. To determine this focal point, take a look at your highlighted ERD along side an example of what your final report is supposed to look like. Find the table where every line in the report maps to a single record in that table and that no two lines in the report point to the same record in the table. If you cannot find a table like this, look for a table that most closely matches this one line in the report to zero or one row in the table relationship. If you have several that seem likely candidates, select the lowest table on the ERD (the table that appears physically closes to the bottom of the diagram) that has the relationship of one line in the report maps to zero or one row in the table. If you still have problems selecting a focal point, you probably will want to break the report into sub-reports first.

A note on the focal point: 99% of the time you should find locating a focal point in the database very easy. If you are having a lot of difficulty doing this, it is often an indication that the report is either too complex or not cleanly defined. If it is too complex, try breaking it into several small reports first or sub-reports. You can use the sub-reports to build the larger report. If it is not very clean, you may want to go back to the person who defined the report and walk through it with them. Take sample data as you have it in the database and manually walk through with the person how that data would be assembled in the report "by hand."

When I design reports, I typically ask the user where their focal point is, before I even begin building the report. I walk in with the ERD and show the user what tables / entities they have to choose from. If you are the one designing the report with the end user, you may find it immensely helpful to start with the focal point BEFORE you draw out what the report looks like. Then you can show the user what data they can draw into the report. I find this results in much richer reports for the user, easier to write SQL and easier to optimize queries.


Once you have your primary, secondary and focal point tables identified, you need to look at the pattern of the highlighted tables and foreign keys on your ERD. You want to see that all traced foreign keys start from the focal point and radiate outward, never coming back toward the focal point. For example, if you have a focal point table F and it has a foreign key up to primary table A which then has a foreign key down to table B, this can pose a problem. Having B below A means you could end up with duplicate records from A appearing once you join A to B which will then result in duplicate records in F once joined to A. We do NOT want records from our focal point appearing more than once in the report.

To deal with the situation where your paths do not radiate outward from the focal point table, clump tables together into sub-reports. So if F -> A <- B then you should create a simpler report that contains only A and B where A is the focal point. Once that simpler report is created, then join F to this report as a sub-query. This will help ensure we don't get duplicates of records in F.

Once you have your focal point, you are ready to begin writing your query. We start by typing in the following:
SELECT
    ____
FROM focus_table
;
Where "focus_table" is the name of the table you identified as the focal point of the report.

Next, create inner or outer joins to all tables that appear ABOVE the focal point table. These are tables that appear physically above the point where the focal point table sits on the ERD.

SELECT
    ____
FROM focus_table
-- Joins up the ERD
INNER JOIN table_above_1 ON (table_above_1.table_above_1_id = focus_table.table_above_1_id)
INNER JOIN table_above_1_a ON (table_above_1_a.table_above_1_a_id = table_above_1.table_above_1_a_id)
INNER JOIN table_above_2 ON (table_above_2.table_above_2_id = focus_table.table_above_2_id)
LEFT OUTER JOIN table_above_3 ON (table_above_3.table_above_3_id = focus_table.table_above_3_id)
;


Note that you use an "INNER JOIN" when the foreign key is not nullable and it comes from a table that is either the focal point or a table that was also in an INNER JOIN. You use a "LEFT OUTER JOIN" on any table where the foreign key is nullable or where its "ON" statement is to a table that also has a LEFT OUTER JOIN.

There are exceptions to this INNER/OUTER JOIN rule. For example, if you only want data that is attached to table X then you probably want an INNER JOIN regardless of the relationship between X and the table below it on the ERD. The joins will likely need to be reviewed at the end when fine tuning your report query.

Next, you need to join the tables that appear BELOW the focal point table. Any column from one of these tables will almost definitely appear in an aggregate function. The reason for this is that if it is not and a lower table has more than one record for a single record in your focal point, you may end up with that one record in the focal point table appearing twice in the report. Recall that we specifically picked out the focal point table such that one record in the focal point table will not appear more than once in the report.

Perform a LEFT OUTER JOIN on any of the tables that fall below your focal point table.

SELECT
    ____
FROM focus_table
-- Joins up the ERD
INNER JOIN table_above_1 ON (table_above_1.table_above_1_id = focus_table.table_above_1_id)
INNER JOIN table_above_1_a ON (table_above_1_a.table_above_1_a_id = table_above_1.table_above_1_a_id)
INNER JOIN table_above_2 ON (table_above_2.table_above_2_id = focus_table.table_above_2_id)
LEFT OUTER JOIN table_above_3 ON (table_above_3.table_above_3_id = focus_table.table_above_3_id)
-- Joins down the ERD
LEFT OUTER JOIN table_below_1 ON (table_below_1.focus_table_id = focus_table.focus_table_id)
LEFT OUTER JOIN table_below_2 ON (table_below_2.table_below_1_id = table_below_1.table_below_1_id)
;


Next, you need to add your list of columns into the query. Make sure that any column from a table down the ERD only appears in an aggregate function such as max, min, sum, average, etc.

SELECT
    focus_table.name,
    focus_table.desc,
    table_above_1_a.type,
    table_above_2.style,
    table_above_3.address,
    sum(table_below_1.sales_total) as sum_sales_total,
    min(table_below_2.lead_dt) as first_lead_dt,
    max(table_below_2.lead_dt) as last_lead_dt
FROM focus_table
-- Joins up the ERD
INNER JOIN table_above_1 ON (table_above_1.table_above_1_id = focus_table.table_above_1_id)
INNER JOIN table_above_1_a ON (table_above_1_a.table_above_1_a_id = table_above_1.table_above_1_a_id)
INNER JOIN table_above_2 ON (table_above_2.table_above_2_id = focus_table.table_above_2_id)
LEFT OUTER JOIN table_above_3 ON (table_above_3.table_above_3_id = focus_table.table_above_3_id)
-- Joins down the ERD
LEFT OUTER JOIN table_below_1 ON (table_below_1.focus_table_id = focus_table.focus_table_id)
LEFT OUTER JOIN table_below_2 ON (table_below_2.table_below_1_id = table_below_1.table_below_1_id)
;


Next you need to add your group by clause. You do this by simply adding a "GROUP BY" and include all columns in your select that are NOT in an aggregate function.

SELECT
    focus_table.name,
    focus_table.desc,
    table_above_1_a.type,
    table_above_2.style,
    table_above_3.address,
    sum(table_below_1.sales_total) as sum_sales_total,
    min(table_below_2.lead_dt) as first_lead_dt,
    max(table_below_2.lead_dt) as last_lead_dt
FROM focus_table
-- Joins up the ERD
INNER JOIN table_above_1 ON (table_above_1.table_above_1_id = focus_table.table_above_1_id)
INNER JOIN table_above_1_a ON (table_above_1_a.table_above_1_a_id = table_above_1.table_above_1_a_id)
INNER JOIN table_above_2 ON (table_above_2.table_above_2_id = focus_table.table_above_2_id)
LEFT OUTER JOIN table_above_3 ON (table_above_3.table_above_3_id = focus_table.table_above_3_id)
-- Joins down the ERD
LEFT OUTER JOIN table_below_1 ON (table_below_1.focus_table_id = focus_table.focus_table_id)
LEFT OUTER JOIN table_below_2 ON (table_below_2.table_below_1_id = table_below_1.table_below_1_id)
-- Group by clause
GROUP BY
    focus_table.name,
    focus_table.desc,
    table_above_1_a.type,
    table_above_2.style,
    table_above_3.address
;


Next, add your filtering. If you are filtering on a table above the focus table, put it into your WHERE clause. If you are filtering on a table below the focus table, you put the filter in the WHERE clause if you are filtering out records before you aggregate the data or in the HAVING clause if you are filtering out records after you aggregate the data.

SELECT
    focus_table.name,
    focus_table.desc,
    table_above_1_a.type,
    table_above_2.style,
    table_above_3.address,
    sum(table_below_1.sales_total) as sum_sales_total,
    min(table_below_2.lead_dt) as first_lead_dt,
    max(table_below_2.lead_dt) as last_lead_dt
FROM focus_table
-- Joins up the ERD
INNER JOIN table_above_1 ON (table_above_1.table_above_1_id = focus_table.table_above_1_id)
INNER JOIN table_above_1_a ON (table_above_1_a.table_above_1_a_id = table_above_1.table_above_1_a_id)
INNER JOIN table_above_2 ON (table_above_2.table_above_2_id = focus_table.table_above_2_id)
LEFT OUTER JOIN table_above_3 ON (table_above_3.table_above_3_id = focus_table.table_above_3_id)
-- Joins down the ERD
LEFT OUTER JOIN table_below_1 ON (table_below_1.focus_table_id = focus_table.focus_table_id)
LEFT OUTER JOIN table_below_2 ON (table_below_2.table_below_1_id = table_below_1.table_below_1_id)
-- Where clause
WHERE
    table_above_1_a.type in ('SALES', 'MARKETING')
    AND
    table_below_1.sales_total > 1000
-- Group by clause
GROUP BY
    focus_table.name,
    focus_table.desc,
    table_above_1_a.type,
    table_above_2.style,
    table_above_3.address
-- Having clause
HAVING
    min(table_below_2.lead_dt) >= '2011-01-01'
;


At this point you are ready to run your query and see what you get. This should get you the majority of the way to finishing up your report. Often it will be about all you need to say you are finished.