Using SQL to Mege Three Incomplete Data Sets
You can join data sets. LEFT JOIN, RIGHT JOIN, even OUTER APPLY. There’s nothing to learn here.
Last week, I had what seemed to be a simple join of three data sets. My permutation was a lot more complicated than this, but I had to boil it down to this example in order to think it through.
What makes it tricky is that none of the three sets of data have ALL of the keys that we need in order to represent our full, inclusive result set.
Let’s look at an example.
Table 1:
Key | Animal |
---|---|
A | aardvark |
B | bird |
C | cat |
Table 2:
Key | Fruit |
---|---|
B | banana |
A | apple |
P | pear |
Table 3:
Key | Activity |
---|---|
A | act |
C | climb |
P | push |
R | run |
And then we’re looking for a result like this:
Key | Animal | Fruit | Action |
---|---|---|---|
A | aardvark | apple | act |
B | bird | banana | NULL |
C | cat | NULL | climb |
P | NULL | pear | push |
R | NULL | NULL | run |
Let’s create our data and figure it out.
1 2 3 4 5 6 7 |
DECLARE @t1 TABLE (col1 char(1), animal varchar(25)) DECLARE @t2 TABLE (col1 char(1), fruit varchar(25)) DECLARE @t3 TABLE (col1 char(1), activity varchar(25)) INSERT @t1 VALUES ('A','aardvark'), ('B', 'bird'), ('C','cat') INSERT @t2 VALUES ('B', 'banana'), ('A', 'apple'), ('P', 'pear') INSERT @t3 VALUES ('A', 'act'), ('C', 'climb'), ('P', 'push'), ('R', 'run') |
We’re going to need a full outer join between the three tables, so let’s start with that.
1 2 3 4 |
SELECT * FROM @t1 t1 FULL OUTER JOIN @t2 t2 ON t1.col1 = t2.col1 FULL OUTER JOIN @t3 t3 ON t1.col1 = t3.col1 OR t2.col1 = t3.col1 |
The key that makes this tricky is in that 2nd outer join. Because our key column might or might not be in either of our first two tables, we have to tell the server to try the join to t1, but also tell it to look at t2 if the key isn’t there in t1.
If we were to go to four tables, we would have to catch all of the permutations in order to be sure our join would work correctly. It would quickly get difficult to read, and I suspect, difficult for the server to perform efficiently.
We can use the COALESCE operator to simplify.
1 2 3 4 |
SELECT * FROM @t1 t1 FULL OUTER JOIN @t2 t2 ON t1.col1 = t2.col1 FULL OUTER JOIN @t3 t3 ON COALESCE(t1.col1, t2.col1) = t3.col1 |
With this syntax, we can just keep adding columns on there as we add tables. It’s not exactly telling server to do the same thing. It’s saying that if t1.col1 matches, join there. Otherwise, move along to t2.col1. But since our keys are common in these sets, the result is exactly what we want.
Speaking of results, we’re not quit there yet.
col1 | animal | col1 | fruit | col1 | activity |
---|---|---|---|---|---|
A | aardvark | A | apple | A | |
B | bird | B | banana | NULL | NULL |
C | cat | NULL | NULL | C | climb |
NULL | NULL | P | pear | P | push |
NULL | NULL | NULL | NULL | R | run |
Let’s use COALESCE again and put this one to bed.
1 2 3 4 5 6 7 |
SELECT COALESCE(t1.col1, t2.col1, t3.col1) AS col1 , t1.animal , t2.fruit , t3.activity FROM @t1 t1 FULL OUTER JOIN @t2 t2 ON t1.col1 = t2.col1 FULL OUTER JOIN @t3 t3 ON COALESCE(t1.col1, t2.col1) = t3.col1 |
Bingo!
Key | Animal | Fruit | Action |
---|---|---|---|
A | aardvark | apple | act |
B | bird | banana | NULL |
C | cat | NULL | climb |
P | NULL | pear | push |
R | NULL | NULL | run |