Here is a continuation
of last month's question. This part of the question series will
address the skills of lower-intermediate level candidates..
We described our
simple database last time and asked for an SQL-statement to
return all rows and columns. Our data-entry department completed
entry of 500 customers.
The most common
answer involves a simple join on the PK-FK columns.
"SELECT * FROM
customer, customer_location where customer.customer_id=customer_location.customer_id"
Provided that the
candidate offers this or an equivalent answer, we continue
the scenario.
"Your query produced
many pages of output which we printed and I verified using
our paper-files. There are more than 500 lines of output because
each customer may have serveral locations or offices. One
of our clients has nearly thirty offices. However, as I began
counting the individual companies, I was surprised to find
only 463 customers listed in this report. I wondered if Data-entry
missed 37 customers, but they are usually quite thorough.
Perhaps you can help me resolve this discrepancy"
At this point,
many candidates will give a correct analysis and resolution.
If a candidate has no ideas, I will suggest that a count of
each table in this join might help.
Select count(*)
from customer
(Which returns)
500
select count(*)
from customer_location
(Which returns)
1783 (We simply made up this number, guessing about 3 or so
locations per customer)
So we do have 500
rows in the customer table and many more in customer_location.
Why does the query,
"SELECT count(*)
from customer, customer_location WHERE customer.customer_id
= customer_location.customer_id"
return only 463?
If the candidate
still doesn't find the solution, we can suggest the following
query:
"SELECT count(distinct
customer_id) from customer_location"
This query also
returns 463, showing that there are 37 matching customer_id
values missing in this table.
The answer is that
37 of the customers entered do not have location information.
Since the query is a simple equi-join, all customer records
without matching customer_location records will be excluded
from the query output.
The resolution of the problem is to use an OUTER-JOIN. An
outer join will return rows from the "inner" or leftmost table
even if they have no matching "outer" records.
Of course, there
are three kinds of outer joins in the SQL standard. This is
a good point to discuss the left, right and full-outer joins.
Vendor implementation and syntax of outer-joins varies, so
reference the vendor documentation.
About the author
Mike Wilkins is an Oracle Master DBA with more than twenty
years IT experience in a variety of technologies. Mike's roles
include, Database Designer/modeler, Programmer-Analyst, Database
Administrator, UNIX Systems and Network Administrator, Network
Designer/Analyst, and others. Mike has performed 300+ technical
interviews in the last five years.
|