This month's
database question is on SQL and is authored by Mike Wilkins.
The question assesses the skill of low to mid-level skilled
Developers and DBA's using SQL. Mike has depicted a scenario
that many database pros will need to be able to work around.
Posing scenarios to your candidates is one of the best ways
to assess their real world experience. Ask a number of questions
around the scenario to get a broad understanding of the candidate's
skills. Next month's newsletter will include a continuation
of this scenario to address more senior-level skills.
Background
information:
"A company
has a simple database. It contains two tables, customer and
customer_location. The customer table contains basic information
about our customer, including customer name, date they became
a customer, national contact, etc.; while the customer_location
table contains detailed information about our customer's offices.
For example, the customer_location table will contain the
address, city, state, zip code, location contact, phone number,
fax number, etc.
One customer may have many branch locations, so the relationship
is one-to-many. The customer table has a primary key on the
customer_id column, which is a unique 'serial' number, like
'762345'. The customer_location table also contains a column
called customer_id, which references the column in the customer
table via a foreign key constraint. Therefore, there is a
primary key - foreign key relationship between the customer_table.customer_id
and customer_location.customer_id. The data-entry department
just finished loading the first 500 customers into the database"
Provide
a SQL statement that returns all columns and rows from these
two tables, joined by the customer_id (PK-FK).
What
to look for in your candidate's response:
SQL
is quite flexible, so there are several ways to get the desired
result.
ANSWERS
(see comments after the answer variations)
ANSWER
variation 1 (partially correct)
SELECT * from customer, customer_location
WHERE customer.customer_id=customer_location.customer_id
ANSWER
variation 2 (partially correct)
Select customer.customer_id,
customer.customer_date, customer.founded_date,
customer.web_site,
customer.contact, customer.stock_symbol,
customer_location.location_id,
customer_location.address1, customer_location.address2,
customer_location.city, customer_location.state,
customer_location.zip,
customer_location.phone
FROM Customer, customer_location
WHERE customer.customer_id =
customer_location.customer_id
NOTE:
The qualifiers shown in blue
are only necessary when the column name is common to tables
referenced in the query. Since it is the only column that
is common in this query, only customer.customer_id
MUST be qualified. Also, an alias may be substituted for the
table name, specified in the FROM clause. For example: "FROM
Customer C, customer_location
CL"
CONCEPT ONE: JOINING THE TABLES
The first important
concept a candidate should know is that a JOIN is required.
If we simply use:
"select * from customer, customer_location"
the query will run, but it will return the 'CARTESIAN PRODUCT'
of the tables. The Cartesian Product is EVERY row from the
customer table joined with EVERY row from the customer_location
table, without regard for values in any column. The WHERE
clause is used to JOIN the tables.
CONCEPT TWO: QUALIFYING COMMON COLUMNS
The second important
concept is that common columns in a multi-table query must
be qualified with the table name or alias. Joining the table
using the WHERE clause, "WHERE customer_id = customer_id",
will cause the query to fail because the columns are ambiguous.
The database cannot (and should not) assume to which table
the column belongs. The same rules apply to common columns
referenced in the SELECT or other portions of a query.
CONCEPT THREE: PROBLEMS ENCOUNTERED
WITH EQUI-JOINS (BONUS)!!
The more experienced
candidate will recognize that a JOIN involving a one-to-many
relationship may drop some rows from the query output if there
is not a matching customer_location row for every customer
row. The primary-to-foreign key constraint does not allow
insertion of a customer_location record without a matching
value customer_id in the customer table. So there will never
be a customer_location (child) row without a matching customer
row (parent). However, we can insert a row into customer without
having a matching customer_id value (row) in the child table,
customer_location. In this case, the simple equi-join, "customer.customer_id
= customer_location.customer_id", is not true, and the
row from the customer table is not returned.
For example:
| customer.customer_id |
customer_location.customer_id |
| 100 |
100 |
| 100 |
100 |
| 200 |
200 |
| 300 |
300 |
| 300 |
300 |
| 300 |
300 |
| 400 |
** No matching record **
|
| 500 |
500 |
| 500 |
500 |
The queries listed above would exclude the customer having the
customer_id value of 400 because there is no matching value
in the customer_location table.
The knowledgeable
candidate will mention that an OUTER JOIN will resolve the
situation.
More questions
on this scenario next time...
About the author
Mike Wilkins is an Oracle Master DBA with more than twenty
years IT experience in a variety of technologies. Mike's roles
include, Database Designer/modeler, Programmer-Analyst, Database
Administrator, UNIX Systems and Network Administrator, Network
Designer/Analyst, and others. Mike has performed 300+ technical
interviews in the last five years.
|