|
The following
question series assesses experience with beginning to intermediate
SQL techniques and should be asked of candidates who are DBAs,
Database developers and anyone using SQL. It is best suited
for developers and Database Administrators having some experience
or training with SQL. The techniques applied are useful in
a variety of situations and should be known to everyone using
SQL.
This month we will
examine one of the more controversial subjects in SQL, the
subject of NULLs.
While most programming
languages have Boolean expressions that are either TRUE or
FALSE, SQL adds a third value, NULL. This three-valued logic
is the source of much frustration for beginners, and still
trips the more experienced for a moment, now and then.
The line of questioning
presented here may be given to candidates of all experience
levels. We will start with the basics and move to more difficult
aspects.
Let's start with
a simple question:
Of
the following values, which is equal to the NULL of SQL?
A
- 0 (zero)
B - " " (blank/space)
C - "" (Empty String)
D - -1 (negative one)
E - None of the above
The answer is "E",
None of the above.
While some implementations
of SQL, like Oracle's, equate an empty string with NULL, others
do not. Oracle warns that future revisions may not equate
an empty string to NULL, so like the numeric value of ZERO,
do not equate it with NULL. Since NULLs apply to many datatypes,
it is best to define the NULL generically. Therefore, let's
ask the question in a different way.
Which
of the following best describes the value of NULL in SQL (no
specific datatype)?
A
- Unknown
B - Missing
C - Not Applicable
D - All of the above
The answer is A
- Unknown
An argument could
be made for answer 'D', but UNKNOWN best describes NULL. The
following scenarios will show why it is best to think of NULLs
as UNKNOWN.
Consider the table,
test_null:
| Name |
Type |
| COL_VARCHAR |
VARCHAR2(11) |
| COL_NUMBER |
NUMBER(3) |
| COL_VARCHAR |
COL_NUMBER
|
| mike |
100
|
| george |
200
|
| larry |
300
|
| sherry |
400
|
| bob |
500
|
| betsy |
|
| frank |
|
 |
33
|
 |
999
|
 |
|
The query,
SELECT * FROM
test_null WHERE col_number = 200
Returns
| COL_VARCHAR |
COL_NUMBER |
| george |
200 |
One might assume
that the following query,
SELECT * FROM
test_null WHERE col_number <> 200
which reverses
the WHERE-criteria, would produce all rows EXCEPT the one
previously returned.
This would be true
in two-value logic, but not in three-value logic.
Observe the results
below.
| COL_VARCHAR |
COL_NUMBER |
| mike |
100 |
| larry |
300 |
| sherry |
400 |
| bob |
500 |
 |
33 |
 |
999 |
The rows for betsy,
frank and
are not returned.
If we think of
the NULL values as "missing", we might state that a missing
value does not equal 200, and therefore should be included.
Let's place some pseudo-code within the SQL statement to illustrate
this.
Here is the pseudo-code
for thinking of the NULL as missing.
SELECT * FROM
test_null WHERE "missing value" <> 200
A missing value
CANNOT be equal to 200.
But if we think
of the NULL as UNKNOWN, then it MIGHT or MIGHT NOT be equal
to 200.
SELECT * FROM
test_null WHERE "UNKNOWN VALUE" <> 200
Therefore, defining
and visualizing NULLs as UNKNOWN gives us a more accurate
understanding of interaction between SQL and a Relational
Database Management System (RDBMS).
NULLs
in AND and OR Statements
Let's apply this
to more complex criteria in the WHERE clause.
Which rows will be returned when the following SQL is applied
to the test_null table?
SELECT * FROM
test_null
WHERE col_varchar = 'larry' AND col_number = '300'
RESULTS:
| COL_VARCHAR |
COL_NUMBER |
| larry |
300 |
Now let's change
the equal (=) to not equal (<>).
SELECT * FROM
test_null
WHERE col_varchar <> 'larry' AND col_number <> '100'
RESULTS:
| COL_VARCHAR |
COL_NUMBER |
| mike |
100 |
| george |
200 |
| sherry |
400 |
| bob |
500 |
A NULL in either
column causes the row to be excluded from the results because
SQL cannot be sure that BOTH the first column does not equal
'larry' and the second column does not equal '100'
SELECT * FROM
test_null
WHERE col_varchar = 'larry' OR col_number = '100'
RESULTS:
| COL_VARCHAR |
COL_NUMBER |
| mike |
100 |
| larry |
300 |
Again, let's change
the equal (=) to not equal (<>).
SELECT * FROM
test_null
WHERE col_varchar <> 'larry' OR col_number <> '100'
RESULTS:
| COL_VARCHAR |
COL_NUMBER
|
| mike |
100
|
| george |
200
|
| larry |
300
|
| sherry |
400
|
| bob |
500
|
| betsy |
|
| frank |
|
 |
33
|
 |
999
|
Notice that only
the row containing NULL values in both columns is excluded.
To summarize, look
at the following tables.
Table 1
|
AND
|
TRUE
|
FALSE
|
UNKNOWN
|
|
TRUE
|
TRUE
|
FALSE
|
UNKNOWN
|
|
FALSE
|
FALSE
|
FALSE
|
FALSE
|
|
UNKNOWN
|
UNKNOWN
|
FALSE
|
UNKNOWN
|
Table 2
|
OR
|
TRUE
|
FALSE
|
UNKNOWN
|
|
TRUE
|
TRUE
|
TRUE
|
TRUE
|
|
FALSE
|
TRUE
|
FALSE
|
UNKNOWN
|
|
UNKNOWN
|
TRUE
|
UNKNOWN
|
UNKNOWN
|
Table 1 shows the
combinations using the AND operator. Of course, in SQL we
can use many AND operators. I've seen SQL statements generated
by query tools that include tens, if not hundreds of AND and
OR operators.
Another way to state a portion of AND logic is: FALSE AND
anything is FALSE.
Table 2 shows combinations
using the OR operator.
Another way to state a portion of OR logic is: TRUE OR anything
is TRUE.
Summary:
Knowing
the pitfalls of NULL logic can save a lot of time and frustration.
Whenever unexpected results are returned from a query or a
program based on SQL, survey the values for each column involved.
Are NULLs allowed? Are there NULLs, empty strings or whitespace
in the columns? Surveying values for columns involved in a
query also provides data distribution, useful for tuning queries.
We might address
some techniques to survey data and more advanced NULL difficulties
in future columns.
About the author
Mike Wilkins is an Oracle Master DBA with more than twenty
years IT experience in a variety of technologies. Mike's roles
include, Database Designer/modeler, Programmer-Analyst, Database
Administrator, UNIX Systems and Network Administrator, Network
Designer/Analyst, and others. Mike has performed 300+ technical
interviews in the last five years.
|