Interviewer Insights  
About Us  
Comments, Suggestions  
  or General Feedback  
Interested in writing for  
Tech Talk Tips?  

 

 


Database Admin
SQL: Defining and Visualizing NULLs

by Mike Wilkins


The following question series assesses experience with beginning to intermediate SQL techniques and should be asked of candidates who are DBAs, Database developers and anyone using SQL. It is best suited for developers and Database Administrators having some experience or training with SQL. The techniques applied are useful in a variety of situations and should be known to everyone using SQL.

This month we will examine one of the more controversial subjects in SQL, the subject of NULLs.

While most programming languages have Boolean expressions that are either TRUE or FALSE, SQL adds a third value, NULL. This three-valued logic is the source of much frustration for beginners, and still trips the more experienced for a moment, now and then.

The line of questioning presented here may be given to candidates of all experience levels. We will start with the basics and move to more difficult aspects.

Let's start with a simple question:

Of the following values, which is equal to the NULL of SQL?

A - 0 (zero)
B - " " (blank/space)
C - "" (Empty String)
D - -1 (negative one)
E - None of the above

The answer is "E", None of the above.

While some implementations of SQL, like Oracle's, equate an empty string with NULL, others do not. Oracle warns that future revisions may not equate an empty string to NULL, so like the numeric value of ZERO, do not equate it with NULL. Since NULLs apply to many datatypes, it is best to define the NULL generically. Therefore, let's ask the question in a different way.

Which of the following best describes the value of NULL in SQL (no specific datatype)?

A - Unknown
B - Missing
C - Not Applicable
D - All of the above

The answer is A - Unknown

An argument could be made for answer 'D', but UNKNOWN best describes NULL. The following scenarios will show why it is best to think of NULLs as UNKNOWN.

Consider the table, test_null:

Name Type
COL_VARCHAR VARCHAR2(11)
COL_NUMBER NUMBER(3)

COL_VARCHAR
COL_NUMBER
mike
100
george
200
larry
300
sherry
400
bob
500
betsy
frank
  33
999


The query,

SELECT * FROM test_null WHERE col_number = 200

Returns

COL_VARCHAR COL_NUMBER
george 200

One might assume that the following query,

SELECT * FROM test_null WHERE col_number <> 200

which reverses the WHERE-criteria, would produce all rows EXCEPT the one previously returned.

This would be true in two-value logic, but not in three-value logic.

Observe the results below.

COL_VARCHAR COL_NUMBER
mike 100
larry 300
sherry 400
bob 500
  33
999

The rows for betsy, frank and are not returned.

If we think of the NULL values as "missing", we might state that a missing value does not equal 200, and therefore should be included. Let's place some pseudo-code within the SQL statement to illustrate this.

Here is the pseudo-code for thinking of the NULL as missing.

SELECT * FROM test_null WHERE "missing value" <> 200

A missing value CANNOT be equal to 200.

But if we think of the NULL as UNKNOWN, then it MIGHT or MIGHT NOT be equal to 200.

SELECT * FROM test_null WHERE "UNKNOWN VALUE" <> 200

Therefore, defining and visualizing NULLs as UNKNOWN gives us a more accurate understanding of interaction between SQL and a Relational Database Management System (RDBMS).

NULLs in AND and OR Statements

Let's apply this to more complex criteria in the WHERE clause.
Which rows will be returned when the following SQL is applied to the test_null table?

SELECT * FROM test_null
WHERE col_varchar = 'larry' AND col_number = '300'

RESULTS:

COL_VARCHAR COL_NUMBER
larry 300

Now let's change the equal (=) to not equal (<>).

SELECT * FROM test_null
WHERE col_varchar <> 'larry' AND col_number <> '100'

RESULTS:

COL_VARCHAR COL_NUMBER
mike 100
george 200
sherry 400
bob 500

A NULL in either column causes the row to be excluded from the results because SQL cannot be sure that BOTH the first column does not equal 'larry' and the second column does not equal '100'

SELECT * FROM test_null
WHERE col_varchar = 'larry' OR col_number = '100'

RESULTS:

COL_VARCHAR COL_NUMBER
mike 100
larry 300

Again, let's change the equal (=) to not equal (<>).

SELECT * FROM test_null
WHERE col_varchar <> 'larry' OR col_number <> '100'

RESULTS:

COL_VARCHAR
COL_NUMBER
mike
100
george
200
larry
300
sherry
400
bob
500
betsy
frank
  33
999

Notice that only the row containing NULL values in both columns is excluded.

To summarize, look at the following tables.

Table 1

AND
TRUE
FALSE
UNKNOWN
TRUE
TRUE
FALSE
UNKNOWN
FALSE
FALSE
FALSE
FALSE
UNKNOWN
UNKNOWN
FALSE
UNKNOWN

Table 2

OR
TRUE
FALSE
UNKNOWN
TRUE
TRUE
TRUE
TRUE
FALSE
TRUE
FALSE
UNKNOWN
UNKNOWN
TRUE
UNKNOWN
UNKNOWN

Table 1 shows the combinations using the AND operator. Of course, in SQL we can use many AND operators. I've seen SQL statements generated by query tools that include tens, if not hundreds of AND and OR operators.
Another way to state a portion of AND logic is: FALSE AND anything is FALSE.

Table 2 shows combinations using the OR operator.
Another way to state a portion of OR logic is: TRUE OR anything is TRUE.


Summary:

Knowing the pitfalls of NULL logic can save a lot of time and frustration. Whenever unexpected results are returned from a query or a program based on SQL, survey the values for each column involved. Are NULLs allowed? Are there NULLs, empty strings or whitespace in the columns? Surveying values for columns involved in a query also provides data distribution, useful for tuning queries.

We might address some techniques to survey data and more advanced NULL difficulties in future columns.


About the author


Mike Wilkins is an Oracle Master DBA with more than twenty years IT experience in a variety of technologies. Mike's roles include, Database Designer/modeler, Programmer-Analyst, Database Administrator, UNIX Systems and Network Administrator, Network Designer/Analyst, and others. Mike has performed 300+ technical interviews in the last five years.


REVIEWNET is a trademark and service mark of ReviewNet Corporation © 1997-2002,
Jenkintown, PA - All Rights Reserved. This information is confidential and proprietary
to ReviewNet Corporation. Use, duplication or misappropriation is strictly prohibited.