|
The following
question assesses experience with intermediate to advanced
SQL techniques and should be asked of candidates that are
DBAs, Database developers and anyone using SQL. It is best
suited for developers and Database Administrators having at
least a year of solid experience with SQL. The techniques
applied are useful in a variety of situations and should be
known to everyone using SQL.
This month we will
continue our look at NULLs in SQL. Our last column asked about
the value and how to visualize the NULL. It also pointed to
the fact that reversing the operator in SQL, from EQUAL (=)
to NOT EQUAL (<>), does not provide the inverse result when
NULLs are involved.
NULLs can complicate
SQL logic, so we want to control them.
That brings us
to our first question.
In
SQL, how can we prevent the existence of NULLs in a particular
column?
What's that? Revoke
all permissions from the table? Very funny. We could also
shut down the database, the computer, and disconnect it. But
let's talk about more practical ways of forbidding the existence
of NULLs in a column. There are several possible answers,
so let's look at a few.
Answer
1: Place
a NOT NULL constraint on the column.
Constraints restrict
the values that may be placed in columns. The NOT NULL constraint
simply prohibits NULLs from being placed in a column.
Without constraints,
we might encounter rows having NULLs for all columns, as in
the table we established, test_null. Notice that both columns
contain NULLs in the last row of the table.
| Name |
Type |
| COL_VARCHAR |
VARCHAR2(11) |
| COL_NUMBER |
NUMBER(3) |
| COL_VARCHAR |
COL_NUMBER
|
| mike |
100
|
| george |
200
|
| larry |
300
|
| sherry |
400
|
| bob |
500
|
| betsy |
|
| frank |
|
 |
33
|
 |
999
|
 |
|
We have indicated
the presence of NULLs by .
However, many tools simply show NULLs as blank spaces. For
example, Oracle's SQL*PLUS shows the following by default.
SQL> SELECT
* FROM test_null;
| COL_VARCHAR |
COL_NUMBER
|
| mike |
100
|
| george |
200
|
| larry |
300
|
| sherry |
400
|
| bob |
500
|
| betsy |
|
| frank |
|
| |
33
|
| |
999
|
| |
|
10 rows selected.
The last row is
now invisible to the naked eye. This and similar conditions
have tripped quite a few of us. Depending on the tool, you
may be able to change the default display output of NULLs
to something more indicative, like .
Remember, when
the output of SQL queries doesn't seem to make sense, check
the values for NULLs and spaces (white space).
Answer
2: Place
a UNIQUE constraint on the column.
The UNIQUE constraint
ensures that every value for a column, or combination of columns,
is unique. Therefore, having a UNIQUE constraint on our column,
COL_VARCHAR, would allow one row with a NULL in that column,
but not more than one. Right?
<
PLAY BUZZER SOUND > WRONG!
The UNIQUE constraint
does not consider NULLs when enforcing uniqueness.
Answer
3: Place
a PRIMARY KEY constraint on the column or combination of columns.
The PRIMARY KEY
constraint combines the properties of NOT NULL and UNIQUE
constraints. It is effective in prohibiting NULLs for columns
involved in a PRIMARY KEY.
Answer 4: Use
triggers and stored procedures to check for NULLs and other
integrity.
Triggers and stored
procedures are more flexible, but more complex. They are generally
used for complex data-integrity requirements that cannot be
implemented using constraints. Performance issues may also
influence a decision to use triggers and stored procedures.
Answer
5: Require
Non-NULL values in the Data-input "front end" or data-entry
form.
If data entry happens
in a uniform manner, say through a GUI data-entry form, constraints
or data-integrity checks may be placed in the application
or data-entry form/screen.
BONUS
QUESTIONS:
Why would some prefer this front-end method?
What
are the advantages of constraining data at the client versus
the server?
*
Warning: Advanced Topics *
BONUS ANSWERS:
Someone might use
a method because "it's what they know".
Remember the old
saying, "If all you have is a hammer, then everything looks
like a nail".
Of course, there
are often better and deeper reasons than, "it's what I know",
to choose a given method.
As far as advantages,
simplistically, the constraints or data-integrity checks will
take up time and space wherever they exist.
Constraints on
the database take time (CPU cycles) and space, not only to
store the constraint definition, but many times this also
involves an associated index to support the constraint. There
are also issues concerning how the database reports constraint
violations. For example, if the constraint-violation error
message isn't passed from the database to the application
or end user, they may be unaware that an error occurred. Also,
applications must handle constraint-violation errors appropriately.
On the client (or
application server), its CPU cycles are used and it stores
the constraint definition. However, most applications will
not store the equivalent of an index for support of the data-entry
constraint. Of course, some environments use a hybrid of these.
Some careful types
will use multiple levels of validation and constraints. You
know them; those who tend to wear a belt and suspenders at
the same time.
We are in deep
water now and thrashing precariously close to the region of
"outside the scope of this article". Database, SQL and single-computer
performance tuning is tough enough! Here we are discussing
performance of a more complex application environment. However,
complex application implementations are not at all uncommon
today.
The experienced
person has likely encountered complex situations and should
have some knowledge of the issues. Ask the candidate about
their experience.
BONUS
SUMMARY:
Many constraints
and/or triggers in a database may severely impact database
and thereby, application performance. Likewise, data-integrity
checking located outside of the database server may severely
impact application servers and/or client machines. We have
yet to address network performance issues between the client
and server (or multiple tiers).
Where should I
constrain and validate my data? The answer is the infamous
"it depends…" You may have severely limited resources on your
server, client, application server, network or a combination
of these. Budgets and architecture may dictate the design,
rather than best practices.
REVIEW
OF ROWS CONTAINING ALL NULLS (?)
Let us look back
at the example of a row containing NULLs for every column.
Isn't this equivalent to a non-existent row? At a logical
level, one might argue this. In theory, SQL and Relational
Databases have a layer of abstraction between the logical
and physical implementations.
In practice, a
row containing NULL values for all columns still affects the
number of rows when COUNT(*) is used. In the physical implementation,
the row is still allocated and has an internal row number
or pointer. Your DBA will tell you that it still takes space
and depending on the vendor implementation of datatypes, certain
datatypes store a fixed length, even if the column is NULL.
Outside of this
column or for testing and learning, allowing NULLs in every
column would certainly indicate poor database or application
design.
Summary:
We don't
always have the luxury of working with good data, database
design, application design and SQL. Knowing the data and database
design is very important, especially when we have to work
with things WE didn't design. Handling and understanding
NULLs is a big part of data sanity.
Having database
constraints, triggers and application level data-integrity
still doesn't make our design foolproof (or NULL-proof)
Like someone said,
"Why is it so difficult to make something foolproof"?
"Because fools
are so ingenious"
We might address
other sources of NULLs (queries), some techniques to survey
data, and other data-integrity difficulties in future columns.
About the author
Mike Wilkins is an Oracle Master DBA with more than twenty
years IT experience in a variety of technologies. Mike's roles
include, Database Designer/modeler, Programmer-Analyst, Database
Administrator, UNIX Systems and Network Administrator, Network
Designer/Analyst, and others. Mike has performed 300+ technical
interviews in the last five years.
|