Interviewer Insights  
About Us  
Comments, Suggestions  
  or General Feedback  
Interested in writing for  
Tech Talk Tips?  

 

 


Database Admin
SQL: Controlling NULLs and Data Integrity

by Mike Wilkins


The following question assesses experience with intermediate to advanced SQL techniques and should be asked of candidates that are DBAs, Database developers and anyone using SQL. It is best suited for developers and Database Administrators having at least a year of solid experience with SQL. The techniques applied are useful in a variety of situations and should be known to everyone using SQL.

This month we will continue our look at NULLs in SQL. Our last column asked about the value and how to visualize the NULL. It also pointed to the fact that reversing the operator in SQL, from EQUAL (=) to NOT EQUAL (<>), does not provide the inverse result when NULLs are involved.

NULLs can complicate SQL logic, so we want to control them.

That brings us to our first question.

In SQL, how can we prevent the existence of NULLs in a particular column?

What's that? Revoke all permissions from the table? Very funny. We could also shut down the database, the computer, and disconnect it. But let's talk about more practical ways of forbidding the existence of NULLs in a column. There are several possible answers, so let's look at a few.

Answer 1: Place a NOT NULL constraint on the column.

Constraints restrict the values that may be placed in columns. The NOT NULL constraint simply prohibits NULLs from being placed in a column.

Without constraints, we might encounter rows having NULLs for all columns, as in the table we established, test_null. Notice that both columns contain NULLs in the last row of the table.

Name Type
COL_VARCHAR VARCHAR2(11)
COL_NUMBER NUMBER(3)

COL_VARCHAR
COL_NUMBER
mike
100
george
200
larry
300
sherry
400
bob
500
betsy
frank
  33
999


We have indicated the presence of NULLs by . However, many tools simply show NULLs as blank spaces. For example, Oracle's SQL*PLUS shows the following by default.

SQL> SELECT * FROM test_null;
COL_VARCHAR
COL_NUMBER
mike
100
george
200
larry
300
sherry
400
bob
500
betsy
 
frank
 
 
  33
 
999
 
 

10 rows selected.

The last row is now invisible to the naked eye. This and similar conditions have tripped quite a few of us. Depending on the tool, you may be able to change the default display output of NULLs to something more indicative, like .

Remember, when the output of SQL queries doesn't seem to make sense, check the values for NULLs and spaces (white space).


Answer 2: Place a UNIQUE constraint on the column.

The UNIQUE constraint ensures that every value for a column, or combination of columns, is unique. Therefore, having a UNIQUE constraint on our column, COL_VARCHAR, would allow one row with a NULL in that column, but not more than one. Right?

< PLAY BUZZER SOUND > WRONG!

The UNIQUE constraint does not consider NULLs when enforcing uniqueness.


Answer 3: Place a PRIMARY KEY constraint on the column or combination of columns.

The PRIMARY KEY constraint combines the properties of NOT NULL and UNIQUE constraints. It is effective in prohibiting NULLs for columns involved in a PRIMARY KEY.


Answer 4:
Use triggers and stored procedures to check for NULLs and other integrity.

Triggers and stored procedures are more flexible, but more complex. They are generally used for complex data-integrity requirements that cannot be implemented using constraints. Performance issues may also influence a decision to use triggers and stored procedures.


Answer 5: Require Non-NULL values in the Data-input "front end" or data-entry form.

If data entry happens in a uniform manner, say through a GUI data-entry form, constraints or data-integrity checks may be placed in the application or data-entry form/screen.


BONUS QUESTIONS:

Why would some prefer this front-end method?

What are the advantages of constraining data at the client versus the server?

* Warning: Advanced Topics *


BONUS ANSWERS:

Someone might use a method because "it's what they know".

Remember the old saying, "If all you have is a hammer, then everything looks like a nail".

Of course, there are often better and deeper reasons than, "it's what I know", to choose a given method.

As far as advantages, simplistically, the constraints or data-integrity checks will take up time and space wherever they exist.

Constraints on the database take time (CPU cycles) and space, not only to store the constraint definition, but many times this also involves an associated index to support the constraint. There are also issues concerning how the database reports constraint violations. For example, if the constraint-violation error message isn't passed from the database to the application or end user, they may be unaware that an error occurred. Also, applications must handle constraint-violation errors appropriately.

On the client (or application server), its CPU cycles are used and it stores the constraint definition. However, most applications will not store the equivalent of an index for support of the data-entry constraint. Of course, some environments use a hybrid of these.

Some careful types will use multiple levels of validation and constraints. You know them; those who tend to wear a belt and suspenders at the same time.

We are in deep water now and thrashing precariously close to the region of "outside the scope of this article". Database, SQL and single-computer performance tuning is tough enough! Here we are discussing performance of a more complex application environment. However, complex application implementations are not at all uncommon today.

The experienced person has likely encountered complex situations and should have some knowledge of the issues. Ask the candidate about their experience.


BONUS SUMMARY:

Many constraints and/or triggers in a database may severely impact database and thereby, application performance. Likewise, data-integrity checking located outside of the database server may severely impact application servers and/or client machines. We have yet to address network performance issues between the client and server (or multiple tiers).

Where should I constrain and validate my data? The answer is the infamous "it depends…" You may have severely limited resources on your server, client, application server, network or a combination of these. Budgets and architecture may dictate the design, rather than best practices.


REVIEW OF ROWS CONTAINING ALL NULLS (?)

Let us look back at the example of a row containing NULLs for every column. Isn't this equivalent to a non-existent row? At a logical level, one might argue this. In theory, SQL and Relational Databases have a layer of abstraction between the logical and physical implementations.

In practice, a row containing NULL values for all columns still affects the number of rows when COUNT(*) is used. In the physical implementation, the row is still allocated and has an internal row number or pointer. Your DBA will tell you that it still takes space and depending on the vendor implementation of datatypes, certain datatypes store a fixed length, even if the column is NULL.

Outside of this column or for testing and learning, allowing NULLs in every column would certainly indicate poor database or application design.


Summary:

We don't always have the luxury of working with good data, database design, application design and SQL. Knowing the data and database design is very important, especially when we have to work with things WE didn't design. Handling and understanding NULLs is a big part of data sanity.

Having database constraints, triggers and application level data-integrity still doesn't make our design foolproof (or NULL-proof)

Like someone said, "Why is it so difficult to make something foolproof"?

"Because fools are so ingenious"

We might address other sources of NULLs (queries), some techniques to survey data, and other data-integrity difficulties in future columns.


About the author


Mike Wilkins is an Oracle Master DBA with more than twenty years IT experience in a variety of technologies. Mike's roles include, Database Designer/modeler, Programmer-Analyst, Database Administrator, UNIX Systems and Network Administrator, Network Designer/Analyst, and others. Mike has performed 300+ technical interviews in the last five years.


REVIEWNET is a trademark and service mark of ReviewNet Corporation © 1997-2002,
Jenkintown, PA - All Rights Reserved. This information is confidential and proprietary
to ReviewNet Corporation. Use, duplication or misappropriation is strictly prohibited.