6 Null values in SQL

Diane Horton

Missing information

In a real database, we may have some missing information. For example, suppose we had a spouse column in the Student table. What should we put in this column for students who are not married? This isn’t actually a case of missing information, it is a case of bad schema design! Since many or most students are not married, it doesn’t make sense to have a column for spouses in the Student table. Instead, we can put it in another table with this schema (boldface is used to indicate that attributes student1 and student2 together form a key):

Married(student1, student2)

Only students who actually are married would appear in this table.

Sometimes we really do have missing information. For example, not every student has a spouse, but every student does belong to a specific campus. But suppose we have someone in the Student table whose campus we don’t know. There is a correct campus value for this student, we just don’t know it. What value should we put there?

In SQL, we use a special value called NULL.

Working with NULL

We can get a NULL value into a table with INSERT INTO. Recall the Student table:

    csc343h-prof=> SELECT *
    csc343h-prof-> FROM Student;
      sid  | firstname |  surname   | campus |   email   | cgpa 
    -------+-----------+------------+--------+-----------+------
     99132 | Avery     | Marchmount | StG    | avery@cs  | 3.13
     98000 | William   | Fairgrieve | StG    | will@cs   | 3.90
     99999 | Afsaneh   | Ali        | UTSC   | aali@cs   | 2.98
       157 | Leilani   | Lakemeyer  | UTM    | lani@cs   | 3.42
     11111 | Homer     | Simpson    | StG    | doh@gmail | 0.40
    (5 rows)

Here we insert a student with no value for first name or campus:

INSERT INTO Student VALUES (35461, NULL, 'Batman', NULL, 'bruce@batcave.com', 4.0);

If we don’t want to allow NULL values in a given column, we can prevent them by using a NOT NULL constraint in the table definition. (We’ll learn about table definitions a little later.) In fact, our schema was written so that NULL is not permitted in either column firstname or campus, so this insertion is unsuccessful.

We can compare a value to NULL using IS NULL. In our schema, the breadth column in table Course allows NULL values, so this query may yield some results:

    csc343h-prof=> SELECT *
    csc343h-prof-> FROM Course
    csc343h-prof-> WHERE breadth IS NULL;
     cnum |              name              | dept | breadth 
    ------+--------------------------------+------+---------
      100 | CSC for future prime ministers | CSC  | 
    (1 row)

Notice that the NULL value in column breadth is presented in the output as whitespace.

We can also compare a value to NULL using IS NOT NULL:

    csc343h-prof=> SELECT *
    csc343h-prof-> FROM Course
    csc343h-prof-> WHERE breadth IS NOT NULL;
     cnum |           name            | dept | breadth 
    ------+---------------------------+------+---------
      343 | Intro to Databases        | CSC  | f
      207 | Software Design           | CSC  | f
      148 | Intro to Comp Sci         | CSC  | f
      263 | Data Struct & Anal        | CSC  | f
      320 | Intro to Visual Computing | CSC  | f
      200 | Intro Archaeology         | ANT  | t
      203 | Human Biol & Evol         | ANT  | f
      150 | Organisms in Environ      | EEB  | f
      216 | Marine Mammal Bio         | EEB  | f
      263 | Compar Vert Anatomy       | EEB  | f
      110 | Narrative                 | ENG  | t
      205 | Rhetoric                  | ENG  | t
      235 | The Graphic Novel         | ENG  | t
      200 | Environmental Change      | ENV  | f
      320 | Natl & Intl Env Policy    | ENV  | f
      220 | Mediaeval Society         | HIS  | t
      296 | Black Freedom             | HIS  | t
      222 | COBOL programming         | CSC  | f
    (18 rows)

NULL introduces a third truth value: unknown

Because of the possible presence of NULL values, sometimes we don’t know if a condition is true or false! For example, consider:

    SELECT *
    FROM Student
    WHERE campus = 'StG';

If we have a row with a NULL value for campus, we cannot determine whether or not it is StG. In SQL, if one or both operands is NULL, a comparison evaluates to “unknown”.

Let’s look at the implications of having this third truth-value.

Impact of NULL values on WHERE

WHERE is picky: it will not include a row for which the truth-value of the WHERE clause is unknown. So in our example above, if we didn’t know a student’s campus, they would not be included in the result of the query.

NATURAL JOIN is picky also. You can remember this if you think of a natural join as being a Cartesian product followed by a WHERE condition.

Impact of NULL values on aggregation

When we aggregate on a column, for example to find the average, NULL values in that column are ignored:

Here’s a summary:

Null and aggregation

Let’s look at an example that uses this table about students at Runnymede Public School:

    csc343h-prof=> -- Column grade has one NULL,
    csc343h-prof=> -- and column age is entirely NULLS.
    csc343h-prof=> SELECT *
    csc343h-prof-> FROM Runnymede;
     name  | age | grade 
    -------+-----+-------
     diane |     |     8
     will  |     |     8
     cate  |     |     1
     tom   |     |      
     micah |     |     1
     grace |     |     2
    (6 rows)

Tom’s NULL grade doesn’t contribute to any of the aggregations below. But count(*) counts rows, and it includes every row, regardless of any NULLs.

    csc343h-prof=> SELECT min(grade), max(grade), sum(grade), avg(grade),
    csc343h-prof=>        count(grade), count(*)
    csc343h-prof-> FROM Runnymede;

     min | max | sum |        avg         | count | count 
    -----+-----+-----+--------------------+-------+-------
       1 |   8 |  20 | 4.0000000000000000 |     5 |     6
    (1 row)

If an entire column is NULL, it can’t have a meaningful avg, for example. The value of any aggregation on that column is NULL. There is one exception: the COUNT of such a column is 0. Let’s aggregate on our entirely NULL column, age.

    csc343h-prof=> SELECT min(age), max(age), sum(age), avg(age),
    csc343h-prof=>        count(age), count(*) 
    csc343h-prof-> FROM Runnymede; 

     min | max | sum | avg | count | count 
    -----+-----+-----+-----+-------+-------
         |     |     |     |     0 |     6
    (1 row)

Notice that count(*) gave the same answer as before. and that count(age) gave 0 because no values in that column are non-null.

There are corner cases that we haven’t examined. For example, are two NULL values considered distinct from each other when we do NATURAL JOIN, set operations, or SELECT DISTINCT? This behaviour may vary across DBMSs, so it is best not to assume you know what will happen. Check the documentation.

Impact of NULL values on boolean conditions

Now that we have three truth-values, we have to know what happens when we use logical operators in a boolean condition. The truth tables below specify the truth-value for each operator in each of the possible conditions.

Three-valued truth tables

The values in the “and”, “or”, and “not” columns above are common sense if you think about it.

Here’s another way to look at three-valued logic. If we treat true as 1, false as 0, and unknown as 0.5, then taking the minimum of two operands gives us “and”, and taking the maximum gives us “or”:

Truth table for and and or using minimum and maximum

Taking the complement of an operand, gives us “not”:

Truth table for not using complement

Surprises

Recall our Course table:

    csc343h-prof=> SELECT *
    csc343h-prof=> FROM Course;
     cnum |              name              | dept | breadth 
    ------+--------------------------------+------+---------
      343 | Intro to Databases             | CSC  | f
      207 | Software Design                | CSC  | f
      148 | Intro to Comp Sci              | CSC  | f
      263 | Data Struct & Anal             | CSC  | f
      320 | Intro to Visual Computing      | CSC  | f
      200 | Intro Archaeology              | ANT  | t
      203 | Human Biol & Evol              | ANT  | f
      150 | Organisms in Environ           | EEB  | f
      216 | Marine Mammal Bio              | EEB  | f
      263 | Compar Vert Anatomy            | EEB  | f
      110 | Narrative                      | ENG  | t
      205 | Rhetoric                       | ENG  | t
      235 | The Graphic Novel              | ENG  | t
      200 | Environmental Change           | ENV  | f
      320 | Natl & Intl Env Policy         | ENV  | f
      220 | Mediaeval Society              | HIS  | t
      296 | Black Freedom                  | HIS  | t
      222 | COBOL programming              | CSC  | f
      100 | CSC for future prime ministers | CSC  | 
    (19 rows)

Here, we ask for the courses that either have breadth true or have breadth false:

    csc343h-prof=> SELECT *
    csc343h-prof-> FROM Course
    csc343h-prof-> WHERE breadth OR NOT breadth;
     cnum |           name            | dept | breadth 
    ------+---------------------------+------+---------
      343 | Intro to Databases        | CSC  | f
      207 | Software Design           | CSC  | f
      148 | Intro to Comp Sci         | CSC  | f
      263 | Data Struct & Anal        | CSC  | f
      320 | Intro to Visual Computing | CSC  | f
      200 | Intro Archaeology         | ANT  | t
      203 | Human Biol & Evol         | ANT  | f
      150 | Organisms in Environ      | EEB  | f
      216 | Marine Mammal Bio         | EEB  | f
      263 | Compar Vert Anatomy       | EEB  | f
      110 | Narrative                 | ENG  | t
      205 | Rhetoric                  | ENG  | t
      235 | The Graphic Novel         | ENG  | t
      200 | Environmental Change      | ENV  | f
      320 | Natl & Intl Env Policy    | ENV  | f
      220 | Mediaeval Society         | HIS  | t
      296 | Black Freedom             | HIS  | t
      222 | COBOL programming         | CSC  | f
    (18 rows)

The expression breadth OR NOT breadth is a tautology, meaning it should always be true, yet we can see that this tautology isn’t true for the course “CSC for future prime ministers” – this course does not appear in the result!

This is a bit of a surprise, but it makes sense when we remember the truth table for OR:

Here’s another tautology: AGE >= 10 OR AGE < 10. But it’s not true if AGE is NULL. Here’s a little table that we can use to demonstrate this:

    csc343h-prof=> SELECT *
    csc343h-prof-> FROM Ages;
      name  | age 
    --------+-----
     Amna   |  21
     Zach   |  25
     Miriam |    
     Ben    |   0
    (4 rows)

    csc343h-prof=> SELECT *
    csc343h-prof-> FROM Ages
    csc343h-prof-> WHERE age >= 10 OR AGE < 10;
     name | age 
    ------+-----
     Amna |  21
     Zach |  25
     Ben  |   0
    (3 rows)

Miriam is not included in the result because her age is not known. Even though AGE >= 10 OR AGE < 10 would evaluate to true no matter what value we inserted for her age, the expression evaluates to unknown right now.

Here’s one final tautology, and this time it doesn’t involve any logical operators: AGE * 0 = 0. This mathematical rule would be true for any value of AGE, but it is not true in SQL when AGE is NULL, and so Miriam is again excluded:

    csc343h-prof=> SELECT *
    csc343h-prof-> FROM Ages
    csc343h-prof-> WHERE AGE * 0 = 0;
     name | age 
    ------+-----
     Amna |  21
     Zach |  25
     Ben  |   0
    (3 rows)

Summary

There are many corner conditions involving NULL, but you need only remember these core facts: