10 The Data Definition Language (DDL)

In our classes and coursework, we’ve seen relational schemas expressed in SQL. Now let’s dig deeper into this part of the language.

Types

See this video.

Keys and foreign keys

See this video.

Other kinds of constraint

See this video.

SQL “schemas”

SQL uses the keyword SCHEMA, but it means something different than how we used it in the relational model, where a schema is the definitions of the structure of a relation or set of relations.

When you connect to your database and start defining tables, types, etc., by default their names go into one big set. No two things can have the same name. In my postgreSQL database, I have tables and types from the university domain and from domains I’ve used in other examples and assignments across the years. I’ve defined hundreds of things and it would be difficult the avoid name clashes. But I have avoided these clashes – and made things more organized – by organizaing the names into separate sets or “namespaces.” A SQL schema is what we call one of these namespaces.

Creating a schema

You already have a schema called public. By default, that is where all your definitions go. If you connect to your database and define a table called Restaurant, it’s full name is actually public.Restaurant.

If you want to organize things into separate namespaces, you can define additional schemas. This command defines a schema called eatInToronto:

CREATE SCHEMA eatInToronto;

Using a schema

To put our Restaurant table into that schema, we can simply give it the full name eatInToronto.Restaurant:

    CREATE TABLE eatInToronto.Restaurant (
        name TEXT,
        cuisine TEXT,
        --- the rest of the table definition goes here
    );

We can refer to that table using its full name eatInToronto.Restaurant too:

    SELECT name
    FROM eatInToronto.Restaurant
    WHERE cuisine = `Sri Lankan`;

If we define another schema called university for our tables about courses and students etc., we could include a Restaurant table for keeping track of restaurants on campus. The table names wouldn’t clash, because one would be called eatInToronto.Restaurant and the other university.Restaurant. This is analogous to how we talk about people. I know lots of people named Peter, but they can be told apart by giving their full name.

The search path

It’s a little tedious to use full names like eatInToronto.Restaurant, just as it would be tedious in conversation to have to use full names of people all the time. We very often use first names only, and the context allows others to infer which person we are referring to. When I’m talking about my family, I can just say Peter and it is clear that I mean my brother Peter. When I’m talking about faculty members in the Department of Computer Science, if I say Peter, I’m referring to my colleague Peter Marbach.

We can do the analogous thing in SQL by setting the “search path”. The search path is the sequence of schemas you want postgreSQL to look in when looking up a name. For example,

    SET SEARCH_PATH TO university, eatInToronto, public;

says to look first in university, then eatInToronto, then public.

The search path is also used when defining new things. If we define a table without qualifying a specific schema, it will go into the first schema in the search path. So if we now say

    CREATE TABLE Review (
        score int,
        -- the rest of the table definition goes here
    );

then this table becomes part of the university schema, and its full name is university.Review.

The default search path in postgreSQL is $user, public. The first schema, $user, is a schema with the same name as your username. It is not automatically created for you, but if you create it, it will be at the front of the default search path. If you don’t create it, and don’t modify the search path, new things go into the schema called public.

You can see what the search path is set to by saying

    SHOW SEARCH_PATH;

Removing a schema

Removing a schema is easy, for example:

    DROP SCHEMA university;

If anything was already defined in the schema, dropping the schema will generate an error message. If, despite this, we want to drop the schema and remove everything in it, we add the keyword CASCADE:

    DROP SCHEMA university CASCADE;

A handy usage pattern

When developing a schema or a dataset, there are always lots of errors to work through: typos in the table definitions, broken foreign keys, etc. As you fix these, you end up loading the schema and data a number of times. This set of statements at the top of your DDL file is very handy (of course, substitute in the appropriate schema name):

    DROP SCHEMA IF EXISTS university CASCADE;
    CREATE SCHEMA university;
    SET SEARCH_PATH TO university;

First, it removes the schema and everything in it. (The extra IF EXISTS is there so that an error message is avoided the first time the DDL is loaded, before the schema has even been defined.) Then it [re-]creates the schema, and sets the search path to it so all subsequent definitions will automatically go into that schema.