1.11 postgreSQL “schemas”

1.11 postgreSQL “schemas”#

Things can get messy#

Over time, your instance of postgreSQL for this course will accumulate many tables, such as the ones about a university that we’ve been using in class, the ones about an HR department we used in a prep, and the one about world countries that we used in another prep. We can end up with many, many tables, all lumped together, and this can cause name conflicts. For example, there can only be one table named Employee. If we’ve already defined one for the company HR data and then later decide we want one for the university data, we are forced to pick a new name. The need to have names that are completely unique across all our domains leads to cumbersome names.

We can avoid these clashes – and make things more organized – by groupig the names into separate sets or “namespaces”. These namespaces are called SQL schemas.

Creating a namespace using dot notation#

We can simply use dot notation to create a new namespace. For example, instead of defining a new table like this:

create table Employee ...

we can write:

create table HR.Employee ...

This allows us later to define a table called University.Employee – the names are not in conflict. And if we define everything to do with the university data using the University prefix and everything about the company HR data using the HR prefix, this makes a logical separation and reduces cognitive load.

In that case, when we want to refer to a table or other database element (e.g., a user-defined type or named constraint) we have to use its full name. For example:

csc343h-prof=> select * from Employee;
ERROR:  relation "employee" does not exist
LINE 1: select * from Employee;
                      ^
csc343h-prof=> select * from HR.Employee;
 eid |      name       | salary | dept 
-----+-----------------+--------+------
   1 | Bill Gates      |     59 |   55
   2 | Marissa Mayer   |     82 |   55
   3 | Sheryl Sandberg |     17 |   55
   4 | Larry Ellison   |     55 |   33
   5 | Tim Cook        |     48 |   55
   6 | Mark Zuckerberg |     95 |   22
   7 | Jeff Weiner     |     28 |   33
   8 | Larry Page      |    145 |   33
   9 | Diane Horton    |    145 |   33
(9 rows)

Creating a namespace using `CREATE SCHEMA`#

Having to give the full name every time you refer to a table or other database element can be tedious.

There is a useful analogy to ordinary conversations, where we often refer to a person without using their full name. A first name alone can suffice if we take advantage of what we know about the context. For instance, if I am talking with friends about my job and I mention “Peter”, they infer that I’m referring to the Peter that I work with; if I’m talking about Thanksgiving and mention “Peter”, they infer that it’s the Peter in my family.

postgreSQL provides a way to define different contexts and to say which one we are “in” at a give time. These contexts are called schemas. Here’s an example of how to use a schema:

-- Define a new schema called University:
create schema University;
-- Tell postgres that every new name we define and every name we refer to
-- is in the University schema.
set search_path to University;
-- Because we set the search path to University, this new table is part 
-- of the University schema.
create table Student(
	sID integer primary key,
	firstName varchar(15) not null,
	surName varchar(15) not null,
	campus Campus,
	email varchar(25),
	cgpa CGPA);
-- And because we set the search path to University, when we refer to a
-- table, psql infers we are talking about a table in the University scheme.
-- So we can refer to this table without having to use a prefix:
select * from Student;
-- We can still use the full, prefixed name for our table, we just don't 
-- have to:
select * from University.Student;

The search path#

We used the keywords set search_path to in order to say what schema we wanted to be working in. More generally, we can give a list of schemas, for instance

set search_path to University, HR, World;

This tells postgreSQL that whenever we refer to a name X, it should first look in the University schema for it (that is, look for University.X), and if there is no such thing, then look in the HR schema (that is, look for HR.X), and if there is no such thing, then look in the World schema (that is, look for World.X). If it never finds X in any of those schemas, an error is generated.

csc343h-prof=> set search_path to University, HR, World;
SET
csc343h-prof=> select * from Frisbee;
ERROR:  relation "Frisbee" does not exist
LINE 1: select * from Frisbee;
                      ^

Any new thing we define will go into the first schema on the list, in this case, the University schema.

You can ask to see the search path:

csc343h-prof=> show search_path;
      search_path      
-----------------------
 university, hr, world
(1 row)

The default search path#

If you don’t set the search path, it gets a default value:

cdf> psql csc343h-prof
psql (16.11 (Ubuntu 16.11-0ubuntu0.24.04.1))
Type "help" for help.

csc343h-prof=> show search_path;
   search_path   
-----------------
 "$user", public
(1 row)

What are "$user" and public, and where did they come from? The schema called public is defined automatically. The $user at the front of the search path is special; it is not actually a schema. Instead it is a placeholder showing that, if you create a schema with the same name as your username (in my case prof) – it is not created automatically for you – that schema will go at the front of the search path.[1]

This means that if we start up postgreSQL and immediately create a new table called Frindle, it goes in the schema with the same name as your username, if you have created that schema. If you haven’t, the new table goes into the public namespace and its full name is public.Frindle. (Earlier, we said that any new thing we define will go into the first schema on the search path list but the rule is a tiny bit more complicated: of all the schemas on the search path, any new thing goes into the first of these schemas that has been defined.)

Note that, on our database servers, you don’t have permission to create things in the public schema. so this sequence of actions generates an error:

cdf> psql csc343h-prof
psql (16.11 (Ubuntu 16.11-0ubuntu0.24.04.1))
Type "help" for help.
csc343h-prof=> -- Schema prof has not been defined, so Frindle should go into
csc343h-prof=> -- the public schema.
csc343h-prof=> create table Frindle (num int);
ERROR:  permission denied for schema public
LINE 1: create table Frindle (num int);
                     ^

The solution is to create a schema and set the search_path, so that Frindle can go into that schema.

A handy three-line pattern#

To remove a schema, we can simply say drop schema, however, this generates errors if the schema contains anything:

csc343h-prof=> drop schema university;
ERROR:  cannot drop schema university because other objects depend on it
DETAIL:  type university.campus depends on schema university
type university.cgpa depends on schema university
type university.department depends on schema university
type university.grade depends on schema university
table university.course depends on schema university
table university.offering depends on schema university
table university.student depends on schema university
table university.took depends on schema university
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

As the hint suggests, we can use CASCADE to ask for everything defined within the schema to be deleted before the schema itself is deleted:

csc343h-prof=> -- Adding CASCADE:
csc343h-prof=> drop schema university cascade;
NOTICE:  drop cascades to 9 other objects
DETAIL:  drop cascades to type university.campus
drop cascades to type university.cgpa
drop cascades to type university.department
drop cascades to type university.grade
drop cascades to table university.course
drop cascades to table university.offering
drop cascades to table university.student
drop cascades to table university.took
drop cascades to table university.flooper
DROP SCHEMA
csc343h-prof=> -- Now Took (and everything else) is gone:
csc343h-prof=> select * from Took;
ERROR:  relation "Took" does not exist
LINE 1: select * from took;
                      ^
csc343h-prof=> We won't find it using its (former) full name either:
csc343h-prof=> select * from University.Took;
ERROR:  relation "university.Took" does not exist
LINE 1: select * from University.Took;
                      ^

Dropping a schema is a great way to clean up if you want to redefine your schema and load your data anew. So it can be a good thing to put at the top of a DDL file. However, the first time you import that file it will generate an error, because the schema has never been defined! This is solved by saying to drop the schema, but only if it has been defined:

drop schema if exists University cascade;

A handy pattern for the top of a DDL file is to (1) clean up by dropping the schema if it exists, (2) define the schema anew, and then (3) set the search path to that schema so that all definitions that follow will go into that schema. This is the full pattern:

drop schema if exists University cascade;
create schema University;
set search_path to University;

Your tables are not lost#

Whatever you do in your database stays in your database even after you exit psql. However, you might think your tables have been lost if you forget about schemas. Here’s an example:

cdf> psql csc343h-prof
psql (16.11 (Ubuntu 16.11-0ubuntu0.24.04.1))
Type "help" for help.

csc343h-prof=> -- Here we create a schema and set the search path to it.
csc343h-prof=> create schema atlantis;
CREATE SCHEMA
csc343h-prof=> set search_path to atlantis;
SET
csc343h-prof=> -- Let's make a table and populate it.
csc343h-prof=> create table people(name text);
CREATE TABLE
csc343h-prof=> insert into people values ('Critias'), ('Socrates');
INSERT 0 2
csc343h-prof=> -- The table is here and all is well.
csc343h-prof=> select * from people;
   name   
----------
 Critias
 Socrates
(2 rows)

csc343h-prof=> -- Let's leave psql and return to it, and see if our table survives.
csc343h-prof=> \q
cdf> psql csc343h-prof
psql (16.11 (Ubuntu 16.11-0ubuntu0.24.04.1))
Type "help" for help.

csc343h-prof=> -- It's gone!!
csc343h-prof=> select * from people;
ERROR:  relation "people" does not exist
LINE 1: select * from people;
                      ^
csc343h-prof=> -- What about the persistence of data?
csc343h-prof=> -- Isn't that one of the key things a relational DBMS is 
csc343h-prof=> -- supposed to ensure???
csc343h-prof=> -- Yes it is, and in fact we do have persistence.  We just
csc343h-prof=> -- have to remember that, since the we had set our search_path
csc343h-prof=> -- to atlantis before creating the table, the name of the table
csc343h-prof=> -- is atlantis.people.
csc343h-prof=> select * from atlantis.people;
   name   
----------
 Critias
 Socrates
(2 rows)

csc343h-prof=> -- We can spare ourselves having to completely specify 
csc343h-prof=> -- the name by setting our search path again.
csc343h-prof=> set search_path to atlantis;
SET
csc343h-prof=> select * from people;
   name   
----------
 Critias
 Socrates
(2 rows)