1 Introduction to SQL

Diane Horton

So far, we have defined database schemas and queries mathematically. We are very well prepared to learn SQL a programming language that allows us to execute our queries in a DBMS.

SQL is short for “Structured Query Language”, but it is for more than just writing queries. It has two sub-parts:

PostgreSQL

We’ll be using an open-source DBMS called PostgreSQL. You’ll find documentation for postgreSQL at the postgreSQL home page including information on how to download it for your own platform, should you choose to.

These specific pages are highly relevant:

What about other DBMSs?

Learning PostgreSQL will prepare you well for using other DMBSs, such as MySQL and Oracle. Unfortunately, there are differences in the SQL language across DBMSs. There is an official SQL standard that fully specifies the language, and is updated periodically. However, DBMSs typically don’t implement every feature in the standard, and often add their own features as well. This means that you may find differences around the edges of the language when you work with a new DBMS. It also creates issues for portability of code.

Basic SELECT-FROM-WHERE queries

Before we begin, here is the university schema we have been working with (the key of each relation is in bold):

We’ll learn later how to express this in SQL using the Data Definition Language. But first, let’s start writing some queries!

The simplest kind of query is a select-from-where. Here is an example:

    csc343h-prof=> SELECT name             -- choose the column called "name"
    csc343h-prof-> FROM Course             -- from the Course table
    csc343h-prof-> WHERE dept = 'CSC';     -- choose only rows that satisfy
               name            
    ---------------------------
     Intro to Databases
     Software Design
     Intro to Comp Sci
     Data Struct & Anal
     Intro to Visual Computing
     COBOL programming
    (6 rows)

WHERE is equivalent to σ in relational algebra. Sadly, the designers of SQL chose the keyword SELECT for the relational algrebra Π but you will get used to that very quickly.

Below are a few additional things we can do with a select-from-where, to get us started.

Cartesian product

We can get the Cartesian product of two or more tables by puttting them in a comma-separated list in the FROM clause:

    csc343h-prof=> SELECT name, sid, grade
    csc343h-prof-> FROM Course, Offering, Took
    csc343h-prof-> WHERE Course.dept = 'CSC';
               name            |  sid  | grade 
    ---------------------------+-------+-------
     Intro to Databases        | 99132 |    79
     Intro to Databases        | 99132 |    79
     Intro to Databases        | 99132 |    79
     Intro to Databases        | 99132 |    79
     ... etc.

Of course, this will create many nonsensical combinations. We will soon learn all about how to express natural joins and other kinds of join in SQL. In the meanwhile, we can use WHERE to filter out those nonsensical combinations:

    csc343h-prof=> SELECT name, sid, grade
    csc343h-prof-> FROM Course, Offering, Took
    csc343h-prof-> WHERE Course.dept = Offering.dept and   -- New!
    csc343h-prof->    Course.cnum = Offering.cnum and      -- New!
    csc343h-prof->    Offering.oid = Took.oid and          -- New!          
    csc343h-prof->    Course.dept = 'CSC';
            name        |  sid  | grade 
    --------------------+-------+-------
     Intro to Databases | 99132 |    79
     Intro to Databases | 98000 |    82
     Software Design    | 98000 |    89
     Software Design    | 98000 |    72
     ... etc.

Compound conditions in a WHERE clause

We just saw our first compound condition. We can build boolean expressions using:

Temporarily renaming a table

Similar to ρ in relational algebra, in SQL we can rename a table for the duration of a query. Here we rename Employee to e and Deparment to d:

    csc343h-prof=> SELECT e.name, d.name
    csc343h-prof-> FROM Employee e, Department d
    csc343h-prof-> WHERE d.name = 'marketing' and e.name = 'Horton';

In this case, it makes the query a bit more concise than it would have been:

    csc343h-prof=> SELECT Employee.name, Department.name
    csc343h-prof-> FROM Employee, Department
    csc343h-prof-> WHERE Department.name = 'marketing' and Employee.name = 'Horton';

but we also know that renaming can be convenient for other reasons, such as distinguishing two occurences of a table in a self-join:

    csc343h-prof=> SELECT e1.name, e2.name
    csc343h-prof-> FROM Employee e1, Employee e2
    csc343h-prof-> where e1.salary < e2.salary;

Wildcard in the SELECT clause

If we want every column to be included in the result, rather than listing them all, we can use a wildcard:

    csc343h-prof=> SELECT *
    csc343h-prof-> FROM Course
    csc343h-prof-> WHERE dept = 'CSC';
     cnum |           name            | dept | breadth 
    ------+---------------------------+------+---------
      343 | Intro to Databases        | CSC  | f
      207 | Software Design           | CSC  | f
      148 | Intro to Comp Sci         | CSC  | f
      263 | Data Struct & Anal        | CSC  | f
      320 | Intro to Visual Computing | CSC  | f
      222 | COBOL programming         | CSC  | f
    (6 rows)

Naming columns

We can add an AS-expression to choose a column name in the result of a query.

    csc343h-prof=> SELECT name AS title, dept
    csc343h-prof-> FROM Course
    csc343h-prof-> WHERE breadth;
           title       | dept 
    -------------------+------
     Intro Archaeology | ANT
     Narrative         | ENG
     Rhetoric          | ENG
     The Graphic Novel | ENG
     Mediaeval Society | HIS
     Black Freedom     | HIS
    (6 rows)

Sorting

In relational algebra, a relation is a set and so order doesn’t matter. But in the real world, order often does matter. We can sort the results of a query by adding an ORDER BY clause to the end of a select-from-where query:

    csc343h-prof=> SELECT sid, grade
    csc343h-prof-> FROM Took
    csc343h-prof-> WHERE grade > 90
    csc343h-prof-> ORDER BY grade;
      sid  | grade 
    -------+-------
       157 |    91
     99999 |    91
     98000 |    92
     98000 |    93
     99999 |    94
     99999 |    96
     98000 |    97
       157 |    98
     98000 |    98
     99132 |    98
     99132 |    99
     99999 |    99
     99999 |    99
       157 |    99
     99999 |   100
    (15 rows)

The default is to use ascending order, but we can override this by saying DESC (not DESCENDING):

    csc343h-prof=> SELECT sid, grade
    csc343h-prof-> FROM Took
    csc343h-prof-> WHERE grade > 90
    csc343h-prof-> ORDER BY grade DESC;
      sid  | grade 
    -------+-------
     99999 |   100
       157 |    99
     99999 |    99
     99132 |    99
     99999 |    99
       157 |    98
     98000 |    98
     99132 |    98
     98000 |    97
     99999 |    96
     99999 |    94
     98000 |    93
     98000 |    92
       157 |    91
     99999 |    91
    (15 rows)

We can order according to multiple columns. Here, we see courses in order by dept, and where there are multiple rows with the same value for dept, they are further ordered by course name.

csc343h-prof=> SELECT *
csc343h-prof-> FROM Course
csc343h-prof-> ORDER BY dept, name;
 cnum |           name            | dept | breadth 
------+---------------------------+------+---------
  203 | Human Biol & Evol         | ANT  | f
  200 | Intro Archaeology         | ANT  | t
  222 | COBOL programming         | CSC  | f
  263 | Data Struct & Anal        | CSC  | f
  148 | Intro to Comp Sci         | CSC  | f
  343 | Intro to Databases        | CSC  | f
  320 | Intro to Visual Computing | CSC  | f
  207 | Software Design           | CSC  | f
  263 | Compar Vert Anatomy       | EEB  | f
  216 | Marine Mammal Bio         | EEB  | f
  150 | Organisms in Environ      | EEB  | f
  110 | Narrative                 | ENG  | t
  205 | Rhetoric                  | ENG  | t
  235 | The Graphic Novel         | ENG  | t
  200 | Environmental Change      | ENV  | f
  320 | Natl & Intl Env Policy    | ENV  | f
  296 | Black Freedom             | HIS  | t
  220 | Mediaeval Society         | HIS  | t
(18 rows)

We can also use the value of an expression to determing ordering. For example, if we had a table including columns called sales and rentals, we could write ORDER BY sales + rentals.

Expressions in a SELECT clause

Instead of a simple attribute name, you can use an expression in a SELECT clause. Here are two simple examples:

    csc343h-prof=> SELECT sid, grade + 10 AS adjusted
    csc343h-prof-> FROM Took;
      sid  | adjusted 
    -------+----------
     99132 |       89
     99132 |      108
     99132 |       92
     99132 |      109
     ... etc.

    csc343h-prof=> SELECT dept || cnum
    csc343h-prof-> FROM Course
    csc343h-prof-> WHERE cnum < 200;
     ?column? 
    ----------
     CSC148
     EEB150
     ENG110
     CSC100
    (4 rows)

In the second query, || is string concatenation. (See the PostgreSQL documentation for details on the many built-in functions and operators.) Notice that we did not provide a name for the column in the result. In such cases, SQL names it `?column?’. It’s essentially saying “I have no idea what to call this!”

Expressions that are a constant

Rather than pull values from a table, a SELECT clause can use a constant value in a column. Here is an example:

    csc343h-prof=> SELECT name, 'satisfies' as breadthRequirement
    csc343h-prof-> FROM Course
    csc343h-prof-> WHERE breadth;
           name        | breadthrequirement 
    -------------------+--------------------
     Intro Archaeology | satisfies
     Narrative         | satisfies
     Rhetoric          | satisfies
     The Graphic Novel | satisfies
     Mediaeval Society | satisfies
     Black Freedom     | satisfies
    (6 rows)

We have extracted only courses for which the value of breadth is true (which means, in our schema, that the course satisfies the “breadth requirement” of a university degree). And we chose to show this in the result by having the value satisfies throughout the second column, which we named breadthrequirement.

Pattern matching

SQL provides the LIKE operator for comparing a string to a pattern. The pattern is a quoted string and can include these special characters:

Here are two simple examples:

csc343h-prof=> SELECT *
csc343h-prof-> FROM Course
csc343h-prof-> WHERE name LIKE '%to%';
 cnum |           name            | dept | breadth 
------+---------------------------+------+---------
  343 | Intro to Databases        | CSC  | f
  148 | Intro to Comp Sci         | CSC  | f
  320 | Intro to Visual Computing | CSC  | f
  263 | Compar Vert Anatomy       | EEB  | f
  205 | Rhetoric                  | ENG  | t
(5 rows)


csc343h-prof=> -- Notice that UTSC students are excluded.  The string 'UTSC' 
csc343h-prof=> -- does not match the pattern 'UT_' because the underscore only
csc343h-prof=> -- matches a single character, and the pattern is required to 
csc343h-prof=> -- cover the entire string.
csc343h-prof=> SELECT *
csc343h-prof-> FROM Student
csc343h-prof-> WHERE campus LIKE 'UT_';
 sid | firstname |  surname  | campus |  email  | cgpa 
-----+-----------+-----------+--------+---------+------
 157 | Leilani   | Lakemeyer | UTM    | lani@cs | 3.42
(1 row)

LIKE offers an extremely limited form of pattern matching but has the advantage of being fast. SQL also supports regular expressions for string matching, using the ~ operator. For example:

csc343h-prof-> -- Find students whose surname begins with M or F or L, followed
csc343h-prof-> -- by 'a', and then zero or more additional characters.
csc343h-dianeh=> SELECT *
FROM Student
WHERE surname ~ '(M|F|L)a*';
  sid  | firstname |  surname   | campus |  email   | cgpa 
-------+-----------+------------+--------+----------+------
 99132 | Avery     | Marchmount | StG    | avery@cs | 3.13
 98000 | William   | Fairgrieve | StG    | will@cs  | 4.00
   157 | Leilani   | Lakemeyer  | UTM    | lani@cs  | 3.42
(3 rows)

For full details, see the postgreSQL documentation Note this warning contained in that documentation: “Regular expressions can be contrived that take arbitrary amounts of time and memory to process. Be wary of accepting regular-expression search patterns from hostile sources.”

Case-sensitivity and whitespace

Let’s address the lexical level of SQL syntax: how its smallest elements (keywords, operators, and so on) are recognized.

Keywords in SQL are not case-sensitive. Neither are identifiers (names of tables or columns). In addition, line breaks and tabs are ignored by SQL. So, for example, all of these queries are syntactically valid and do the same thing:

    csc343h-prof=> -- This looks nice.
    csc343h-prof=> SELECT sid
    csc343h-prof-> FROM took
    csc343h-prof-> WHERE grade > 50;

    csc343h-prof=> -- This is fine if we just want the answer and aren't saving the query.
    csc343h-prof=> select sid from took where grade > 50;

    csc343h-prof=> -- This is just ugly.
    csc343h-prof=> SELECT SID FROM
    csc343h-prof->     TOOK WHERE GRADE
    csc343h-prof->     > 50
    csc343h-prof->     ;

There is no standard convention, but a reasonable one (used in the first query above) is this:

Typing keywords in all capitals is a little awkward, and you may feel that the queries are shouting at you. It is acceptable in this course to not capitalize keywords. Whatever you do, do it consistently.

SQL is a high-level language

We say that SQL is a high-level language. An SQL query says what we want from the database, not how to get it. We never have to think about what is stored in memory vs in some level of secondary storage, or what points to what. In other words, we have a level of abstraction that makes it much easier to write queries. In addition, the DBMS can change how the data is stored with no impact on our queries. We say that we have “physical data independence”.