2.1 Introduction

2.1 Introduction#

You already know how to delete in SQL. Just use a DELETE FROM statement, and the rows you select are gone:

csc343h-dianeh=> select * from course;
 cnum |           name            | dept | breadth 
------+---------------------------+------+---------
  343 | Intro to Databases        | CSC  | f
  207 | Software Design           | CSC  | f
  148 | Intro to Comp Sci         | CSC  | f
  263 | Data Struct & Anal        | CSC  | f
  320 | Intro to Visual Computing | CSC  | f
  200 | Intro Archaeology         | ANT  | t
  203 | Human Biol & Evol         | ANT  | f
  150 | Organisms in Environ      | EEB  | f
  216 | Marine Mammal Bio         | EEB  | f
  263 | Compar Vert Anatomy       | EEB  | f
  110 | Narrative                 | ENG  | t
  205 | Rhetoric                  | ENG  | t
  235 | The Graphic Novel         | ENG  | t
  200 | Environmental Change      | ENV  | f
  320 | Natl & Intl Env Policy    | ENV  | f
  220 | Mediaeval Society         | HIS  | t
  296 | Black Freedom             | HIS  | t
  222 | COBOL programming         | CSC  | f
(18 rows)

csc343h-dianeh=> delete from course where name like '%COBOL%';
DELETE 1
csc343h-dianeh=> select * from course;
 cnum |           name            | dept | breadth 
------+---------------------------+------+---------
  343 | Intro to Databases        | CSC  | f
  207 | Software Design           | CSC  | f
  148 | Intro to Comp Sci         | CSC  | f
  263 | Data Struct & Anal        | CSC  | f
  320 | Intro to Visual Computing | CSC  | f
  200 | Intro Archaeology         | ANT  | t
  203 | Human Biol & Evol         | ANT  | f
  150 | Organisms in Environ      | EEB  | f
  216 | Marine Mammal Bio         | EEB  | f
  263 | Compar Vert Anatomy       | EEB  | f
  110 | Narrative                 | ENG  | t
  205 | Rhetoric                  | ENG  | t
  235 | The Graphic Novel         | ENG  | t
  200 | Environmental Change      | ENV  | f
  320 | Natl & Intl Env Policy    | ENV  | f
  220 | Mediaeval Society         | HIS  | t
  296 | Black Freedom             | HIS  | t
(17 rows)

Let’s go under the hood to see what has to happen to make this occur. It’s a little more complicated than one might think.

Some background on storage#

First we need to think about how data is stored on a computer.

In order for a program to use data — to examine its value or perform operations on it — that data must be in memory. The variables used in a program, whether ints that take only a few bytes, or large data structures consuming a lot of space, are all stored in memory. However, a relational database is designed to be capable of handling data that is far too large to fit in memory all at once. What won’t fit in memory must be kept in secondary storage and brought into memory when it is needed.

Unfortunately, working in secondary storage (for example reading data from there or writing data to there) is orders of magnitude slower than working in memory.

Putting DBMSs aside for a moment, the slowness of file operations comes up even with very simple code such as this:

with open('data.txt', 'r') as f:
    one_line = f.readline()
    print(one_line)

When we ask to read a single line from the file and put it into the variable one_line, the operating system decides instead to read a big chunk of data (called called a “block” or “page”) and store it in a buffer in memory. In the long run, retrieving and storing a large block is very likely to save time. The next time the program asks for something from the file, there is a good chance it is already in the buffer and can be retrieved from there. Thus, a very slow file read is avoided in favour of a very fast buffer read.

The OS manages multiple buffers and hides all these details from you; you can write a Python program that does file input and output without even knowing that buffering is going on.

A block is quite large. On current systems, it is typcially 4K, 8K, or 16K bytes.