\( \newcommand{\NOT}{\neg} \newcommand{\AND}{\wedge} \newcommand{\OR}{\vee} \newcommand{\XOR}{\oplus} \newcommand{\IMP}{\Rightarrow} \newcommand{\IFF}{\Leftrightarrow} \newcommand{\TRUE}{\text{True}\xspace} \newcommand{\FALSE}{\text{False}\xspace} \newcommand{\IN}{\,{\in}\,} \newcommand{\NOTIN}{\,{\notin}\,} \newcommand{\TO}{\rightarrow} \newcommand{\DIV}{\mid} \newcommand{\NDIV}{\nmid} \newcommand{\MOD}[1]{\pmod{#1}} \newcommand{\MODS}[1]{\ (\text{mod}\ #1)} \newcommand{\N}{\mathbb N} \newcommand{\Z}{\mathbb Z} \newcommand{\Q}{\mathbb Q} \newcommand{\R}{\mathbb R} \newcommand{\C}{\mathbb C} \newcommand{\cA}{\mathcal A} \newcommand{\cB}{\mathcal B} \newcommand{\cC}{\mathcal C} \newcommand{\cD}{\mathcal D} \newcommand{\cE}{\mathcal E} \newcommand{\cF}{\mathcal F} \newcommand{\cG}{\mathcal G} \newcommand{\cH}{\mathcal H} \newcommand{\cI}{\mathcal I} \newcommand{\cJ}{\mathcal J} \newcommand{\cL}{\mathcal L} \newcommand{\cK}{\mathcal K} \newcommand{\cN}{\mathcal N} \newcommand{\cO}{\mathcal O} \newcommand{\cP}{\mathcal P} \newcommand{\cQ}{\mathcal Q} \newcommand{\cS}{\mathcal S} \newcommand{\cT}{\mathcal T} \newcommand{\cV}{\mathcal V} \newcommand{\cW}{\mathcal W} \newcommand{\cZ}{\mathcal Z} \newcommand{\emp}{\emptyset} \newcommand{\bs}{\backslash} \newcommand{\floor}[1]{\left \lfloor #1 \right \rfloor} \newcommand{\ceil}[1]{\left \lceil #1 \right \rceil} \newcommand{\abs}[1]{\left | #1 \right |} \newcommand{\xspace}{} \newcommand{\proofheader}[1]{\underline{\textbf{#1}}} \)

1.4 Representing Data II: Booleans and Strings

In the previous section, we studied how we could represent numeric data in Python using two concrete data types: int and float. In this section, we’ll introduce two new data types that represent common pieces of individual data, before seeing how to combine multiple pieces of data together in the next section.

Boolean data

A boolean is a value from the set \(\{\text{True}, \text{False}\}\). Think of a boolean value as an answer to a Yes/No question, for example, “Is this person old enough to vote?”, “Is this country land-locked?”, and “Is this service free?”. The word “boolean” might sound a bit strange! Even though the concepts of “True” and “False” have been around since humans have existed, the term boolean is named after the logician George Boole (1815–1864), who was an early pioneer of symbolic logic.

We also saw booleans in the last section as the result of comparison operations. When we write \(3 > 2\), the value of this expression isn’t a number, but instead a boolean (in this case, True).

In Python, boolean data is represented using the data type bool. Unlike the broad range of numbers we just saw, there are only two literal values of type bool: True and False.

Operations on boolean data

Booleans can be combined using different kinds of operations. The three most common ones are:

In Python, we have three logical operators that correspond to the above operators. Unlike the arithmetic and comparison operators we learned about in the last section, Python’s logical operators are written as common English words rather than symbols: not, and, and or. The choice to use English words rather than symbols was made to improve the readability of Python code. We hope you appreciate this aspect of the language when going through the following examples!

>>> not True
False
>>> True and True
True
>>> True and False
False
>>> False or True
True
>>> False or False
False

Just as we saw how arithmetic operator expressions can be nested within each other, we can combine logical operator expressions, and even include arithmetic comparison operators:

>>> True and (False or True)
True
>>> (3 == 4) or (5 > 10)  # Both (3 == 4) and (5 > 10) evaluate to False
False

Next week, we’ll discuss these logical operators in more detail and introduce a few others.

Textual data (strings)

Our second new data type in this section is used to represent text: everything from names to chat logs to the text of of Shakespeare’s Romeo and Juliet. We represent text using the string data type, which is a sequence of characters.One of the English definitions of the word “string” is a sequence of items. We use the word character and not letter because the set of all characters extends far beyond any one alphabet (a, b, c, etc.). Here are examples of other types of characters that you have encountered before:

All Python code is text that we type into the computer, so how do we distinguish between text that’s code and text that’s data, like a person’s name? In Python, we represent strings using the str data type. Just as int is a short form of “integer”, str is a short form of “string”. A str literal is a sequence of characters surrounded by single-quotes ('). Python allows string literals to be written using either single-quotes or double-qutoes ("). You’re free to use either in your own code, but these notes and other course materials will use single-quotes to match how the Python console displays strings. For example, we could write this course’s name in Python as the string literal 'Foundations of Computer Science I'. Here are some examples of writing string literals in the Python console.

>>> 'Foundations of Computer Science I'
'Foundations of Computer Science I'
>>> 'I ❤️ ➡️ 👤'
'I ❤️ ➡️ 👤'
>>> ''
''

That last example is interesting: it is an empty string, a string with no characters. This is perfectly allowed in Python, and is a surprisingly common value to see when processing text data—for example, working with survey data in which a participant left a question blank.

Operations on strings

While there are many different kinds of operations we can perform on strings, we’ll only look at a few common ones right now.

The first is string equality. Two strings are equal when they have the same characters, in the same order. In Python, we can compare strings using the == operator, just like we could for ints and floats.

>>> 'David' == 'David'
True
>>> 'hi' == 'bye'
False

Because uppercase and lowercase letters are represented in Python as different characters, they are not considered equal when comparing strings. We say that string equality is case-sensitive.

>>> 'Mario' == 'mario'
False

Finally, string equality requires an exact match of all characters in both strings, so there can’t be any “leftover” characters:

>>> 'David' == 'David Liu'
False

This brings us to our second string operation, substring search. This operation takes two strings \(s\) and \(t\), and returns whether the string \(s\) appears somewhere inside of \(t\). If this is true, we say that \(s\) is a substring of \(t\). In Python, we can write a substring search expression using a new operator called in.

>>> 'm' in 'computer'  # single character search
True
>>> 'x' in 'computer'
False
>>> 'put' in 'computer'  # multiple character search
True
>>> 'pur'  in 'computer'  # False, because characters must appear consecutively
False
>>> 'computer' in 'computer'  # every string is a substring of itself
True
>>> '' in 'computer'  # the empty string is a substring of every string
True

Our next string operation gives us ways of building up larger strings from smaller strings. The string concatenation operation takes two strings \(s_1\) and \(s_2\) and returns a new string consisting of the characters of \(s_1\) followed by the characters of \(s_2\). In Python, this is represented using the same + operator as for arithmetic addition. So just as we saw how == can be used on values of different types in Python, so too can +. The designers of the Python programming language tried to reuse operators when possible to reduce the amount of different operators programmers would have to learn. This is great for beginners, but also means that you’ll need to pay attention to what the types of the values in an operation are, because the same operator will do different things depending on those types!

And Python supports concatenation using the familiar + operator:

>>> 'David' + 'Mario'
'DavidMario'
>>> 'David ' + 'Mario'  # Extra space after David
'David Mario'
>>> 'Mario' + 'David '  # Order matters!
'MarioDavid '

The final operation we’ll cover in this section is string indexing. This operation takes a string \(s\) and a natural number \(i\), and produces the \(i\)-th character of \(s\), where the counting starts at 0. Typically we’re used to counting starting at 1 (“first”, “second”, “third”, etc.). But due to how most programming languages, including Python, are designed, it is a very strong convention to have indexing of all sequences start at 0. We’ll keep reminding you of this in the first few chapters because it can be a bit strange, but eventually you will get used to it! This operation is denoted \(s[i]\), so \(s[0]\) returns the first character of \(s\), \(s[1]\) returns the second, etc. We use the same syntax with square brackets in Python as well, as in the following examples.

>>> 'David'[0]  # Remember, indexing starts at 0
'D'
>>> 'David'[1]
'a'
>>> 'David'[2]
'v'
>>> 'David'[3]
'i'
>>> 'David'[4]
'd'

References