Software tools

This text file is presented instead of a video for the second (and main) part of the topic Software Tools.

The "software tools" idea is about writing small, simple programs, which do one thing well; and having powerful and general ways to combine them.

Cute quotation: "Unix is user-friendly; it's just choosy about who its friends are."

Summary: "Do one thing well."

Tools which do just one thing can be combined in arbitrary ways.


One thing a bit odd in unix is that program output doesn't contain headers.

Consider the "who" command. Example output:

ajr      console  Jan  8 06:28
ajr      ttyp1    Jan  8 09:25
ajr      ttyp2    Jan  8 09:26
(The "who" output is more exciting on a system with multiple users, especially if no one's on the console and creating multiple terminal windows.)

We can see how many entries there are by using the "word count" program "wc", with the option "-l" which means "only display the line count":

$ who | wc -l
       3
$ 
On many non-unix systems we would expect output with a header, identifying the columns, like this:
User     Terminal  Login time
------------------------------
ajr      console  Jan  8 06:28
ajr      ttyp1    Jan  8 09:25
ajr      ttyp2    Jan  8 09:26
But this would cause problems for the software tools model. In the "who | wc -l" case, the line count above would be off by two; in fact we would get funny results from many tools. For example, a "grep" (display only lines matching a search expression) to see who is logged in and has a "-" in their logname would also display the header separation line, or if a user were named "ogi", then "who | grep ogi" would also display the header line.
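
To see the effect concretely, here is a small sketch which runs "wc -l" and grep over a simulated "who" listing, once as unix prints it and once with the hypothetical header. The listing is the sample data from above, hard-coded so that the result doesn't depend on who is logged in; the variable names are only for illustration.

```shell
# Simulated "who" output (the sample lines above), stored in a variable.
plain='ajr      console  Jan  8 06:28
ajr      ttyp1    Jan  8 09:25
ajr      ttyp2    Jan  8 09:26'

# The same listing with the hypothetical two-line header in front.
with_header='User     Terminal  Login time
------------------------------
'"$plain"

plain_count=$(printf '%s\n' "$plain" | wc -l)
header_count=$(printf '%s\n' "$with_header" | wc -l)
echo "without header: $plain_count"
echo "with header: $header_count"

# Searching for a "-" now turns up the separator line, not a user.
dash_lines=$(printf '%s\n' "$with_header" | grep -e '-')
echo "$dash_lines"

# And searching for a user named "ogi" turns up the header line.
ogi_lines=$(printf '%s\n' "$with_header" | grep ogi)
echo "$ogi_lines"
```

The counts come out as 3 and 5, so every tool downstream of the header version would have to know to skip two lines.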


Command lines

Another background matter about how the shell parses commands has the unfortunate name "globbing". It's the expansion of filename patterns.

For example, '*' matches any number of any character.
"cat comp*" would output the "composers" file because that file name matches that pattern.

To illustrate how the globbing patterns work, we will use the 'echo' command, which just outputs its arguments.

$ echo hello
hello
$ 
But it outputs those arguments after all expansions and substitutions.
$ echo comp*
composers
$

So below we illustrate how the shell expands various glob patterns, using echo to show the results.

To play along, download the file toolsfiles/demo.tar , and extract its contents with the command  tar xf demo.tar
(or if you are working on a teach.cs computer, you can just do  tar xf /u/csc209h/summer/pub/02/demo.tar  )
and then cd to the newly-created demo directory.

We begin with an 'ls' to show the list of file names in the current directory, because the expansion of the glob patterns depends on this:

$ ls
a.pdf		a3.pdf		aw.pdf		gcd.c		newstudentlist
a1.pdf		a4.pdf		composers	grades		people
a12.pdf		abc		firstbyte.c	hello		studentlist
$ echo *.c
firstbyte.c gcd.c
$ echo *1*f
a1.pdf a12.pdf
$ echo a?.pdf
a1.pdf a3.pdf a4.pdf aw.pdf
$ echo a*pdf
a.pdf a1.pdf a12.pdf a3.pdf a4.pdf aw.pdf
$ echo a[0-9].pdf
a1.pdf a3.pdf a4.pdf
$ echo a[w1Q]*pdf
a1.pdf a12.pdf aw.pdf
$ 

A '.' at the beginning of the file name is treated specially: it is only matched explicitly, not by a '*' or '?'. (There's no special treatment of dots anywhere else in the name.)
"ls" also does not report files whose names begin with a dot, unless you give the "-a" option.

$ echo *c
abc firstbyte.c gcd.c
$ ls -a
.		a1.pdf		abc		gcd.c		people
..		a12.pdf		aw.pdf		grades		studentlist
.abc		a3.pdf		composers	hello
a.pdf		a4.pdf		firstbyte.c	newstudentlist
$ echo .*c
.abc
$ echo .*
. .. .abc
$ 
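
If you don't have the demo directory handy, the same rules can be tried in a scratch directory. This is only a sketch: the file names are a subset of the ones listed above, and the variable names are made up for the illustration.

```shell
# Work in a scratch directory so the patterns see a known set of names.
start=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.pdf a1.pdf aw.pdf abc gcd.c .abc

star_c=$(echo *c)          # '*' does not match the leading dot of .abc
dot_star_c=$(echo .*c)     # the leading dot is matched explicitly
one_char=$(echo a?.pdf)    # '?' is exactly one character, so a.pdf is excluded
digit=$(echo a[0-9].pdf)   # one character from the range 0-9

echo "$star_c"       # abc gcd.c
echo "$dot_star_c"   # .abc
echo "$one_char"     # a1.pdf aw.pdf
echo "$digit"        # a1.pdf

# Clean up the scratch directory.
cd "$start" && rm -rf "$dir"
```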


Software tools in unix

To make good use of the software tools which are out there, you need to know what they are. Here are a bunch of unix software tools which you want to have in your toolbox.

All of these commands, and all other commands, have man pages. You'll want to get used to reading man pages, especially to find obscure options.

I frequently read man pages. The on-line help in unix is very comprehensive. There's a lot to know and you don't have to remember it all.

If you haven't already done so above, please download the file toolsfiles/demo.tar , and extract its contents with the command  tar xf demo.tar
(or if you are working on a teach.cs computer, you can just do  tar xf /u/csc209h/summer/pub/02/demo.tar  )
and then cd to the newly-created demo directory.
(Or, the individual files are usually linked to below, but it's smoother to have the whole demo directory in advance.)


grep

An example filter is "grep". It outputs lines which match a pattern.

Where '$' is the shell prompt, and given a text file called "composers" (which is in that demo.tar file),

$ grep Q composers
$ 
(that's all zero of them)
$ grep H composers
Henry Purcell
Hildegard von Bingen
Heinrich Schuetz
$ 

We can combine this with other commands with a pipeline, as shown in the introductory section of this document:

$ who
ajr      console  Jan  8 06:28
ajr      ttyp1    Jan  8 09:25
ajr      ttyp2    Jan  8 09:26
$ who | grep ajr
ajr      console  Jan  8 06:28
ajr      ttyp1    Jan  8 09:25
ajr      ttyp2    Jan  8 09:26
$ who | grep 09:25
ajr      ttyp1    Jan  8 09:25
$ 
In the pipelines, the standard output of "who" becomes the standard input of grep.

Data goes into a command via the standard input, but also via command-line arguments, as in the arguments to grep above.

Find lines which match a regular expression. Examples (some demonstrated in class):

    who | grep ajr
    grep /~ajr/209/ /var/httpd/log/access_log
    lpq | grep ajr | cut -f1 | xargs lprm
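
The two ways of getting data into grep can be seen side by side in the sketch below. It uses a stand-in for the "composers" file: the three names shown in the output above, plus one extra name which I've added so that not every line matches. The real file has more lines.

```shell
# A stand-in for the "composers" file (the fourth name is an addition).
file=$(mktemp)
printf '%s\n' 'Henry Purcell' 'Johann Sebastian Bach' \
    'Hildegard von Bingen' 'Heinrich Schuetz' > "$file"

# The pattern always arrives as a command-line argument.
# The data can arrive as a file name argument...
from_file=$(grep H "$file")
# ...or on the standard input, for example from a pipeline.
from_stdin=$(cat "$file" | grep H)

echo "$from_file"

# -c prints only the number of matching lines.
match_count=$(grep -c H "$file")
echo "matching lines: $match_count"

rm -f "$file"
```

Both forms give the same three lines; grep is case-sensitive, so the lower-case h's in the added name don't count.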

tr

Does character-level substitution, e.g. change all 'x's to 'y's.
tr does character ranges, and you can also identify a character by its octal number by writing a backslash followed by three octal digits (you'll need to quote that backslash so that it's not interpreted by the shell but rather passed straight through to tr).
Examples:
    tr o Q
    tr '\015' '\012' <file.mac >file.unix
    tr A-Z a-z
    tr a-zA-Z n-za-mN-ZA-M
(try these! except for the macintosh one I guess)
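
tr reads only the standard input, so the easiest way to try these is to feed it a line with echo. A small sketch, with made-up input strings (the carriage-return example stands in for the macintosh file):

```shell
lower=$(echo 'Software Tools' | tr A-Z a-z)
swapped=$(echo 'Software Tools' | tr o Q)

# The last example above is "rot13"; applying it twice gets the original back.
rot13=$(echo 'Hello' | tr a-zA-Z n-za-mN-ZA-M)
back=$(echo "$rot13" | tr a-zA-Z n-za-mN-ZA-M)

# Octal 015 is carriage return and octal 012 is newline.
unix_lines=$(printf 'one\rtwo\r' | tr '\015' '\012')

echo "$lower"        # software tools
echo "$swapped"      # SQftware TQQls
echo "$rot13"        # Uryyb
echo "$back"         # Hello
echo "$unix_lines"   # "one" and "two" on separate lines
```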

head, tail

The first or last n lines of a file (n=10 by default). Examples:
    last | head
    tail /var/log/messages
    tail -40 /var/log/messages
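
The log files above may not exist (or be readable) on your system, so here is a sketch on twelve generated lines instead. It uses the "-n 40" style of giving the count, which is the POSIX spelling of what "tail -40" does above.

```shell
# Twelve lines containing the numbers 1 to 12.
lines=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12)

# With no count, head stops after 10 lines.
default_count=$(printf '%s\n' "$lines" | head | wc -l)

first_three=$(printf '%s\n' "$lines" | head -n 3)
last_two=$(printf '%s\n' "$lines" | tail -n 2)

echo "default head gives $default_count lines"
echo "$first_three"   # 1, 2, 3 on separate lines
echo "$last_two"      # 11, 12 on separate lines
```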

sort

Sorts the input. Try this on the "composers" file, or any other.
"-k" specifies which key field number to sort on (field 1, the left of the line, by default).
"-n" does a numeric sort instead of the usual sort (e.g. 12 comes after 2).
    sort
    sort -k2
    sort -n
    sort -n -k3
lots of other options such as case-insensitive, reverse order — see the man page.
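
The difference between the usual sort and the numeric sort shows up as soon as the numbers have different lengths. A sketch on three made-up lines (a name and a number on each):

```shell
data='carol 12
alice 2
bob 100'

# Default: compare whole lines as text, starting from field 1.
by_name=$(printf '%s\n' "$data" | sort)

# -k2: compare from field 2, still as text, so "100" < "12" < "2".
by_text=$(printf '%s\n' "$data" | sort -k2)

# -n -k2: compare field 2 as a number, so 2 < 12 < 100.
by_number=$(printf '%s\n' "$data" | sort -n -k2)

echo "$by_name"
echo "$by_text"
echo "$by_number"
```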

uniq

Collapse adjacent identical lines.
Used in the following example which gives you a word-frequency count in a document, also introducing other features of tr:
    tr -cs a-zA-Z0-9 '\012' <file | tr A-Z a-z | sort | uniq -c | sort -n
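
Here is the same pipeline taken apart on one made-up sentence, as a sketch. It also shows why the "sort" has to come before the "uniq": the repeated words are not adjacent until the input is sorted.

```shell
text='the cat and the dog and the bird'

# tr -cs: turn every run of non-alphanumeric characters into one newline,
# which leaves one word per line.
words=$(printf '%s\n' "$text" | tr -cs a-zA-Z0-9 '\012')

# uniq alone collapses nothing here, because no two adjacent lines are identical.
unsorted_count=$(printf '%s\n' "$words" | uniq | wc -l)

# Sorting first brings identical words together; uniq -c then counts them,
# and the final sort -n orders the result by that count.
freq=$(printf '%s\n' "$words" | tr A-Z a-z | sort | uniq -c | sort -n)
echo "$freq"

# The most frequent word is on the last line.
most_common=$(printf '%s\n' "$freq" | tail -n 1 | awk '{print $2}')
echo "most common: $most_common"
```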

sed

"Stream editor": do edit commands on a file as it goes by in the pipeline. Examples:
    sed s/Fred/Wilma/ people
    sed s/Fred/Wilma/g people
    sed 's/Fred[a-z]*/Wilma/g' people
    sed 5d people
    sed /pattern/d people
sed takes arbitrary regular expressions.
Note that this is not the same syntax as the glob expressions!
For example, "[a-z]*" above means "any number of lower-case letters", whereas in the glob notation it would mean one lower-case letter followed by anything.

The argument to sed often has to be quoted so that special characters in it aren't interpreted by the shell (e.g. as glob notation!).

I wrote a quick intro to unix regular expression syntax.

If you enclose some of the search string in backslashed parentheses, \1 in the replacement means the text matched by the first such pair. If you have multiple pairs of backslashed parentheses, you can also use \2, etc.
Whether or not some of the search string is enclosed in backslashed parentheses, '&' in the replacement string represents the entire text which the search string matched.

Examples (try them!):

sed 's/[A-Z]/ capital-& /g' composers
sed 's/\(.*\) \(.*\)/\2, \1/' composers
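
If you'd like to see exactly what each of these does without the demo files, sed will also read the standard input, so a single line from echo is enough. A sketch, with input lines I made up (the composer names are from the grep output earlier):

```shell
# Without g, only the first match on each line is replaced.
first=$(echo 'Fred and Freddy' | sed 's/Fred/Wilma/')
# With g, every match is replaced.
all=$(echo 'Fred and Freddy' | sed 's/Fred/Wilma/g')
# [a-z]* also swallows any lower-case letters following "Fred".
whole=$(echo 'Fred and Freddy' | sed 's/Fred[a-z]*/Wilma/g')

# & stands for whatever was matched (here, each capital letter).
amp=$(echo 'Fred' | sed 's/[A-Z]/ capital-& /g')

# \1 and \2 are the two parenthesized parts; the first .* takes as much as it can.
swap=$(echo 'Henry Purcell' | sed 's/\(.*\) \(.*\)/\2, \1/')
greedy=$(echo 'Hildegard von Bingen' | sed 's/\(.*\) \(.*\)/\2, \1/')

echo "$first"    # Wilma and Freddy
echo "$all"      # Wilma and Wilmady
echo "$whole"    # Wilma and Wilma
echo "$amp"      # " capital-F red" (with a leading space)
echo "$swap"     # Purcell, Henry
echo "$greedy"   # Bingen, Hildegard von
```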

echo

Provides output.
-n means not to output the terminating newline.
    echo Please enter repeat count:
    echo -n 'Please enter repeat count: '
→ note how it takes any number of arguments, and outputs them separated by spaces.

Use "tr" to convert x's to y's in xylophone:
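
One way to do it, as a sketch: tr only reads the standard input, so echo supplies the word through a pipe. (The variable name is just for the illustration.)

```shell
# echo writes the word to the pipe; tr replaces every x with y.
result=$(echo xylophone | tr x y)
echo "$result"    # yylophone
```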

cat

lots of interesting options, such as -n to number the lines, -s to collapse runs of blank lines into one.

ls

"ls dir" or "ls file"
ls -d to avoid descending into a directory
use xargs to make it read stdin in any interesting way
check out -a, -l, -i, -q, -t, -r
ls strangely (and unsimply) acts differently by default based on whether its output is a "tty" or not (compare "ls" and "ls | cat"), but there are options -C to force columnar output and -1 to force one file per line (mnemonic: "one column")

In general in unix tools, you can combine options into one word with just one minus sign. For example, instead of writing "ls -l -a -r -t" you can write "ls -lart".
Although as soon as you hit an option which takes an argument (such as sort's -k option), that's it for that word. So for example, in "sort -k2f", all of "2f" would be the argument to -k. Although you could rearrange this one: "sort -fk2" is still the same as "sort -f -k2". (This asymmetry is caused by the fact that -k takes an argument and -f does not.)
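
You can check that the combined and separate spellings behave identically with a sketch like the following. The scratch directory, file names and three-line input are all made up for the illustration.

```shell
# ls: "-a -1" and "-a1" list the same names, one per line.
dir=$(mktemp -d)
touch "$dir/.hidden" "$dir/visible"
ls_separate=$(ls -a -1 "$dir")
ls_combined=$(ls -a1 "$dir")

# sort: -f (ignore case) takes no argument, so it can share a word with -k,
# as long as -k comes last and is followed by its argument.
data='z cherry
y Banana
x apple'
sort_separate=$(printf '%s\n' "$data" | sort -f -k2)
sort_combined=$(printf '%s\n' "$data" | sort -fk2)

echo "$sort_combined"
rm -rf "$dir"
```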

cp

copy files
either 2 arguments, or multiple arguments and a directory; check out -p and -r

mv

move (rename)
similar command format to cp; always preserves time

rm

remove (delete)
also see options -r, -f
(Don't get in the habit of using -f unnecessarily; it loses valuable error messages.)

cmp

compare files in a byte-oriented way (especially useful for non-text files).
also try cmp -l

diff

compare files in a text-file-oriented way, cleverly finding matching bits so as to show only the differences.
also try diff -b, also -c

diff is the basis of commands to compare different revisions in many source control systems, e.g. "git diff".
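
A quick sketch of cmp and diff on two small files which differ in one line. The file names and contents are made up; "cmp -s" (silent, exit status only) is one more option worth knowing about.

```shell
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/old"
printf 'apple\nblueberry\ncherry\n' > "$dir/new"

# cmp -s prints nothing; its exit status says whether the files are identical.
if cmp -s "$dir/old" "$dir/new"; then same=yes; else same=no; fi
echo "identical: $same"

# diff exits with status 1 when the files differ, hence the "|| true".
changes=$(diff "$dir/old" "$dir/new" || true)
echo "$changes"

rm -rf "$dir"
```

The diff output begins with "2c2", meaning that line 2 of the first file was changed into line 2 of the second, followed by the old line marked "<" and the new line marked ">".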

comm

Show lines in common between two files.

Example: students enrolled in CSC 209 before and after the drop date (fictional)
Please try these commands:

comm -1 studentlist newstudentlist
comm -12 studentlist newstudentlist
comm -13 studentlist newstudentlist
comm -23 studentlist newstudentlist
The rule is that the command-line options say which of the three columns to suppress. It's a little odd. Compare with "comm studentlist newstudentlist" with no options, which produces unreadable and useless output but is the key to understanding how the options work.
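
The column-suppressing rule may be easier to see on two tiny lists. This is a sketch with made-up names standing in for the two student lists; note that comm expects both files to be sorted.

```shell
dir=$(mktemp -d)
printf '%s\n' alice bob carol > "$dir/before"
printf '%s\n' alice carol dave > "$dir/after"

# Column 1: only in the first file. Column 2: only in the second. Column 3: in both.
stayed=$(comm -12 "$dir/before" "$dir/after")    # suppress 1 and 2, leaving "in both"
dropped=$(comm -23 "$dir/before" "$dir/after")   # suppress 2 and 3, leaving "only in before"
added=$(comm -13 "$dir/before" "$dir/after")     # suppress 1 and 3, leaving "only in after"

echo "stayed: $(echo $stayed)"   # unquoted $stayed, so the two names land on one line
echo "dropped: $dropped"
echo "added: $added"

rm -rf "$dir"
```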

join

Does a database join of two files. The files must be sorted by the key field you're joining on.
Try:
join newstudentlist grades
There are millions of options to specify what the key fields are, what the output format should be, etc. I think that most people consult the man page every single time they write a join command (which is used more frequently in a shell script than interactively).
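
For the simplest case (both files sorted, joining on the first field), no options are needed. A sketch with made-up data, not the contents of the demo files:

```shell
dir=$(mktemp -d)
# Both files are sorted on the first field, which is the join key.
printf '%s\n' 'alice 1001' 'bob 1002' 'carol 1003' > "$dir/students"
printf '%s\n' 'alice 85' 'carol 92' > "$dir/marks"

# One output line for each key which appears in both files:
# the key, then the rest of the line from each file.
joined=$(join "$dir/students" "$dir/marks")
echo "$joined"

rm -rf "$dir"
```

bob has no line in the second file, so by default he is left out of the output.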

The idea of the "-" file name

Almost all commands are willing to accept "-" instead of a file name, and read the standard input at that point.

For example,  diff - file  will compare the standard input to the contents of "file".
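
A sketch of this with diff and cat, using a made-up two-line scratch file:

```shell
file=$(mktemp)
printf 'one\ntwo\n' > "$file"

# diff compares its standard input (the "-") with the file;
# exit status 0 means it found no differences.
if printf 'one\ntwo\n' | diff - "$file" >/dev/null; then same=yes; else same=no; fi
echo "same: $same"

# cat outputs the file, then its standard input, then the file again.
sandwich=$(echo middle | cat "$file" - "$file")
echo "$sandwich"

rm -f "$file"
```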

Summary

Software tools are small programs which do one thing well.


Find

find is a program you should also know about. It traverses directory hierarchies. It has a huge number of options. I'm not sure it quite qualifies as a "small program which does one thing well", but it's the right tool for a certain set of tasks.

The command line begins with a list of directories, then contains a list of predicates, usually ending with either "-print" (to print the path name if you get that far, i.e. all of the previous predicates are true) or "-exec" (to execute a command for that file path name).

In the command given to -exec, you can write "{}" to have the path name substituted at that point in the command line. Since these characters can be special to the shell, they should be quoted.

The command-line for exec needs to be terminated, else find wouldn't be able to tell where the command ends and the find options resume. It is terminated by a semi-colon (as a separate argument). Since semicolon is special to the shell, it needs to be quoted.

The few examples below are intended to give you an idea of what find can do, not to teach you to use it; you'll learn how to use find as you have particular applications for it.


Basic command to find a file by name in a directory tree:

find /u/ajr/209/web/notes -name cat0.c -print
Sample output:
/u/ajr/209/web/notes/toolsfiles/cat0.c


Find files which were modified within the last 30 days, and execute the "ls -l" command on them. But they might be directories, so we need the "-d" option to ls as well.

find /u/ajr/209/web/notes -mtime -30 -exec ls -ld '{}' ';'
Sample output:
drwxr-xr-x  43 ajr  staff  1462 Feb 17 01:02 /u/ajr/209/web/notes
-rw-r--r--  1 ajr  staff  6964 Feb 17 01:01 /u/ajr/209/web/notes/c
-rw-r--r--  1 ajr  staff  7372 Feb 17 01:02 /u/ajr/209/web/notes/files
drwxr-xr-x  8 ajr  staff  272 Feb 19 13:44 /u/ajr/209/web/notes/sockets
-rw-r--r--  1 ajr  staff  1388 Feb 19 13:44 /u/ajr/209/web/notes/sockets/client.c
-rw-r--r--  1 ajr  staff  1329 Feb 19 13:43 /u/ajr/209/web/notes/sockets/client_inet.c
-rw-r--r--  1 ajr  staff  2830 Feb 19 13:35 /u/ajr/209/web/notes/sockets/server.c
-rw-r--r--  1 ajr  staff  1758 Feb 19 13:37 /u/ajr/209/web/notes/sockets/server_inet.c


As above, but only finding plain files (e.g. excluding directories) (so the "-d" option to ls is no longer important, but doesn't hurt either):

find /u/ajr/209/web/notes -type f -mtime -30 -exec ls -ld '{}' ';'
Sample output:
-rw-r--r--  1 ajr  staff  6964 Feb 17 01:01 /u/ajr/209/web/notes/c
-rw-r--r--  1 ajr  staff  7372 Feb 17 01:02 /u/ajr/209/web/notes/files
-rw-r--r--  1 ajr  staff  1388 Feb 19 13:44 /u/ajr/209/web/notes/sockets/client.c
-rw-r--r--  1 ajr  staff  1329 Feb 19 13:43 /u/ajr/209/web/notes/sockets/client_inet.c
-rw-r--r--  1 ajr  staff  2830 Feb 19 13:35 /u/ajr/209/web/notes/sockets/server.c
-rw-r--r--  1 ajr  staff  1758 Feb 19 13:37 /u/ajr/209/web/notes/sockets/server_inet.c
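
The paths above exist only on my system, so here is a sketch of the same kinds of find command on a small scratch tree which you can create yourself. The directory layout is made up, loosely following the names above.

```shell
top=$(mktemp -d)
mkdir -p "$top/notes/sockets"
touch "$top/notes/cat0.c" "$top/notes/sockets/client.c"

# Find a file by name anywhere under the tree.
by_name=$(find "$top" -name cat0.c -print)
echo "$by_name"

# Plain files only, running a command on each one.
# The {} and the ; are quoted to protect them from the shell.
plain_files=$(find "$top" -type f -exec basename '{}' ';' | sort)
echo "$plain_files"

# Directories only: the top directory, notes, and notes/sockets.
dir_count=$(find "$top" -type d -print | wc -l)

# Both files were just created, so both are less than 30 days old.
recent_count=$(find "$top" -type f -mtime -30 -print | wc -l)
echo "directories: $dir_count, recent plain files: $recent_count"

rm -rf "$top"
```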


Some I/O redirection details (to be discussed in class)