Assignment one questions and answers

Here are some questions and answers about assignment one, and other notes. Suggestions for additions to this list are welcome (e.g. via e-mail).

Resources available:

Useful tools at https://www.teach.cs.toronto.edu/~ajr/209/notes/tools02.html
The first four shell programming videos
There is a sample compiled solution for adv, as mentioned on the assignment handout

Q: I uploaded my file at 11:45 last night [June 6th] and it didn't work on the school computers, but it worked fine at home. May I have an extension on account of this?

A: No. It's simply not possible for us to accommodate problems with your own computer equipment. I suggest that you use the teach.cs computers, which are professionally managed. You can use the teach.cs computers from home over the net just fine; see remotelogin.html (and then when your home computer breaks, you can come in to school to finish your work, because it was stored at school all along).

Use of your own equipment is at your own risk; I can sometimes assist you with it, and I'm not opposed to it, but if problems with your own computer were allowed to cause you to have extensions, everyone would have free extensions. It's just not feasible.

At this date [June 7th] it's a bit late for this information to be of use to you for assignment one. Fortunately I have a time-warp here so I'll post this on the course web page retroactively before the due date. Please be advised.

When the due time approaches, just hand in what you've got; you have to move on with your other work, in this course and in your other courses. But please also try to avoid last-minute problems: don't set yourself up for a major change in working environment right at the last minute, such as switching from your home computer to the school computers. You have to make your program work on the teach.cs computers, so just start there.

If you know in advance that you're going to be late for a good reason, please talk with me about it in advance; although be warned that my suggestion might be that you should simply submit in advance.

I'd also like to point out that even a zero out of 10% is far better than cheating and suffering an academic penalty. Don't cheat even if you're under pressure; and please re-read the section about academic offences at the end of the course information sheet. Whatever the penalty eventually applied for cheating, it will be worse than merely a zero on the assignment.

General notes:

KEEP IT SIMPLE!
In shell programming, if there are existing commands which perform a particular operation then your shell script will be simpler and quite possibly less buggy if you use those commands rather than implementing the operation yourself in a loop.
Make your programs readable and clear.
Avoid clutter such as banner comments stating the name of the file, your name, silly version numbers, etc. None of this is helpful.
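
As a small illustration of the point above about existing commands, here is a sketch that counts the lines on the standard input in two ways. The function names are made up for this example and have nothing to do with the assignment.

```shell
# Hypothetical example: count the lines on the standard input.

# Hand-written loop: more code, and more places for a bug to hide.
count_lines_loop() {
    n=0
    while read line
    do
        n=`expr $n + 1`
    done
    echo $n
}

# Existing command: "wc -l" already does exactly this.
count_lines_wc() {
    wc -l
}
```

Given three lines of input, both report 3, although some versions of wc pad the number with leading spaces.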

Remember always to quote user input unless you know it is "safe", in terms of not having spaces or other funny characters in it.
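
A small sketch of what goes wrong without the quotes. The helper count_args and the sample file name are invented for this illustration.

```shell
# Hypothetical helper: print how many arguments it was given.
count_args() {
    echo $#
}

name="my file.txt"      # stand-in for user input containing a space

count_args $name        # unquoted: the shell splits it into two words
count_args "$name"      # quoted: it stays a single argument
```

The first call prints 2 and the second prints 1. A command such as "rm" or "test" would see the same two separate words in the unquoted case.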

We use simple data formats throughout. Simple data formats are easier to parse. Don't store or expect any extraneous data in any files.

Remember that error messages are usually generated by the commands your shell script runs. For example, if the copying of a file yields a "disk full" error, "cp" will already have output that. So in most (but not all) cases, you don't have to write error messages when you write a shell script. (You do need to know the behaviour of the tools you're using.)

You may be able to simplify some programs by beginning with a "cd" to the argument directory. You do have to check for error, e.g. if the directory does not exist — it would be bad to use the current directory if the 'cd' fails. (As in the previous paragraph, the 'cd' command itself will generate an appropriate error message, and in fact a better one than you can — it might say "permission denied" or it might say "no such file" or "i/o error" or whatever the problem is. All you have to do is exit.)
(This 'cd' is not required, it's just one possible technique.)
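
A minimal sketch of that 'cd' technique. It is wrapped in a function whose body is a subshell (the name list_dir is invented here) so that you can try it at the prompt without the "exit" ending your login shell; in a real script the two commands would simply sit at the top of the file.

```shell
# Hypothetical example: list the contents of the directory named by $1.
# The parentheses make the function body a subshell, so "exit" ends only
# the subshell, just as it would end a standalone script.
list_dir() (
    cd "$1" || exit 1    # cd has already printed its own error message
    ls
)
```

If the 'cd' fails, the "ls" is never run, so nothing is done in the wrong directory.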

Q: Do we need to set the PATH variable?

A: This is not required for your shell scripts this term in CSC 209, although it is important in "real life". This issue is discussed in shell programming video 5, which is not part of this course this term.

Q: What is "dash"?

A: An implementation of the sh programming language.

Q: So what is "bash"?

A: An implementation of the sh programming language.

Q: Um, so what is "sh"?

A: An implementation of the sh programming language.

Q: Uhhh...

A: These are all implementations of the same core programming language, but perhaps adding other features that their authors thought were a good idea.

Some of these other features are useful interactively.

Some of these other features are useful in writing programs (shell scripts), but only if you want to commit to using only that implementation of sh. Normally such a commitment is a mistake — an intended user of your shell script might not have the same operating system version and might not have the same sh implementation available. If you stick to the core sh input language, your shell scripts will run anywhere (in unix or linux, that is).

Q: Why is one of them called "sh" itself? Is that the same as bash or what?

A: The standards require /bin/sh to be a valid sh programming language implementation. On a modern unix or linux system this will usually be a symlink to a particular sh implementation such as dash or bash. Check it out on your system with "ls -l /bin/sh".

Don't use temporary files!

In your shell scripts you might be tempted to use a "temporary file", where you output to a file, and subsequently supply that file as input to something else.

Almost all uses of temporary files can be replaced by pipes. Instead of saying

    cmd1 >file
    cmd2 <file
    rm file

simply say

    cmd1 | cmd2

That's what pipes are for!

In some cases additional creativity might be required to avoid using temporary files. But it's worth it.

To use temporary files properly, there are a number of rules. Mostly, you cannot assume that the current directory is writable, and you certainly can't create files of arbitrary names in the current directory. If you create a file named "file" and then remove it, well, maybe the person running your program already had a file named "file", which you've now destroyed and removed! This is not acceptable.

To avoid this whole issue, do not use temporary files. They are not needed for this assignment. Please ask for hints if you don't see a way to avoid using temporary files.

Q: In rmlink and adv, do we need to check $# and output a usage message?

A: Yes! Any time you access the command-line arguments, you need to do a usage check first.
The usage message must be in the standard format and must go to the standard error (not the standard output), and you must exit with exit status 1 in case of error or 0 in the normal case.
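
A sketch of the shape of such a check. The script name "myscript", its one-argument interface, and the exact wording of the usage line are all placeholders; use the format you have been shown in the course. As a convenience for experimenting, the script body is wrapped in a subshell function so that "exit" does not end your login shell.

```shell
# Hypothetical script body; "myscript" and its single argument are
# invented for this illustration.
myscript() (
    if test $# -ne 1
    then
        echo "usage: myscript file" >&2    # to stderr, not stdout
        exit 1
    fi
    echo "processing $1"                   # stand-in for the real work
    exit 0
)
```

Running it with no arguments prints the usage line on the standard error and yields exit status 1; running it with one argument yields exit status 0.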

Q: In dist as well?

A: No, not in dist, because a usage error in dist is impossible! No arguments means to read stdin, and any arguments are to be interpreted as file names.

rmlink: Note the advice to use "test -L" to test if it's a symlink.

This might have been mentioned too subtly in the assignment handout: "test -L" tests whether something is a symlink.

Q: But then if the user does "sh rmlink file" where file doesn't exist, it won't do anything!

A: This is ok for this assignment, although ideally you would test for this situation specifically so as to give a useful error message in this case.
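
For reference, a sketch of how "test -L" behaves inside an 'if'. The helper name show_if_symlink is made up, and this is only an illustration of the test, not the assignment solution.

```shell
# Hypothetical helper: run "ls -l" on $1 only if it is a symbolic link.
# Anything else, including a name that does not exist, is silently skipped.
show_if_symlink() {
    if test -L "$1"
    then
        ls -l "$1"
    fi
}
```

Note that "test -L" is true for any symlink, even one whose target does not exist, and false for a name that does not exist at all, which is the silent case described in the question above.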

rmlink: Please "ls -l" the symlinks instead of "rm"ing them.

It will be accepted either way in grading, but to reduce the harm caused by bugs in your program during testing I suggest that you do an "ls -l" of the files which are symlinks rather than deleting them. You can submit the "ls -l" version for full marks.

dist: Note that the suggested cat "$@" is good for quite a lot of things.

It processes the files on the command-line one by one, so you don't need to write a loop for that.

Also, in the event of zero arguments on the command-line, it processes the standard input. So you don't need to test for that.

Then you can pipe this to other tools, or to a loop of the form "while read x".

The use of "$@" is described in shell video 04 starting at 08:45.
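
A sketch of that shape, assuming (purely for illustration) input consisting of one integer per line; the function name sum_lines is made up and this is not what dist is asked to do. The final echo sits inside the parentheses together with the loop so that it sees the loop's variables.

```shell
# Hypothetical example: add up the numbers, one per line, found in the
# named files, or on the standard input if no files are named.
sum_lines() {
    cat "$@" | (
        total=0
        while read x
        do
            total=`expr $total + $x`
        done
        echo $total
    )
}
```

For example, "printf '1\n2\n3\n' | sum_lines" prints 6, and "sum_lines file1 file2" would total the numbers in both files without any loop over the file names.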

Q: When I write a 'while' loop which is part of a pipeline, variable assignments in the loop don't seem to stick. My program always outputs the original variable values, not what I've done to them in the loop.

A: Since the two sides of the pipe are separate processes which run in parallel, the shell starts a new separate shell process to execute the loop as the right side of the pipeline. Since the shell variables are represented in a data structure inside the shell, this separate process has its own set of variables.

This means that variables which you set in your loop might not be available outside of the loop, if the loop runs in a separate process from the rest of the script, as in the second example below.

The problem can be solved with parentheses.

Compare:

x=0
for i in 1 2 3 4 5 6 7 8 9 10
do
    x=`expr $x + $i`
    echo total so far is $x
done
echo first example: final total is $x

with:

x=0
echo this is ignored | for i in 1 2 3 4 5 6 7 8 9 10
                do
                    x=`expr $x + $i`
                    echo total so far is $x
                done
echo second example: final total is $x

Try it! They are in separate files in /u/csc209h/summer/pub/a1 for easier experimentation, or: [ex1] [ex2]

The solution is to put the 'echo $x' in the subprocess along with the loop:

x=0
echo this is ignored | (
                for i in 1 2 3 4 5 6 7 8 9 10
                do
                    x=`expr $x + $i`
                    echo total so far is $x
                done
                echo third example: final total is $x
              )

And the above solution is also in a separate file in /u/csc209h/summer/pub/a1, or: [ex3]

Avoid unproductive temporary variables, just as in C or Python programming. E.g. don't say

        z=`expr $x + $y`
        echo $z

if you're not later going to use the value of $z. Instead say

        expr $x + $y

which, after all, does its output to stdout, just like echo.

Although sh scripts tend to be a bit less pretty than programs in normal high-level languages, you should still make a little bit of effort in this regard. Indent lines to the appropriate nested indentation level, and do use blank lines in the usual fashion (although of course some sh scripts are too short for that), and keep your lines within 80 characters.

Q: Can we use the built-in arithmetic functions of bash?

A: No. The assignment is to write a program in sh, without using bash-specific extensions.

Q: But I think it would be much easier to use the built-in arithmetic functions.

A: Perhaps. And it might be easier to write this program in Python, too. But that's not the assignment. Your program should run on any sh-compatible shell.

Q: Ok, now I'm worried. How do I know that I'm not using bash-specific extensions?

A: Don't only use bash for your testing. Try your program with (for example) "dash dist file1 file2" too.

Q: How can I make it so that you can just type (for example) "rmlink file" instead of "sh rmlink file"?

A: Two things, which we'll discuss a bit later in the course when we discuss how the kernel executes programs:

Make the first line of the file "#!/bin/sh", or at least, don't start it with a '#' unless the first line is "#!/bin/sh". (Note that this is a comment as far as sh is concerned; it's interpreted by the OS kernel to decide what interpreter to run, and it's designed to be ignored by the interpreter itself by beginning with an sh comment symbol.)
Make it executable by typing "chmod +x rmlink".

You don't have to do this for this assignment. As I say, we'll be discussing this later. Right now we'll be content to run our shell scripts with "sh file"; you can do better, but that's not an sh topic, it's an exec() topic.