CSC 209 Tutorial sh Example: A little web tool

Web browsers tend to be monolithic, multi-functional programs with very little utility as software tools. They are awkward to use from shell scripts or in pipelines.

The text-mode web browser "lynx", however, has some command-line options which make it suitable for use in this way. In this assignment you will write an interactive program in the Bourne shell script programming language which uses lynx to fetch and format web pages and accepts a small list of user commands.

For simplicity, this "websh" program will assume that all URLs are http URLs and that all web pages are html.

Operation

When "websh" is run, it takes an optional initial URL as its sole command-line argument. It then processes commands interactively from the list below. The commands 'c', 'g', and 'i' each take one argument, e.g. the user can type "g 3".
c arg
Change URL as specified. To retrieve the web page for the new URL you will want to run the following, saving its output in a temporary file:
lynx -source -dump -base "$url"
s (show)
Display the current file with lynx -dump -force_html file | more. We could also add commands to send a message to a fancier web browser (e.g. see http://mozilla.org/unix/remote.html), to "view source", etc.
g arg (go)
Change URL to one of the references in the lynx output. This corresponds to "clicking on a link": if you look at the lynx -dump (but not -source) output, you will see that each hypertext link gets a number, and the links are summarized at the bottom. The existence of this 'g' command means that you will have to keep a copy of the html page from the 'c' command in a temporary file, as suggested in the description of that command above. (If you re-get it for the 'g' command, the web page could have changed, making the user's numeric selection surprising! People edit their web pages a lot and you don't have control over that.) So use lynx -dump on your stashed file. The reference number lines begin with a space, so search for the last line that consists of spaces, the number, a period, and then the URL.

If there is no such reference number in the current file, you should display an error message.

b
Bookmark this page. Append the URL to $HOME/bookmarks, which will have the simple format of one URL per line, no descriptive text or anything other than the URL.
B
Show bookmarks, with reference numbers. After this display, 'g' goes to the given bookmark number. After doing a 'B', it's ok if the 's' command shows bookmark data instead, and don't worry about what a subsequent 'b' would do. Note the filter "cat -n" for adding line numbers. It puts a tab after each line number, which you might have to change with "tr" or something similar.
i arg
Import URLs from a document or other free-format file. Some of the words in the document will be URLs (as they are in this document); you are to extract appropriate words and throw them all into the bookmarks file.

Assume URLs begin with "http://". Remove a trailing period or comma, and remove surrounding double-quotes. However, everything except whitespace and double-quotes is potentially part of a URL, so don't be any more aggressive than this.

You will probably want the -s option to "tr", with the last argument being a simple \\012, to convert the file into its list of words.

q
Exit.
?
Display this list (help).
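
For the 'g' command above, looking up a reference number in the lynx -dump output could be sketched as follows. This is only one possible approach, not the posted solution. The helper name ref_url is made up for illustration, and it assumes the reference lines look like "   3. http://..." as described under 'g'.

```shell
#!/bin/sh
# ref_url N
# Reads `lynx -dump` style text on stdin and prints the URL listed for
# reference number N. Returns non-zero if N is not a number or is not found.
# (ref_url is a hypothetical helper name, not part of the assignment.)
ref_url() {
    case $1 in
        '' | *[!0-9]*) return 1 ;;   # reject anything that is not all digits
    esac
    # Reference lines look like "   3. http://...". Keep the last match,
    # because the page body could contain a similar-looking line earlier.
    url=$(sed -n "s/^  *$1\. \(.*\)\$/\1/p" | tail -n 1)
    [ -n "$url" ] && printf '%s\n' "$url"
}
```

A caller might run something like lynx -dump -force_html "$tmpfile" | ref_url "$arg", where $tmpfile and $arg are placeholder names, and print an error message when ref_url fails.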

Other notes

Your programs should all work when given funny filenames as arguments. Also note that URLs can and often do include all sorts of funny characters. Be scrupulous in quoting both unknown file names and URLs.
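
As a rough illustration of the quoting advice, here is a sketch of the 'b' and 'B' operations written so that odd file names and URLs pass through unchanged. The function names are hypothetical. The bookmarks file is taken as a parameter only so the sketch can be tried without touching a real $HOME/bookmarks. It also assumes a cat that supports -n, as the 'B' description does.

```shell
#!/bin/sh
# add_bookmark FILE URL: append URL to FILE as a single line.
# (Hypothetical helper; the assignment's file is $HOME/bookmarks.)
add_bookmark() {
    # printf '%s\n' prints the URL literally. The double quotes keep
    # characters such as * ? & and spaces from being expanded or split.
    printf '%s\n' "$2" >> "$1"
}

# show_bookmarks FILE: print each bookmark as "N URL".
show_bookmarks() {
    # Redirecting with < means a file name starting with "-" is never
    # mistaken for an option. cat -n writes padded numbers followed by a
    # tab; sed rewrites that prefix as the number and one space.
    cat -n < "$1" | sed 's/^[[:space:]]*\([0-9][0-9]*\)[[:space:]]*/\1 /'
}
```

Leaving off the quotes around "$1" or "$2" would make the shell split the value on whitespace and expand any wildcard characters in it, which is the kind of bug the note above warns about.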

As arguments to "tr", it's easier to use \012 for newline and \011 for tab rather than actual newlines and tabs. (Unfortunately, some older versions of "tr" don't recognize \n and \t, although current POSIX versions do.) Note that '\' is special to the shell and needs to be quoted, e.g. \\012 or '\012'.
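
To make the "tr" advice concrete, here is one possible sketch of the word-splitting step for the 'i' command. The name extract_urls is made up, and treating double-quotes as word separators is just one reading of "remove surrounding double-quotes". It also relies on "tr" repeating the last character of its second argument to match the length of the first, which GNU and BSD versions do.

```shell
#!/bin/sh
# extract_urls: read free-format text on stdin, print one URL per line.
# (Hypothetical helper name; a sketch, not the posted solution.)
extract_urls() {
    # Turn every run of spaces, tabs, newlines and double-quotes into a
    # single newline, so each word ends up on its own line. The single
    # quotes stop the shell from eating the backslashes.
    tr -s ' \011\012"' '\012' |
        # Keep only words that begin with http://
        grep '^http://' |
        # Drop one trailing period or comma, and nothing else.
        sed 's/[.,]$//'
}
```

The result could then be appended to the bookmarks file with something like extract_urls < "$file" >> "$HOME/bookmarks", where $file is a placeholder for the argument to 'i'.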

[a solution (don't look until you try it!)]

