CSC 209 lab 11 exercises, week of 26 July 2022

[solutions are available (requires teach.cs authentication)]

Attendance

As usual, please either run /u/csc209h/summer/present on the console of a lab workstation during the tutorial, or get the TA to record your attendance. Please do this first so that you don't forget.

Remember that you can check that your attendance has been recorded properly at https://wwwcgi.teach.cs.toronto.edu/~ajr/cgi-bin/auth/present


Part 1: Socket calls

1. Write a C program which listens on an internet-domain socket (port number) and calls accept() twice. Then, to the first client it outputs the string "first" terminated by a network newline; to the second client it outputs the string "second". Then it exits.

To save you from simply copying from examples, start with /u/csc209h/summer/pub/lab/11/starter.c, which parses argv, has all the #includes you need, and does the basic bind() and listen(). It also provides a -p option to specify the listening port number, so that you can change port numbers without recompiling.

For the client you can use "nc".
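
As a rough illustration of the accept-twice logic in exercise 1, here is a minimal sketch. It assumes the listening socket has already been through bind() and listen(), as the starter file arranges. The names send_line and greet_two_clients are made up for this sketch and are not taken from starter.c, and terminating "second" with a network newline as well is an assumption.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* Write text followed by a network newline (CR LF).
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int send_line(int fd, const char *text)
{
    char buf[256];
    int len = snprintf(buf, sizeof buf, "%s\r\n", text);
    if (len < 0 || (size_t)len >= sizeof buf)
        return -1;                      /* text too long for this sketch */

    ssize_t off = 0;
    while (off < len) {                 /* write() may write less than asked */
        ssize_t n = write(fd, buf + off, (size_t)(len - off));
        if (n < 0) {
            perror("write");
            return -1;
        }
        off += n;
    }
    return 0;
}

/* Accept two clients on an already-listening socket, then greet each.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int greet_two_clients(int listenfd)
{
    int first = accept(listenfd, NULL, NULL);
    if (first < 0) {
        perror("accept");
        return -1;
    }
    int second = accept(listenfd, NULL, NULL);
    if (second < 0) {
        perror("accept");
        close(first);
        return -1;
    }

    int status = 0;
    if (send_line(first, "first") < 0 || send_line(second, "second") < 0)
        status = -1;
    close(first);
    close(second);
    return status;
}
```

Note that neither client sees any output until both have connected, because both accept() calls come before the first write.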

2. Add a select() call to the above, selecting amongst the two connected sockets AND the standard input (file descriptor 0). There is no timeout (use NULL for the fifth argument to select()). Then report which input source sent input first (for this exercise, you do not need to read the input in question, just detect that it's there via select()). Remember to check your select() call for error and call perror() appropriately, as this will greatly assist in debugging (therefore, do this in the first place, not as a last-minute correction).
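
The select() step can be sketched roughly as follows. The function name first_ready and the idea of passing the descriptors in an array are choices made for this sketch only; in the exercise the array would hold 0 (standard input) and the two sockets returned by accept().

```c
#include <stdio.h>
#include <sys/select.h>

/* Block until at least one of the n descriptors in fds is readable.
 * Returns the index (into fds) of the first readable one, or -1 on error.
 * (Hypothetical helper; descriptors must be below FD_SETSIZE.) */
int first_ready(const int *fds, int n)
{
    fd_set readfds;
    int maxfd = -1;

    FD_ZERO(&readfds);
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], &readfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* First argument is the highest descriptor plus one;
     * a NULL fifth argument means no timeout. */
    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0) {
        perror("select");
        return -1;
    }

    for (int i = 0; i < n; i++)
        if (FD_ISSET(fds[i], &readfds))
            return i;
    return -1;
}
```

With "int fds[] = { 0, client1, client2 };" a return value of 0 would mean standard input had data first. If several descriptors become readable before select() returns, it cannot say which was earliest; this sketch simply reports the lowest index.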

3. If you want to spend more time on this, you can go ahead and read the data sent, and output it to stdout, and put this in a loop so that after reading the input from a client you loop around and do another select(). You could also send something back to the client if you want.
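
For the optional extension, one possible building block is a function that does a single read() on a descriptor which select() has reported as readable and copies the result elsewhere. This is only a sketch; the name copy_available is made up here. Passing 1 as the second argument sends the data to stdout.

```c
#include <stdio.h>
#include <unistd.h>

/* Read whatever is currently available on 'from' and copy it to 'to'.
 * Returns the number of bytes copied, 0 at end of file, or -1 on error.
 * (Hypothetical helper.) */
ssize_t copy_available(int from, int to)
{
    char buf[512];
    ssize_t n = read(from, buf, sizeof buf);
    if (n < 0) {
        perror("read");
        return -1;
    }

    ssize_t off = 0;
    while (off < n) {                   /* write() may write less than asked */
        ssize_t w = write(to, buf + off, (size_t)(n - off));
        if (w < 0) {
            perror("write");
            return -1;
        }
        off += w;
    }
    return n;
}
```

A return value of 0 means that the client has closed its end, so in a loop you would stop including that descriptor in later select() calls.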


Part 2: Network protocols (for credit)

A web "browser" (client) and a web server communicate through the "HTTP" protocol. In this part of the lab we will use nc to learn a bit about this protocol by performing one side or the other manually.

1. Play client

For a web browser to retrieve a web page, it connects to a web server and issues the HTTP "GET" command. There are two parameters to the GET command, and then there are zero or more header lines, then a blank line. After the client (browser) sends the blank line, the web server will send its reply.

The two parameters to the GET command are the partial URL to retrieve, relative to this web server; and the protocol identification. We will first use the protocol HTTP/1.0 which doesn't require any header lines.

1.1) First, go to the URL http://curly.cs.toronto.edu/hello.html in your web browser for comparison.

You can use your web browser's "view source" menu item to look at the raw HTML, or you can use the HTTP protocol directly, as follows:

Make an nc connection to port 80 (the standard HTTP port number) of host curly.cs.toronto.edu by typing:

nc curly.cs.toronto.edu 80

The web server will accept the connection, but not transmit anything yet: in the HTTP protocol, the client transmits first.

Into that nc session, enter the command:

GET /hello.html HTTP/1.0

followed by a blank line. You will see the HTML source of hello.html, but only after the server's "headers", which contain a bunch of information, some useful, some useless.

Something else you might notice is that the web server has replied not with the HTTP/1.0 protocol, but with the HTTP/1.1 protocol. I suspect that this is actually not a valid response, but people don't care so much about the HTTP/1.0 protocol these days. Anyway, one of the innovations of the HTTP/1.1 protocol is that one TCP connection can be used for multiple HTTP requests; so the web server hasn't disconnected after sending the response, thus nc hasn't exited. You can terminate the nc command with control-C for now.

1.2) Now retrieve this same hello.html file using the HTTP/1.1 protocol. (The web server on curly.cs.toronto.edu, like all modern web servers, supports both protocols.) The GET command is modified in the obvious way.

What happens if the GET command is immediately followed by a blank line, like in section 1.1? Try it! This is not valid. (And note the friendly document content in the response, suitable for display by the web browser.)

For HTTP/1.0 there are no mandatory header fields in the GET request, but for HTTP/1.1 there is one mandatory field, which specifies the host name, i.e. "curly.cs.toronto.edu". So, supply as a header field, after the GET but before the blank line:

Host: curly.cs.toronto.edu

and then enter the blank line.

This ability to specify the host name, which was new in the HTTP/1.1 protocol (a long time ago now), allows one web server to serve multiple "virtual hosts" from the same computer (and same IP address).
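
To make the shape of the request concrete, here is a small sketch of how a C program might format what you typed into nc. The function name build_get_request and its parameters are made up for this illustration. It uses network newlines, and passing NULL for the host omits the Host: header (as allowed in HTTP/1.0).

```c
#include <stdio.h>
#include <stddef.h>

/* Format a GET request into buf. 'version' is "1.0" or "1.1"; 'host' may
 * be NULL to omit the Host: header field.
 * Returns the request length, or -1 if buf is too small.
 * (Hypothetical helper.) */
int build_get_request(char *buf, size_t size, const char *path,
                      const char *version, const char *host)
{
    int len;

    if (host != NULL)
        len = snprintf(buf, size, "GET %s HTTP/%s\r\nHost: %s\r\n\r\n",
                       path, version, host);
    else
        len = snprintf(buf, size, "GET %s HTTP/%s\r\n\r\n", path, version);

    if (len < 0 || (size_t)len >= size)
        return -1;
    return len;
}
```

For example, build_get_request(buf, sizeof buf, "/hello.html", "1.1", "curly.cs.toronto.edu") produces the request line, the Host: line, and the terminating blank line, in that order.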

The first file to submit

Copy and paste the transcript of your (successful) nc command and session in part "1.2" (the HTTP/1.1 version), into a plain text file named "lab11a". Your lab11a file needs to show the shell prompt and nc command; what you sent to the server to retrieve the hello.html file; and the server's full response.

2. Play server

Now you will act as a web server, connecting from the "Firefox" web browser as available on the teach.cs machines.

2.1) Port 80 is a "privileged port number", so you will listen on some other port number. Choose any number between 1025 and 65535; if that is in use, try another. (To connect to another machine in the lab, just use its hostname, which looks like "b3175-49"; type "hostname" on a computer to find its hostname.)

The URL standard allows you to make an HTTP connection to a non-standard port number with a syntax in which the host name is followed by a colon and then the port number.

(You might be used to omitting the "http://" part and having your web browser fill this in; this is generally fine, but the algorithm most web browsers use to decide whether or not to add this prefix won't work here because of the colon following the host name, so you might actually have to type the "http://" this time.)

So how do you "play server"? You use the "-l" option to nc, which stands for "listen" (as opposed to "connect").

Substituting 1234 below for whatever port number you choose to use, type a command something like

nc -q0 -l 1234

and then go to the URL

http://localhost:1234/foo/bar

or replace "localhost" with the appropriate host name.

You will see the command and header fields from the web browser. Note, for example, the "Host:" header field like what you supplied while playing client in part 1.

Now it's your turn to reply. The first line of the response indicates the response code. You've probably heard of the HTTP response code "404". Less-often joked about but more commonly transmitted is the response code "200" which means "ok", and that the document follows.

The response banner begins with "HTTP/1.1", then a space, then the response code, then a space, then arbitrary human-readable text describing the situation. Why have human-readable text? For exactly the purposes we are currently embarked upon — manual experimentation or debugging by using nc!
(Actually, it works without the human-readable text too, with or without the preceding space.)
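
As an illustration of that layout, this is roughly how a client could pull the response code out of the first line of a response. It is a sketch only, and the function name parse_status_code is made up here. It ignores the human-readable text, so it accepts the line with or without it.

```c
#include <stdio.h>

/* Extract the numeric response code from a line such as "HTTP/1.1 200 OK".
 * Returns the code, or -1 if the line does not have that shape.
 * (Hypothetical helper.) */
int parse_status_code(const char *line)
{
    int major, minor, code;

    /* Protocol version, whitespace, then the response code. */
    if (sscanf(line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}
```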

The response banner is followed by zero or more header fields, then a blank line, then the content. For now, just enter a blank line, then type some content. This isn't quite going to work, but it will be pretty close if you then terminate the data stream by typing control-D to nc — we included the "-q0" option to enable this to work.

2.2) Let's do this properly now. There are a large number of possible response fields, but there are actually two mandatory ones.

One of the mandatory response fields (header fields) is Content-Type. This tells the web browser whether the document is plain text, HTML, an image, PDF file, etc. (The web server often derives this information from the part of the file name after the final dot (the filename "extension"), but that's a matter of how the web server computer is configured, not part of the HTTP protocol.)

The possible Content-Type values are strings, and they are hierarchical. One valid string is "text/plain"; another is "text/html". Similar to the formatting of the Host: header, you say "Content-Type", colon, space, then the content of the field. (If you're unclear on this, look at the part 1 examples!)

The other mandatory response field is Content-Length.

Do this again, this time sending "Content-Type: text/html". For the data, we are going to send this HTML code:

<h1> Hello message </h1>
This is a <i>hello</i> message.

Since you're entering unix newlines rather than network newlines (it really should be network newlines, but web browsers are generally permissive about this), this is 57 bytes. So before sending this, send "Content-Length: 57". And then the blank line separating the headers from the content.
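
A server program would normally compute the Content-Length rather than count by hand. The following sketch shows one way to do that; the function name build_response is made up for this illustration. Unlike what you type into nc, it ends the banner and header lines with network newlines, but it leaves the body exactly as given.

```c
#include <stdio.h>
#include <string.h>

/* Format a complete "200 OK" response into buf: response banner, the two
 * header fields, a blank line, then the body.
 * Returns the total length, or -1 if buf is too small.
 * (Hypothetical helper.) */
int build_response(char *buf, size_t size, const char *content_type,
                   const char *body)
{
    int len = snprintf(buf, size,
                       "HTTP/1.1 200 OK\r\n"
                       "Content-Type: %s\r\n"
                       "Content-Length: %zu\r\n"
                       "\r\n"
                       "%s",
                       content_type, strlen(body), body);

    if (len < 0 || (size_t)len >= size)
        return -1;
    return len;
}
```

Calling this with the two HTML lines above as the body, each ending in a unix newline, gives a Content-Length of 57, matching the hand count.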

Check that it works in your web browser. The web browser hasn't dropped the TCP connection, so nc is still running, but the browser should know to display the data now because of the Content-Length value. (If you are using the lynx web browser, it doesn't fully implement the HTTP/1.1 protocol, so it won't show you the page yet. Press control-D in nc, and I hope you still ran it with -q0.)

The second file to submit

Copy and paste the transcript of your nc command and session (the "2.2" version, the proper one) into a plain text file named "lab11b". Your lab11b file needs to show the shell prompt and nc command; everything the web browser sent; and everything you manually sent back to the web browser to cause a proper display of an HTML document, including the two mandatory header fields with correct values.


To submit:

Please remember to run "/u/csc209h/summer/present", during the tutorial time, on the console of a tutorial lab workstation, or to get the TA to mark you as present.

The desired contents of lab11a and lab11b are specified in part 2 above. They must be copied and pasted from terminal windows as described above (you can still do this from home over ssh, so long as you can copy and paste). Make sure that lab11a and lab11b are plain text files by typing "cat lab11a" and "cat lab11b" at a shell prompt; anything else (e.g. a word-processor file or a picture) will receive a grade of zero.

Note: Both files must show the shell prompt plus the nc command (as well as all of the subsequent input or output).

Then, by the end of Friday July 29, you must submit these lab11a and lab11b files with a command similar to:

submit -c csc209h -a lab11 lab11a lab11b

You can submit them simultaneously as above, or at different times. And all other 'submit' commands and options are, of course, still available.