computer networks

We've seen that an operating system, running on a single CPU, can orchestrate communication with the attached hardware: RAM, mass storage, keyboards, monitors, even multiple CPUs in the case of a multi-processor machine. What happens when two distinct machines, running separate OSs, have to cooperate on a task?

Cooperation, in this case, depends on developing protocols: fixed procedures that choreograph the exchange between machines. At the simplest level, the protocols are closely related to the physical connection between machines in a network. Messages must be passed to reveal information contained in one machine's memory to another, or to request that some action be taken on a remote machine. Let's consider a few examples.

token ring
Several machines can be connected in a ring with a defined direction of transmission: each machine has one fixed neighbour that sends to it, and one fixed neighbour that it sends to. If a machine sends information to another machine on the ring, all the machines on the ring pass the information along the line until the intended recipient gets the message. It doesn't stop there, however. The same message keeps propagating around the ring until the original sender receives it and can conclude that it was delivered. Since all the machines in a ring must cooperate to propagate messages, there must be just one sender at a particular time (this might be modelled as a conversation where the current speaker raises his or her hand). The machines enforce this by passing a signal, a bit pattern called a token, around the ring. A machine that has messages to send retains the token until it can confirm that its messages have been received, and then passes it along. Machines that have no messages to send pass along the token as soon as they receive it. [Wikipedia on token rings]
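The token-passing rule described above can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions: the station names, the outbox dictionary, and the functions circulate and run_token_once are all made up for illustration, and a real token ring works with bit patterns on a wire rather than Python lists.

```python
def circulate(ring, sender, recipient):
    """Relay one frame around the ring until it returns to the sender.

    `ring` lists (hypothetical) station names in the direction of
    transmission. Returns the stations the frame visited, in order, and
    whether the recipient was among them.
    """
    n = len(ring)
    start = ring.index(sender)
    path = []
    delivered = False
    for step in range(1, n + 1):
        station = ring[(start + step) % n]
        path.append(station)
        if station == recipient:
            delivered = True  # the recipient copies the frame, then relays it
    # The last entry in `path` is the sender: the frame has come full circle.
    return path, delivered


def run_token_once(ring, outbox):
    """Pass the token once around the ring, starting at ring[0].

    `outbox` maps a station name to a list of (recipient, message) pairs.
    Only the station currently holding the token may transmit.
    """
    log = []
    for holder in ring:  # the token moves from station to station
        for recipient, message in outbox.get(holder, []):
            path, delivered = circulate(ring, holder, recipient)
            log.append((holder, recipient, message, path, delivered))
        # A holder with nothing (more) to send passes the token along.
    return log


ring = ["A", "B", "C", "D"]
outbox = {"B": [("D", "hello")], "C": [("A", "hi")]}
for entry in run_token_once(ring, outbox):
    print(entry)
```

Notice that B's frame visits C, D, and A before coming back to B, even though D was the intended recipient. That return trip is what lets the sender conclude the message was received.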

star configuration
A single hub is connected to several machines, and routes messages from one to another. The disadvantage of the star configuration is that the hub is a single point of failure: if it goes down, none of the other machines can communicate. [Wikipedia on star configuration]

bus (e.g. ethernet) configuration
[BusNetwork]
Every machine in the group is simultaneously connected to all the other machines in the group. All of them are waiting to receive messages addressed to them, and all are (potentially) able to transmit messages at any time. If a machine detects that another machine is transmitting at the same time (a collision), it waits for a random interval of time before re-transmitting [see collision detection]. The success of the protocol depends on different machines usually choosing different random intervals, so that their re-transmissions do not collide again. Failure of a single machine doesn't break the network, although other machines may re-transmit many messages to it.
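Here is a rough Python simulation of the collide-then-wait idea. It is a simplified model, not the exact back-off algorithm that Ethernet uses: time is chopped into slots, every station starts with one frame to send, and the function simulate_bus and its parameters (max_wait, seed, max_slots) are assumptions made for this sketch.

```python
import random


def simulate_bus(stations, max_wait=8, seed=0, max_slots=10_000):
    """Return a list of (slot, station) showing when each frame got through.

    Every station wants to transmit one frame, starting at slot 0.
    """
    rng = random.Random(seed)
    next_attempt = {s: 0 for s in stations}  # slot of each station's next try
    sent = []
    for slot in range(max_slots):
        if not next_attempt:
            break  # every frame has been delivered
        trying = [s for s, t in next_attempt.items() if t == slot]
        if len(trying) == 1:
            # Exactly one transmitter in this slot: the frame gets through.
            sent.append((slot, trying[0]))
            del next_attempt[trying[0]]
        elif len(trying) > 1:
            # Collision: each sender waits its own random number of slots.
            for s in trying:
                next_attempt[s] = slot + 1 + rng.randrange(max_wait)
    return sent


for slot, station in simulate_bus(["red", "green", "blue"]):
    print(f"slot {slot}: {station} transmits successfully")
```

All three stations collide in slot 0. They only get through later because their random waits eventually differ, which is the point made above about the intervals being different for different machines.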

Protocols may be closed, under the proprietary control of a corporation that sells and maintains networking equipment and software (e.g. Novell's IPX), or open, a published standard that can be implemented by anyone with the technical means to do so (e.g. TCP/IP, the protocol used on the Internet).

networks of networks, the network of networks

To move messages between networks with different physical topologies, or different protocols, you need a gateway: a machine that can translate from one protocol to another and pass each message to the appropriate network. By connecting two or more networks, an internet is created. By the 1970s a large network of networks, the Internet, was established connecting some large universities, research institutions, and military establishments. The role of massive public funding (for example, from the U.S. Department of Defense and the National Science Foundation) is notable, since it was a couple of decades before there was widespread commercial interest in the Internet.

You can gain some insight into how information moves over the Internet by using traceroute on your CDF account. For example, try traceroute www.google.ca. If you know a remote machine's address, you can use traceroute to list the intermediate routers between you and the destination. You can also experiment with route, which shows the decisions your machine makes about routing packets.

An address in this network of networks consists of four bytes, the dotted quad, which divides into a portion that identifies the network and a portion that identifies the particular machine (or host) on that network. These four bytes contain enough information for a gateway to decide whether the recipient of a message is on the local network (in which case it sends the message directly to it) or on some other network (in which case it forwards the message to another router, and the message eventually reaches the network responsible for that address). A U.S. corporation, the Internet Corporation for Assigned Names and Numbers (ICANN), registers top-level domains. A domain is a segment of the Internet that takes responsibility for its own sub-domains. The fact that ICANN is U.S.-based, and partly corporate-controlled, is controversial.
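The local-or-remote decision can be tried out with Python's standard ipaddress module. This is a sketch under assumptions: the addresses come from ranges reserved for documentation, the function names are invented here, and the "/24" notation (the first 24 bits name the network, the last 8 name the host) is one common way of marking where the network portion ends, not something these notes specify.

```python
import ipaddress


def dotted_quad_to_int(address):
    """Pack the four bytes of a dotted quad into a single integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted quad: {address!r}")
    value = 0
    for p in parts:
        value = value * 256 + p  # shift left one byte, then add the next
    return value


def is_local(destination, local_network):
    """True if `destination` lies on `local_network`, so it can be
    delivered directly; otherwise it must be forwarded to a router."""
    return ipaddress.ip_address(destination) in ipaddress.ip_network(local_network)


LOCAL_NET = "192.0.2.0/24"  # hypothetical local network (documentation range)

print(dotted_quad_to_int("192.0.2.1"))
for dest in ["192.0.2.77", "198.51.100.5"]:
    action = "deliver directly" if is_local(dest, LOCAL_NET) else "forward to a router"
    print(dest, "->", action)
```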

Numerical addresses for hosts and domains (portions of the Internet) are difficult to work with, even for technically-focussed humans, so there is a translation between the dotted quad and symbolic names (e.g. seawolf.cdf.toronto.edu). You can investigate this with nslookup on CDF machines (try nslookup seawolf.cdf.toronto.edu). The symbolic names indicate the top-level domains (.org, .com, .net, .edu, and two-character country codes) that group many networks under them.

The hierarchy of domains and sub-domains in the symbolic name indicates the chain of responsibility for keeping track of the mapping between symbolic and numeric addresses, and (for example) which machines handle mail for other machines.
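To make the chain of responsibility concrete, the sketch below splits a symbolic name into its nested domains, widest first. The function delegation_chain is invented for this illustration. The second function, lookup, uses the standard library call socket.gethostbyname to ask for the dotted quad, much as nslookup does; it is defined but not called here because it needs a working network connection.

```python
import socket


def delegation_chain(hostname):
    """List the domains that contain `hostname`, from the top-level
    domain down to the full host name."""
    labels = hostname.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


def lookup(hostname):
    """Return the dotted quad for `hostname` (requires network access)."""
    return socket.gethostbyname(hostname)


for domain in delegation_chain("seawolf.cdf.toronto.edu"):
    print(domain)
```

Reading the printed list from top to bottom, each domain is responsible for knowing about the one on the next line.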

Some of the capabilities that made (and make) the Internet popular:

web, static and dynamic

If you press Control-U while you are browsing a web page, you will see dense mark-up that involves lots of "<" and ">" symbols (these enclose tags). This is the Hypertext Markup Language (HTML), and the tags indicate the logical (and sometimes typographical) structure of the document: what the title is, which headings are most important and which less so, and where there are links to other pages at other sites. Standards for how the tags are used are maintained by the [W3 Consortium], which also provides a [Guide to creating your own HTML]. The standards describe HTML that can be read by any browser, such as IE (Internet Explorer), Safari, and Firefox. A browser interprets the HTML provided by another host and presents it in an accessible (often graphical) form. Web pages in HTML format are located through a Uniform Resource Locator (URL).
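You can get a feel for what a browser does with the tags by pulling the logical structure out of a page yourself. The sketch below uses HTMLParser from Python's standard library. The class OutlineParser and the tiny sample page are made up for illustration, and a real browser does far more than this.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect the title, the headings, and the link targets of a page."""

    HEADINGS = ("h1", "h2", "h3")

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []   # (tag, text) pairs in document order
        self.links = []      # href values of <a> tags
        self._current = None # tag whose text is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADINGS:
            self._current = tag
            self._text = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if self._current:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._text).strip()
            if tag == "title":
                self.title = text
            else:
                self.headings.append((tag, text))
            self._current = None


# A made-up page, small enough to read by eye.
PAGE = """<html>
<head><title>Computer networks</title></head>
<body>
<h1>Token ring</h1>
<p>See <a href="https://example.org/ring">this page</a>.</p>
<h2>Star configuration</h2>
</body>
</html>"""

parser = OutlineParser()
parser.feed(PAGE)
print("title:", parser.title)
print("headings:", parser.headings)
print("links:", parser.links)
```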

The HTML page is interpreted by the web browser you use: it uses particular fonts and arrangements of components to represent the markup that it finds. But how does that mass of "<" and ">" get from some remote site to your machine? The HTML pages at a particular site are "served" by an HTTP (Hypertext Transfer Protocol) server, a program which runs continuously and can accept requests to send a page to another host. On CDF the HTTP server will honour requests for pages that are in a user's public_html directory, and only if the page is readable by any user of the machine. You can explore this further by clicking the "net" tab inside Firebug, and then opening any web page.
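The conversation between browser and server is plain text, so it is easy to imitate. The sketch below builds the kind of request a browser sends and picks apart the kind of reply a server returns. The host example.org, the path, and the canned reply are all invented for illustration, and nothing is actually sent over the network.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.1 request for one page."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"  # a blank line marks the end of the request headers
    )


def parse_response(raw):
    """Split a response into (status code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


# A made-up reply of the general shape a server would send back.
CANNED_REPLY = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body><h1>Hello</h1></body></html>"
)

print(build_get_request("example.org", "/index.html"))
status, headers, body = parse_response(CANNED_REPLY)
print(status, headers["content-type"])
print(body)
```

The headers you see in Firebug's "net" tab have the same shape as the ones handled here.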

HTML pages can contain more than formatted text, images, and sound. Programs written in Java, JavaScript (DrRacket can transform itself to JavaScript), and several other languages can be transmitted from the originating site to the web browser, and then run on the machine where the browser is running (if the browser allows it). Typically these programs are restricted by (a) the size of file that can reasonably be transmitted over the net, and (b) caution about allowing a program from a remote site to carry out certain functions, for example accessing your hard drive.

If the HTTP server allows it, a browser can request more than the delivery of a file: it can ask that a program be run on the server side (the machine where the HTTP server is running). These programs might provide access to a database or to some interactive program that runs on the machine where the HTML page is located. The standard for these programs is the Common Gateway Interface (CGI), and these programs are also usually restricted to prevent malicious, or inadvertent, abuse through requests from browsers.
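A CGI program can be very small. Under CGI the server puts the part of the URL after the "?" into an environment variable called QUERY_STRING, runs the program, and sends whatever the program prints back to the browser. The sketch below assumes a made-up parameter called name, and the function respond is invented for illustration. Escaping the parameter before echoing it back is one example of the caution mentioned above.

```python
import html
import os
from urllib.parse import parse_qs


def respond(query_string):
    """Build the full CGI output (header, blank line, HTML body)."""
    params = parse_qs(query_string)
    # "name" is a hypothetical parameter; fall back to a default greeting.
    name = params.get("name", ["world"])[0]
    # Escape the value so a request cannot inject its own tags into the page.
    body = f"<html><body><p>Hello, {html.escape(name)}!</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The HTTP server sets QUERY_STRING before it runs the program.
    print(respond(os.environ.get("QUERY_STRING", "")), end="")
```

If a server were configured to run this program, a request whose URL ended in ?name=Ada would produce a page that says "Hello, Ada!".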

Getting a high rank in a small domain

Here's a challenge you might take up (it does not involve any marks). Search engines such as Google are constantly scouring the web for pages, and indexing those pages according to key words that occur in them. You might want to increase the visibility of your blog by adding some key words to its content. Have a look at [tips for keyword placement]. You can experiment with small groups of keywords that have very few hits. Finding such a group is part of the challenge: it's pretty hard to come up with three English words that generate fewer than 20 hits.