Memory-mapped i/o

I/O hardware devices are called "peripheral devices" -- if the CPU is the heart, these things are towards the outside. They're connected to the system bus, just like the main memory unit. The CPU can communicate with them.

One way for the CPU to communicate with the peripheral devices is by "memory-mapped i/o", in which the various little registers which control the operation of the i/o hardware are given main memory addresses. The CPU operates the hardware by storing values to these addresses and reading values from them.

To construct such a situation, we make a small change to the design of the main memory unit. The main memory unit begins with decoding circuitry, which passes each memory request (read or write) on to the appropriate memory chip -- mostly RAM chips, with a few ROM chips thrown in. The change we make to create a memory-mapped i/o situation is to make some ranges of memory addresses connect to the various i/o hardware, instead of to RAM or ROM.
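
The decoding step described above can be mimicked in a few lines of Python. This is only a sketch of the idea, not a model of any real machine: the classes Ram, LedPort and Bus, and the address ranges, are all made up for illustration.

```python
class Ram:
    """Ordinary memory: a read returns whatever was last written."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def read(self, offset):
        return self.cells[offset]

    def write(self, offset, value):
        self.cells[offset] = value & 0xFF


class LedPort:
    """Hypothetical one-byte output device: a write changes the outside world."""

    def __init__(self):
        self.lit = 0

    def read(self, offset):
        return self.lit

    def write(self, offset, value):
        self.lit = value & 0xFF


class Bus:
    """Decoding circuitry: route each address to whichever unit owns its range."""

    def __init__(self):
        self.regions = []  # (first address, last address, unit)

    def attach(self, first, last, unit):
        self.regions.append((first, last, unit))

    def _decode(self, address):
        for first, last, unit in self.regions:
            if first <= address <= last:
                return unit, address - first
        raise ValueError(f"no device at address {address:o}")

    def read(self, address):
        unit, offset = self._decode(address)
        return unit.read(offset)

    def write(self, address, value):
        unit, offset = self._decode(address)
        unit.write(offset, value)


# Made-up layout: RAM at the bottom, one i/o register near the top.
bus = Bus()
ram = Ram(0o1000)
leds = LedPort()
bus.attach(0o000000, 0o000777, ram)
bus.attach(0o177500, 0o177500, leds)

bus.write(0o000100, 42)      # lands in RAM
bus.write(0o177500, 0b101)   # same kind of store, but it drives the device
```

The code doing the store cannot tell the two cases apart; only the decoder knows that one address leads to a device.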

Thus the registers in the i/o hardware which control and monitor its operation have memory addresses. This is especially important in the design of modern "personal computers" in which we use a general-purpose CPU which has no built-in knowledge of the particular hardware it will be connected to. We can operate hardware which was designed after the CPU was designed, by writing to the appropriate memory addresses. That is, the software operating the i/o hardware has to be designed in tandem with the i/o hardware itself, but the CPU doesn't have to be designed to take the i/o hardware into account.

But that's not the only reason to use a memory-mapped i/o strategy. The PDP-11 used only memory-mapped i/o even though its CPU was designed along with its i/o hardware. Memory-mapping of i/o registers is a good strategy which allows the programmer to use the instructions they already know to perform i/o by reading and writing memory addresses, and the machine language instruction set doesn't have to include a large number of i/o-specific instructions which are otherwise unused and are less flexible than the normal instructions.

For a simple example of memory-mapped i/o, let's take the sound output of the Apple II. The speaker of the Apple II either had the full voltage going to it, so that the speaker cone would be in one extreme position, or no voltage going to it, so that the speaker cone would be in the other extreme position. This is controlled by a logic line coming from a JK flip-flop, and one memory-mapped i/o memory location is connected to both J and K of this JK flip-flop. In fact, it's the control line for that location (rather than any data bit) which is connected in this way, so that either reading or writing this memory location toggles the position of the speaker cone.
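
As a rough illustration of that toggling behaviour, here is a small Python model. The class name ToggleSpeaker is invented, and the value returned by a read is an arbitrary choice for this sketch; the only thing it tries to capture is that any access, read or write, flips the cone.

```python
class ToggleSpeaker:
    """One-bit speaker: every access to its address flips the cone position."""

    def __init__(self):
        self.cone_out = False   # False = one extreme, True = the other
        self.toggles = 0

    def _toggle(self):
        self.cone_out = not self.cone_out
        self.toggles += 1

    def read(self):
        self._toggle()
        return 0    # arbitrary in this model; the access itself is the point

    def write(self, value):
        self._toggle()   # the value written is ignored


speaker = ToggleSpeaker()
speaker.read()
first = speaker.cone_out     # cone has moved to the other extreme
speaker.write(0xFF)
second = speaker.cone_out    # and back again
```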

The program makes sound by moving the speaker cone back and forth itself, one access at a time, at the appropriate rate. This is fairly fast by human standards -- you couldn't do that by hand -- but it's slow by computer standards and was well within the capability of the 1 MHz "6502" CPU (yup, that's 1 MHz, i.e. one two-thousandth of the clock speed of a 2 GHz CPU).

Rather than going into the 6502 machine language here, let's continue to use the PDP-11 machine language notation to present an example:

LOOP:	TSTB SPEAKER
	BR LOOP

We use a "test" instruction because we want to read the location, i.e. put its address (which I've referred to by the symbolic constant "SPEAKER") into the MAR and assert the Read control line. A TST instruction does this, and it does little else, which is fine because we don't want to do anything else.

We use TSTB rather than a plain TST because the address SPEAKER+1 is some other memory-mapped i/o address, and we don't want to do anything with that. In fact, memory-mapped i/o is the primary application of all the byte instructions on a machine such as the PDP-11.

In fact that loop moves the speaker cone fast enough that the sound is at slightly too high a pitch to hear. The following loop makes a lower tone, still somewhat high-pitched but easily heard:

LOOP:	TSTB SPEAKER
	NOP
	BR LOOP

And the following loop makes an even lower-pitched tone:

LOOP:	TSTB SPEAKER
	NOP
	NOP
	BR LOOP

By looking in a 6502 technical reference manual, you can find the timing for the various instructions, and replace the NOPs with an instruction or sequence of instructions which takes the amount of time you want.
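
The arithmetic behind choosing a delay can be sketched as follows. It assumes the loop toggles the speaker exactly once per iteration and that the clock is the 1 MHz figure quoted earlier; the loop lengths used are hypothetical placeholders, not real instruction timings, which you would take from the reference manual.

```python
def tone_frequency_hz(clock_hz, cycles_per_iteration):
    """Pitch produced by a loop that toggles the speaker once per iteration.

    One full cycle of the sound wave needs two toggles (out and back),
    so the tone frequency is half the toggle rate.
    """
    toggles_per_second = clock_hz / cycles_per_iteration
    return toggles_per_second / 2


def cycles_for_tone(clock_hz, target_hz):
    """Loop length, in clock cycles, needed to produce target_hz."""
    return round(clock_hz / (2 * target_hz))


CLOCK_HZ = 1_000_000    # the 1 MHz figure quoted in the text

# Hypothetical loop lengths, for illustration only.
slow = tone_frequency_hz(CLOCK_HZ, 1000)   # 500.0 Hz
fast = tone_frequency_hz(CLOCK_HZ, 500)    # 1000.0 Hz
needed = cycles_for_tone(CLOCK_HZ, 440)    # 1136 cycles per iteration
```

Each instruction added to the loop lengthens the iteration and so lowers the pitch, which is all the NOPs are doing.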

It's not possible to make all sounds with this small degree of control over the speaker, but you can make quite a range of sound. Still, you can't really make anything which is enough like speech to understand it, and you certainly can't play recorded music over it.

 
Let's consider some more modern sound hardware which allows us, for example, to play a file containing digital sound (e.g. an "MP3" file).

The data involved in this case is a sequence of numbers representing the position of the speaker cone, perhaps as signed integers from -32768 to 32767 (-2^15 to 2^15-1). As for the timing, these numbers represent the position of the speaker cone at a succession of regularly-spaced points in time. The number of points per second is known as the "sampling rate". One common sampling rate is 44100 samples per second; this is the sampling rate on audio CDs.
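
To make the idea of a sample array concrete, here is a short Python sketch which builds one for a pure tone. The function name sine_samples and the choice of a 440 Hz tone are just for illustration; real recorded sound would of course not be a sine wave.

```python
import math

SAMPLE_RATE = 44100          # samples per second, as on audio CDs
MAX_AMPLITUDE = 32767        # largest value in the signed 16-bit range


def sine_samples(frequency_hz, seconds, amplitude=MAX_AMPLITUDE):
    """Return a list of signed 16-bit sample values for a pure tone."""
    count = round(SAMPLE_RATE * seconds)
    return [
        round(amplitude * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE))
        for n in range(count)
    ]


samples = sine_samples(440, 0.01)   # 10 ms of a 440 Hz tone
```

Even this hundredth of a second takes 441 numbers, which gives some idea of how quickly samples have to be delivered.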

An MP3 file is encoded for compression, but I don't want to talk about that, obviously; let's suppose we have the decoded array of numbers (sound "sample" values) in a range of memory somewhere and we want to play it.

If we have hardware analogous to the above Apple II sound hardware, but which takes a value for the speaker cone position rather than just one bit of information, then we'd have to send the samples at the right rate.

But more modern audio hardware is likely to control the rate of the playing of the sound samples itself. All we have to do is have the next sample ready in time; the audio hardware itself will pace us, as you'll see below.

The PDP-11 didn't have any audio output hardware, so we'll have to make some up if we want to continue to use the PDP-11 instruction set. Let's suppose that memory location SOUNDSTATUS contains a bit which says whether it's ready for the next sound sample, and SOUNDSAMPLE is where we have to write it to.

That is,

1. Suppose a one-byte register named SOUNDSAMPLE.
We're supposed to write the successive sample values to this register.

But if we write the next sample too soon, it will overwrite the previous sample before that one has gone out to the sound hardware. So we also have a bit which indicates whether or not the hardware is ready for the next sample. In this way, the sound hardware itself controls the timing, unlike the Apple II speaker example. This is much better, although it requires more hardware. Such hardware was omitted on bare-bones machines such as the Apple II, but hardware to control the timing and handle the basic operation of the i/o devices, so you don't have to do it in software, is present in all modern general-purpose computers, including the cheap ones.

How can we memory-map a bit? Well, if you have eight bits you want to memory-map, you can combine them into one memory-mapped byte. Sometimes you have fewer than eight bits, so some are missing; in that case, we'll power or ground them or something. (This is a matter for the hardware design, and would be documented for us as programmers trying to use the hardware.)

Thus,

2. Suppose a one-byte register named SOUNDSTATUS.
The second-most-significant bit in this case will be the bit that indicates whether or not the SOUNDSAMPLE register is ready to receive a new sample.

What are identifiers such as SOUNDSTATUS? These are particular standardized memory addresses for a given computer, which the hardware is set up to divert to these i/o registers rather than sending the requests to normal memory locations.

So you have to know this constant address value to be able to access the sound i/o registers. Your assembler may start off with a bunch of relevant names in its symbol table such as, for the PDP-11, KBSTATUS (see subsequent example below); or you may have to put things like this in your program:

	SOUNDSAMPLE = 177570
	SOUNDSTATUS = 177572

You'd put that at the top. For those of you familiar with the C programming language, in a Unix-like environment, things like this would probably be in an 'include' file for the assembly language, analogous to include files for C.

However these symbols get defined, we can then write a simple program to play the sound data:

	MOV #SOUNDDATA, R0	; R0 points at the first sample
	MOV SOUNDSIZE, R1	; R1 counts the samples left to play
LOOP:	BITB #100, SOUNDSTATUS	; test the ready bit
	BEQ LOOP		; not ready yet
	MOVB (R0)+, SOUNDSAMPLE	; send one sample, advance the pointer
	DEC R1
	BGT LOOP		; more samples to go
	HALT

The "BITB" instruction tests only one byte. As noted earlier, the main use of byte instructions (as opposed to word instructions) is in i/o. A "word" instruction specifying address 177572 would access the two bytes forming that word, namely those at locations 177572 and 177573, and we don't want to do anything to the byte at location 177573 (i.e. SOUNDSTATUS+1). It might be some other i/o register, perhaps for completely different hardware, and for memory-mapped i/o registers even reading from the memory location can cause i/o to occur. A "byte" instruction accesses only one byte, and its operand address can be odd if desired, since it is a byte address.

So a byte instruction is what we want to access this one-byte register.
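
The difference between the two kinds of access can be seen in a toy Python model which simply records every address touched. The class IoPage is invented for this sketch, and the low-byte-first ordering and the refusal of odd word addresses are modelling choices, not a description of particular hardware.

```python
class IoPage:
    """Tiny model of a few i/o registers where any access has a side effect."""

    def __init__(self):
        self.touched = []   # addresses accessed, in order

    def read_byte(self, address):
        self.touched.append(address)
        return 0

    def read_word(self, address):
        # A word sits at an even address and covers that byte and the next one.
        if address % 2:
            raise ValueError("word access needs an even address")
        low = self.read_byte(address)
        high = self.read_byte(address + 1)
        return (high << 8) | low


SOUNDSTATUS = 0o177572

page = IoPage()
page.read_byte(SOUNDSTATUS)
byte_touched = list(page.touched)    # only 177572

page.touched.clear()
page.read_word(SOUNDSTATUS)
word_touched = list(page.touched)    # 177572 and its neighbour 177573
```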

Now, if we wanted to test the top bit of this byte, we'd use a TSTB instruction, and since the top bit is the sign bit, the N condition code would tell us whether or not the top bit had been on. An example of this appears in the tty i/o example below.

But in the current case, we need to use something else, which is the BIT(B) instruction. Here's how that goes.

There are various bit operations which perform logical operations such as AND and OR. Suppose we wanted to test the third-least significant bit of R4. We could write:

	BIT  #4, R4

What this does is an "AND" of the number 4 and the contents of R4. But it doesn't put the result anywhere, just like a TST. On the other hand, it DOES set the condition codes, so you can then test the Z flag: Z is set if the bit was 0 and clear if it was 1.

Recall ANDing an entire number:
It means you go through and AND each bit, one by one. So for example, 1100 AND 1010 is 1000: only the leftmost position has a 1 in both operands.

You can do these by writing them out in bits. All ALU ops operate on whole words at once, and these are no exception. You can still use them with plain 1s and 0s and get the expected results, or you can in effect do 8 or 16 logical calculations at once.
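
The same test can be tried out in Python, whose & operator does a bitwise AND. The helper bit_test is a made-up stand-in for BIT: it reports only what the Z flag would be and keeps no result.

```python
def bit_test(mask, value):
    """Model of BIT: AND the operands, keep only the Z flag, store nothing."""
    result = mask & value
    return result == 0      # True means Z is set (the tested bits were all 0)


# AND works on every bit position at once.
a = 0b0110
b = 0b0101
both = a & b                # 0b0100

# Octal 100 is binary 01000000: the second-most-significant bit of a byte.
READY = 0o100
z_when_busy = bit_test(READY, 0b10111111)    # ready bit clear -> Z set
z_when_ready = bit_test(READY, 0b01000000)   # ready bit set   -> Z clear
```

Note that the other seven bits of the value make no difference to the outcome, which is what lets one status byte carry several independent flags.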

Finally, note that as usual, "HALT" is an unlikely ending for this program, but let's leave it. Or at most, replace it with an RTS so that this is a subroutine. That's not the point right now.
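
To see the pacing at work, the playback loop can be mimicked in Python against a pretend device. Everything about SoundDevice is assumed for this sketch, in particular that it stays busy for a fixed number of status polls after each sample; real hardware would be busy for a fixed length of time instead.

```python
READY = 0o100   # the ready bit tested by BITB #100


class SoundDevice:
    """Made-up device: accepts a sample, then is busy for a few status polls."""

    def __init__(self, busy_polls=3):
        self.busy_polls = busy_polls
        self.countdown = 0
        self.played = []

    def read_status(self):
        # Each poll stands in for time passing while the hardware works.
        if self.countdown > 0:
            self.countdown -= 1
            return 0
        return READY

    def write_sample(self, value):
        if self.countdown > 0:
            raise RuntimeError("sample written before the device was ready")
        self.played.append(value & 0xFF)
        self.countdown = self.busy_polls


def play(device, data):
    """Same shape as the assembly loop: busy-wait, store one byte, repeat."""
    polls = 0
    for sample in data:
        while (device.read_status() & READY) == 0:   # BITB / BEQ LOOP
            polls += 1
        device.write_sample(sample)                  # MOVB (R0)+, SOUNDSAMPLE
    return polls


device = SoundDevice(busy_polls=3)
wasted_polls = play(device, [10, 20, 30, 40])
```

The count of wasted polls is the price of this approach: the CPU does nothing useful while it waits for the ready bit.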

Another example

Here is another i/o example. This example illustrates how keyboard input actually works on the PDP-11.

On the PDP-11, we have memory-mapped i/o locations for one byte of terminal input and one byte of output, and status bytes each way. They can be defined as:

	KBSTATUS = 177560
	TTYIN = 177562
	PRSTATUS = 177564
	TTYOUT = 177566

but these definitions are probably built into the assembler, or something like that, just as discussed for the SOUNDSTATUS and SOUNDSAMPLE registers above.

TTYIN and TTYOUT are 8-bit i/o registers through which a single ascii character code is transferred. When the user presses a key, its ascii code goes into the TTYIN register. At this point, the most-significant bit of the 8-bit KBSTATUS register is set to 1. When you MOVB the byte out of TTYIN, that most-significant bit goes to 0 and stays there until the next keypress. We have to use byte instructions for all of this because we don't want to do i/o operations on whatever's in 177561, etc.

For output, you write the ascii character code into TTYOUT. At this point, PRSTATUS's high bit goes to 0. Eventually the character gets printed. As soon as the hardware is ready for another character code (which is during the printing, so that you can keep it going continuously; that is, it lets us know as soon as it no longer needs TTYOUT's contents to be stable with the old value), PRSTATUS's high bit goes to 1.

Here's a routine to read and echo a line, ended by carriage return, into bytes beginning at LOC. Carriage return (control-M) is 15 in octal. This is also a good example of an "equate" (CR = 015) which is not memory-address-related. The assembler just replaces subsequent occurrences of "CR" with "015", whatever the context.

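        CR = 015                ; ascii carriage return
        MOV #LOC, R0            ; R0 points at the next free byte

RWAIT:  TSTB KBSTATUS           ; has a key been pressed?
        BPL RWAIT               ; top bit 0: not yet
        MOVB TTYIN, @R0         ; store the character

WWAIT:  TSTB PRSTATUS           ; is the printer ready?
        BPL WWAIT               ; top bit 0: not yet
        MOVB @R0, TTYOUT        ; echo the character

        CMPB (R0)+, #CR         ; was it CR?  (also advances R0)
        BNE RWAIT               ; no: read another character
        HALT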

Note that this suffers from a buffer overrun problem -- if the user keeps typing, the bytes will keep overwriting more and more memory, and there surely is some limit to how many bytes have been reserved after location LOC. But let's not complicate the loop.

Note that since the most significant bit of the status bytes indicates readiness, after a TST (using the byte version, because you don't want to consult an entire word, just one byte), the N bit indicates whether the system is ready for the next byte to be processed. For KBSTATUS, the msb is 1 if a character has been typed, and for PRSTATUS, the msb is 1 if the previous character has finished being output -- depending on how you look at it, they're opposite, but it means that the same BPL loop does a busy wait for either register, which is presumably why they did it that way.

Of course, it is not coincidental that the most frequently-used flag in KBSTATUS is the easiest to test. If we want to test bits other than the most-significant bit, we can't just do a TST and BPL, we have to use the BIT technique used in the sound example.

Finally, note that the auto-increment addressing in the CMPB adds only 1 because it is a byte instruction.
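
For comparison, here is the same read-and-echo logic as a Python sketch against a pretend terminal. The Terminal class is invented and much simpler than the real registers (its printer is always ready), and the limit parameter is an addition which the assembly version lacks, included only to show one way around the overrun problem mentioned above.

```python
CR = 0o15          # carriage return, as in the assembly version
MSB = 0o200        # top bit of a byte; TSTB puts it in the N flag


class Terminal:
    """Made-up stand-in for the keyboard and printer registers."""

    def __init__(self, typed):
        self.pending = list(typed)   # bytes the user will type
        self.printed = bytearray()

    def kbstatus(self):
        return MSB if self.pending else 0

    def ttyin(self):
        return self.pending.pop(0)   # reading the byte clears the ready bit

    def prstatus(self):
        return MSB                   # this model's printer is always ready

    def ttyout(self, value):
        self.printed.append(value)


def read_line(term, limit):
    """Read and echo bytes up to and including CR, storing at most limit bytes."""
    line = bytearray()
    while len(line) < limit:
        while not (term.kbstatus() & MSB):    # RWAIT: TSTB KBSTATUS / BPL RWAIT
            pass
        ch = term.ttyin()
        line.append(ch)
        while not (term.prstatus() & MSB):    # WWAIT: TSTB PRSTATUS / BPL WWAIT
            pass
        term.ttyout(ch)
        if ch == CR:
            break
    return bytes(line)


term = Terminal(b"hi\rextra")
line = read_line(term, limit=80)
```

Like the assembly loop, this spins forever if no more input arrives before a carriage return or the limit; that is the nature of a busy wait.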

