RISC (Reduced Instruction Set Computer)

Here's the basic point-form text, at least most of what I wrote on the chalkboard.

Goals

Goal: execute one instruction per cycle

Basic design ideas

Pipelining

Instructions X, Y, and Z which do not affect control flow:
addr  instr
 2    X
 3    Y
 4    JUMP 8
 5
 6
 7
 8    Z

Phases of instruction execution and some nicknames for them:

Traditionally, execute each phase in turn for each instruction.

But consider that once we've finished fetching the X instruction, during the decoding we're not using the memory path (MAR/MDR). We may or may not need the memory path to fetch the operands; let's assume we don't. Then there's no reason we couldn't do "FO X" (fetch operands for X) and "FI Y" (fetch instruction Y) simultaneously.
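As a rough sketch of the payoff, here is a toy cycle count in Python (not real circuitry; the four-phase breakdown is assumed here for illustration):

```python
# Toy model: every instruction passes through the same phases in order.
PHASES = ["fetch instruction", "decode", "fetch operands", "execute"]

def sequential_cycles(n_instructions):
    # Traditional: finish every phase of one instruction
    # before starting the next.
    return n_instructions * len(PHASES)

def pipelined_cycles(n_instructions):
    # Ideal pipeline: start a new instruction each cycle; the last
    # one still needs len(PHASES) cycles to drain through.
    return len(PHASES) + (n_instructions - 1)

print(sequential_cycles(3))  # 12 cycles to run X, Y, Z one at a time
print(pipelined_cycles(3))   # 6 cycles with perfect overlap
```

As the instruction count grows, the pipelined cost approaches one cycle per instruction, which is the goal stated above.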

Lots of issues.

Pipelining issues:

Some typical features of RISC CPUs (not all ubiquitous)

Delayed load

If a "LOAD" requires an extra cycle, the rule is simply that we may not access the load target in the next instruction!

We want:

    ADD R1, R3
    LOAD 500, R4
    ADD R4, R3

But we can't access R4 on that third line (it's the load target, and here it's also an operand).

We could insert a NOP.

Better yet, rewrite:

    LOAD 500, R4
    ADD R1, R3
    ADD R4, R3
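To make the rule concrete, here is a toy checker in Python (the tuple encoding of instructions is my own invention for this sketch, not any real assembler format) that flags a program touching a load target in the very next instruction:

```python
# Toy checker for the delayed-load rule: the register loaded by a
# LOAD may not be accessed by the very next instruction.
def violates_delayed_load(program):
    # program is a list of (opcode, operands) tuples.
    for first, second in zip(program, program[1:]):
        if first[0] == "LOAD" and first[1][1] in second[1]:
            return True
    return False

original = [("ADD", ("R1", "R3")),
            ("LOAD", ("500", "R4")),
            ("ADD", ("R4", "R3"))]   # touches R4 right after the LOAD

rewritten = [("LOAD", ("500", "R4")),
             ("ADD", ("R1", "R3")),  # useful filler while the LOAD finishes
             ("ADD", ("R4", "R3"))]

print(violates_delayed_load(original))   # True
print(violates_delayed_load(rewritten))  # False
```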

Delayed branch

Like delayed load, but more bizarre:
time  fetch  execute
 0    X
 1    Y      X
 2    JUMP   Y
 3    [5]    JUMP
 4    Z
 5    [9]    Z
At time 4, no instruction is executed. At time 3, we do a useless fetch, and then we squash it.

There are pipelined architectures in which we do fetch instruction Z at time 3...

But there's a better way, which is the more typical RISC approach. It involves no extra circuitry, in fact even less than squashing requires. But it is quite bizarre from the assembly-language programming point of view, and is another good reason to write in high-level languages only.

This is the delayed branch rule. What the delayed branch rule means is quite simply that the instruction following a branch, which will have been fetched and decoded, is EXECUTED, even if the branch is taken! So in this example, instruction 5 WILL be executed. As with the delayed load, we can just put a NOP there, but more usually we try to be clever. If that jump op is not conditional on the results of instruction Y, we can write:
addr  instr
 2    X
 3    JUMP 8
 4    Y
 5
 6
 7
 8    Z
and the instructions executed will be X, Y, Z. This special memory location following a branch is known as the "delayed branch slot", a place where you put one additional instruction to be executed. If the jump IS conditional on the result of instruction Y, perhaps we can do Y, then the conditional jump, then X. Failing everything, we use a NOP. But assuming we usually manage to fill the delayed branch slot with a useful instruction, we're managing to avoid squashing the pipeline on every jump, which saves a lot of time.
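A toy Python simulation of the delayed-branch rule (the dictionary-of-addresses machine here is invented for illustration, not the real fetch/execute circuitry) shows the executed order for the rearranged program above:

```python
# Toy simulator of the delayed-branch rule: the instruction after a
# taken branch is executed anyway, because it has already been
# fetched and decoded by the time the branch takes effect.
def run(memory, start):
    executed = []
    pc = start
    while pc in memory:
        op = memory[pc]
        if op.startswith("JUMP"):
            target = int(op.split()[1])
            executed.append(memory[pc + 1])  # delay-slot instruction runs too
            pc = target
        else:
            executed.append(op)
            pc += 1
    return executed

# Layout from the table above: the JUMP at address 3 has Y in its
# delay slot at address 4.
memory = {2: "X", 3: "JUMP 8", 4: "Y", 8: "Z"}
print(run(memory, 2))  # ['X', 'Y', 'Z']
```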

Note that we probably still have to squash upon interrupt.

Three-register instruction format

Typical          Typical RISC
ADD R1, R2       ADD R1, R2, R2

MOV R1, R3       ADD R1, R2, R3
ADD R2, R3

Lots of registers

Extra space on chip: no microprogram, and most circuitry is simpler. Registers really speed things up.

A digression on the use of registers by compilers is required:

Programs in high-level languages do not refer to registers; compilers can decide to put variables in registers.

Of course we also need registers to be accumulators, that is, to do arithmetic in; and to store intermediate results. We also use them to pass parameters.

The following:

    i = x * y;
    j = x + 2;
can be compiled to something like:
    LOAD X, R1
    LOAD Y, R2
    MULT R1, R2
    STORE R2, I
    ADDI 2, R1
    STORE R1, J
A naive compilation of the high-level language code would load x twice. But we can keep x stashed in R1 and avoid re-loading it, as above. We can also identify common sub-expressions (i.e. expressions larger than a single variable) and use registers to keep their values for subsequent reuse.
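A crude way to see the payoff is to count memory operations. In this Python sketch, `naive` is the hypothetical version that re-loads x, and `cached` is the code above, which keeps x in R1 (a toy count, not a real compiler):

```python
# Count LOADs and STOREs: each one is a trip through the memory path.
naive = ["LOAD X, R1", "LOAD Y, R2", "MULT R1, R2", "STORE R2, I",
         "LOAD X, R1", "ADDI 2, R1", "STORE R1, J"]   # reloads x

cached = ["LOAD X, R1", "LOAD Y, R2", "MULT R1, R2", "STORE R2, I",
          "ADDI 2, R1", "STORE R1, J"]                # x stays in R1

def memory_ops(program):
    return sum(1 for ins in program if ins.startswith(("LOAD", "STORE")))

print(memory_ops(naive))   # 5 memory operations
print(memory_ops(cached))  # 4 memory operations
```

One memory operation saved out of five may not sound like much, but loops multiply the saving, and with more registers a compiler can apply the same trick to many variables at once.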

All these uses of registers make a program run faster. The more registers you have, the more speedups a compiler (or human programmer) can produce by clever use of registers.

Register windows

Huge number of registers, only some visible at a time. JSR slides the window one way; RTS slides it back. See the example language below.
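The sliding can be sketched in Python as an offset into a big register file. The window size, overlap, and total count below are invented for illustration; real windowed machines pick their own numbers:

```python
WINDOW = 8    # visible registers at a time (assumed size)
OVERLAP = 2   # registers shared by caller and callee, for parameters

class RegisterFile:
    """Toy register file: only WINDOW registers are visible at once."""
    def __init__(self, total=64):
        self.regs = [0] * total
        self.base = 0                  # where the current window starts

    def __getitem__(self, n):          # read visible register n
        return self.regs[self.base + n]

    def __setitem__(self, n, value):   # write visible register n
        self.regs[self.base + n] = value

    def jsr(self):                     # subroutine call slides the window
        self.base += WINDOW - OVERLAP

    def rts(self):                     # return slides it back
        self.base -= WINDOW - OVERLAP

rf = RegisterFile()
rf[6] = 42        # caller puts a parameter in an overlap register
rf.jsr()
print(rf[0])      # callee sees the same cell as its register 0: 42
rf.rts()
```

The overlap is what makes parameter passing free: no registers are copied or saved to memory on a call, which is the attraction of the scheme.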

Summary

The points we discussed are:

Not all of these ideas are necessarily found in any one RISC CPU, but a CPU called a RISC will probably have most if not all of the above ideas.

In the other direction, just about any modern CPU does some pipelining, even if it can't do some of these other things because it has a backward-compatible instruction set.

Some of these ideas are arguably just as applicable to non-RISC CPUs, e.g. register windows. But register windowing is more likely to be found in a RISC because it's more likely to have more registers, because more chip space is available due to the lack of microcode and other complexities.

Example language

Here is a description of a hypothetical RISC machine language with some example code. I designed it to be severe along these lines. In practice, modern RISC machine languages are normally not quite so severe, but exhibit some of these strange characteristics.

References to "assignment three" in that document are actually a previous year's assignment three, but the examples are just as good even if they weren't assignment questions, I think. I can translate some of this year's assignment three solutions to this RISC machine language if people ask.

