First, a simple set of instructions to compute C := A + B. There's really nothing very interesting to do with the delayed load slots, so we don't, except for the last.
LOAD A, R1
NOP ; can't do a load in the delayed load slot, either...
LOAD B, R2
NOP
ADD R1, R2, R2
NOP ; and you can't do STORE R2 immediately after ADD ,,R2
STORE R2, C
HALT ; ha ha! Store will complete while the CPU is halted!
Now, some more-interesting code: the Ackermann function (with as much overcommenting as in the published assignment three solutions):
First, the C/Java code we're translating:
int ack(int m, int n) // precondition: m and n are non-negative
{
if (m == 0)
return n + 1;
else if (n == 0)
return ack(m - 1, 1);
else
return ack(m - 1, ack(m, n - 1));
}
Or in Python:
def ack(m, n):
if m == 0:
return n + 1
elif n == 0:
return ack(m - 1, 1)
else:
return ack(m - 1, ack(m, n - 1))
Our RISCy version (remember that the parameters are in R1 and R2) (lots of delayed branch slots! also after JSR and RTS! sometimes random stuff goes there which is noted as "harmless"... else NOP):
ACK: ADD R0, R1, R1 ; test m
BNE ELSE1
NOP ; delayed branch slot
ADD R0, R2, R1 ; n -> R1
RTS ; that is, if (m == 0) return n + 1
; delayed branch slot:
ADDI 1, R1 ; might overflow, in which case we return with V set
ELSE1: ADD R0, R2, R2 ; test n
BNE ELSE2
ADD R0, R1, R8 ; delayed branch slot: R8 <- m in any case
ADDI -1, R8 ; first parameter is m - 1
JSR ACK ; ack(m - 1, 1)
MOVEI 1, R9 ; delayed branch slot: second parameter is 1
BVS OVERFLOW
ADD R0, R8, R1 ; move result into R1 (harmless if V set)
RTS ; that is, else if (n == 0) return ack(m - 1, 1)
; warning: delayed branch slot follows!
ELSE2: ; the delayed branch slot above set m as first parameter
ADD R0, R2, R9 ; put n in R9; harmless for RTS
JSR ACK
ADDI -1, R9 ; delayed branch slot: second parameter is n - 1
BVS OVERFLOW
ADD R0, R8, R9 ; return value is second parameter to the next call
ADD R0, R1, R8 ; m
JSR ACK
ADDI -1, R8 ; delayed branch slot: first parameter is m - 1
BVS OVERFLOW
NOP ; delayed branch slot -- can't find anything to put here
RTS ; that is, else return ack(m - 1, ack(m, n - 1))
ADD R0, R8, R1 ; delayed branch slot: move result in R1
OVERFLOW: RTS
SEV ; delayed branch slot -- set 'V' condition code
Example #2:
Prime-testing subroutine:
int isprime(int p)
{
int d;
for (d = 2; d < p; d++)
if (p % d == 0)
return 0;
return 1;
}
Or in Python:
def isprime(p):
for d in range(2, p):
if p % d == 0:
return 0
return 1
In RISC code:
ISPRIME: MOVEI 2, R2 ; the number we're trying to divide into p (== R1)
LOOP: SUB R1, R2, R0
BLE YEP ; when we want to return 1
; now calculate p%d
DIV R1, R2, R4 ; R5 <- [R1] % [R2]; harmless if BLE YEP
; if (p % d == 0) return 0
ADD R0, R5, R5 ; test R5
BNE LOOP
ADDI 1, R2 ; delayed branch slot: this is the loop increment
RTS
ADD R0, R0, R1 ; delayed branch slot: return 0
YEP: ; return 1
RTS
MOVEI 1, R1 ; delayed branch slot
Program code to sum the numbers in memory locations 100 through 110, inclusive, illustrating delayed load rule issues:
MOVEI 100, R2 ; pointer
MOVEI 110, R3 ; target
ADD R0, R0, R1 ; sum
LOOP: ADD R0, R2, R4
IND R0, R4, R4 ; load indirect R4, into R4 -- i.e. R4 <- [[R4]]
ADDI 1, R2 ; can't access R4 in this cycle
SUB R2, R3, R0 ; if (p <= 110)
BLE LOOP
ADD R4, R1, R1 ; delayed branch slot (happens anyway): sum += *p
HALT
A problem from the VELMA problems page, but simplified so as not to care about overflow (we already showed dealing with that in the Ackermann function):
Write a subroutine to compute the sum of an array of words. The calling sequence to add 20 words starting at location 3000 is:
MOVEI 3000, R8 MOVEI 20, R9 JSR ROUTINE
An answer:
ROUTINE: ADD R0, R1, R3 ; get pointer 'p' out of the way of the sum
ADD R0, R0, R1 ; initialize sum to 0
LOOP: ADD R0, R3, R4
IND R0, R4, R4 ; load indirect R4, into R4
ADDI 1, R3 ; p+=1 (can't access R4 in this cycle)
ADDI -1, R2 ; count
BGT LOOP
ADD R4, R1, R1 ; delayed branch slot: sum += *p
RTS
NOP