Computer Architecture Homework #4

Due: 10/12/04 (Tuesday)

1. Tomasulo's Algorithm: A loop-based example

Loop: LD F0, 0 (R1)

MULTD F4, F0, F2

SD F4, 0 (R1)

ADDI R1, R1, 8

BNE R1, R2, Loop ; Branch if R1 is Not Equal to R2

Assume that the instructions for three successive iterations of the loop are issued before any of the Load Double (LD) operations completes. Assume that MULTD takes 4 clock cycles to execute. After all the instructions have been issued to the reservation stations, assume that the LD operations complete on consecutive clock cycles. Assume that there are multiple (at least three) FP multipliers.
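To see why three iterations can be in flight at once, it may help to look at what renaming does to the register names. The Python sketch below is a simplified illustration, not Tomasulo's full algorithm: it gives each result a fresh tag (the hypothetical names T0, T1, ...) and rewrites later source operands to refer to the most recent tag. The tuple encoding of the instructions and the function name `rename` are assumptions made for this sketch, and it does not model timing.

```python
def rename(instructions):
    """Give every result a fresh tag and point later sources at it.

    Each instruction is (opcode, destination or None, list of sources).
    This encoding is an assumption of the sketch, not part of the handout.
    """
    latest = {}    # architectural register -> tag of its most recent producer
    renamed = []
    for n, (op, dest, sources) in enumerate(instructions):
        # A source reads the newest tag if one exists, else the register itself.
        new_sources = [latest.get(s, s) for s in sources]
        tag = None
        if dest is not None:
            tag = f"T{n}"          # hypothetical tag name
            latest[dest] = tag
        renamed.append((op, tag, new_sources))
    return renamed


# One loop body from problem 1, written in the sketch's tuple encoding.
body = [
    ("LD",    "F0", ["R1"]),
    ("MULTD", "F4", ["F0", "F2"]),
    ("SD",    None, ["F4", "R1"]),
    ("ADDI",  "R1", ["R1"]),
    ("BNE",   None, ["R1", "R2"]),
]

renamed = rename(body * 3)   # three successive iterations
for op, tag, sources in renamed:
    print(op, tag, sources)
```

In the output, each iteration's MULTD reads and writes its own tags rather than sharing F0 and F4, so the three multiplies depend only on their own loads.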

a) How long would it take to complete these three loop iterations once the Load operations start to complete?

b) Estimate how long it would take to complete these three loop iterations if register renaming (via Tomasulo's algorithm) were NOT being used. (You will need to devise your own methodology for determining this. A paragraph explaining your methodology and assumptions would be nice, especially if you want some partial credit.)

2. Translate the following code to Itanium assembly language, using predicate register(s) to eliminate the branch instructions.

if (R2 < R3) then

R1 = R1 + 1

else

R1 = R1 - R2

end if

3. The figure on page 3 (taken from Computer Organization & Architecture by William Stallings) shows a typical program's Call-Return pattern. The program is being run on a computer that, like the Itanium, has a large number of registers with overlapping register windows (see Figure 14.1). Each register window holds a procedure/function/method call-frame, with only the call-frames on top of the run-time stack being held in registers. Call-frames are pushed to the run-time stack only when the register windows are full and a procedure is called. Call-frames are popped from the run-time stack into a register window when only the top call-frame on the run-time stack remains in the registers and that procedure returns.

It is assumed in the diagram that 5 call-frames will fit into register windows throughout the program's execution. The gray boxes on the diagram show when the hardware must automatically push or pop a call-frame to or from memory. In this diagram of 100 calls/returns, only 19 call-frames must be pushed or popped during the program's execution.
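The push/pop rule described above can be sketched as a small Python counter. This is only one reading of the rule, under stated assumptions: the initial procedure already occupies one register window, each overflow or underflow moves exactly one call-frame, and the trace is a list of "call" and "return" events. The function name `count_transfers` and the short trace are hypothetical; the trace is not the one in the figure.

```python
def count_transfers(trace, capacity):
    """Count call-frames pushed to or popped from memory for a call/return trace.

    capacity is the number of call-frames that fit in the register windows.
    Assumption: the program starts with one frame already held in registers.
    """
    in_registers = 1   # frames currently held in register windows
    in_memory = 0      # frames pushed out to the run-time stack in memory
    transfers = 0
    for event in trace:
        if event == "call":
            if in_registers == capacity:
                # Windows are full: push the oldest frame to memory.
                in_registers -= 1
                in_memory += 1
                transfers += 1
            in_registers += 1
        elif event == "return":
            in_registers -= 1
            if in_registers == 0 and in_memory > 0:
                # The last frame in registers returned: pop its caller's frame.
                in_memory -= 1
                in_registers += 1
                transfers += 1
        else:
            raise ValueError(f"unknown event: {event!r}")
    return transfers


# Hypothetical trace: three nested calls followed by three returns.
trace = ["call"] * 3 + ["return"] * 3
print(count_transfers(trace, capacity=2))
print(count_transfers(trace, capacity=4))
```

With room for only 2 call-frames this short trace needs two pushes and two pops. With room for 4 it needs none, because the nesting depth never exceeds the number of windows.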

a) For this same program, determine the number of times that the call-frames must be pushed or popped if the computer had enough registers to hold 6 call-frames.

b) For this same program, determine the number of times that the call-frames must be pushed or popped if the computer had enough registers to hold only 4 call-frames.