Computer Systems Test 1

Question 1. (20 points)

a) Why are the complex instructions of CISC (Complex Instruction Set Computer) machines difficult to pipeline?

b) Why are RISC machines usually Load & Store machines (i.e., only Load and Store instructions access memory)?

c) One characteristic of RISC machines is to have either large register files or large on-chip caches. Answer the following questions related to this characteristic.

i) One problem with using large register files is the increased number of bits needed to specify a register operand in the machine language instructions. How does SPARC avoid this problem?

ii) How does the overlap of SPARC register windows improve program performance?

Question 2. (35 points)

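The whole question refers to a pipelined RISC machine with five stages, labeled F, D, E, M, and W in the timing diagrams below. (The pipeline figure, including its by-pass signal paths, is not reproduced here.)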

a) Complete the following timing diagram assuming NO by-pass signal paths.

Without by-pass signal paths

                  Time
Instruction       1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
ADD R1, R3, R4    F  D  E  M  W
ADD R2, R4, R5
ADD R3, R2, R1
LOAD R2, 12(R3)
STORE R2, 16(R2)

b) Complete the following timing diagram assuming by-pass signal paths as shown above.

With by-pass signal paths

                  Time
Instruction       1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
ADD R1, R3, R4    F  D  E  M  W
ADD R2, R4, R5
ADD R3, R2, R1
LOAD R2, 12(R3)
STORE R2, 16(R2)

c) If the outcome of a conditional branch instruction is known at the end of the Execute stage, what would be the branch penalty for a taken conditional branch? (Assume no delayed branching.)

d) If the outcome of a conditional branch instruction is known at the end of the Execute stage, what would be the branch penalty for a taken conditional branch, assuming a one-slot delayed branch?

Question 3. (20 points)

a) If the above "for-loop" is recompiled on a machine that allows the compiler to select different opcodes to statically predict the branch outcome, then should the compiler select BGT_LIKELY or BGT_UNLIKELY for the conditional branch? (Justify your answer)

b) If the above "for-loop" is recompiled on a machine with a branch-history table to dynamically predict the branch outcome, then how does the branch-history table reduce the branch penalty?

c) Would a machine benefit from both (1) allowing the compiler to select different opcodes to statically predict the branch outcome, AND (2) having a branch-history table? (Justify your answer)

Question 4. (25 points)

Assume the above superscalar processor organization:

a) For the program below, indicate all pairs of instructions that have

i) "true" data dependencies (write-read, read-after-write, RAW):

ii) output dependencies (write-write, write-after-write, WAW):

iii) antidependencies (read-write, write-after-read, WAR):

b) For out-of-order issue with out-of-order execution, show the stage that each instruction is in during each cycle for the following program. (Note: "Add R1, R2" performs R1 := R1 + R2.) (Assume NO by-pass signal paths, i.e., the decode stage for a dependent instruction cannot occur until after the write-back stage!)

                   Cycle
#  Instruction     0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19
1  Add R1, R2      f1  d1  a1  a2  s1
2  Mul R2, R4
3  Or R3, R2
4  Add R4, R5
5  Or R7, R9
6  Or R3, R6
7  Add R7, R1
8  Load R3, (R1)
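
For reference, the three dependency categories named in part (a) can be checked mechanically. The Python sketch below is only an illustration under stated assumptions: it uses a hypothetical four-instruction sequence (not the program above), it models only the two-operand arithmetic/logic form in which "Op Rd, Rs" reads Rd and Rs and writes Rd (Load is not modelled), and it reports every register overlap between an earlier and a later instruction without checking whether an intervening instruction redefines the register. The names `parse`, `dependencies`, and `PROGRAM` are made up for this example.

```python
import re
from itertools import combinations

# Hypothetical sequence for illustration only; not the exam program.
PROGRAM = ["Add R1, R2", "Or R3, R1", "Mul R2, R4", "Add R3, R5"]

def parse(instr):
    """Return (reads, writes) for 'Op Rd, Rs', assumed to perform Rd := Rd op Rs."""
    dest, src = re.findall(r"R\d+", instr)[:2]
    return {dest, src}, {dest}

def dependencies(program):
    """List (kind, earlier, later) for every pair of 1-based instruction numbers."""
    found = []
    for (i, a), (j, b) in combinations(enumerate(program, start=1), 2):
        reads_a, writes_a = parse(a)
        reads_b, writes_b = parse(b)
        if writes_a & reads_b:      # later instruction reads what the earlier one wrote
            found.append(("RAW", i, j))
        if writes_a & writes_b:     # both instructions write the same register
            found.append(("WAW", i, j))
        if reads_a & writes_b:      # later instruction overwrites what the earlier one read
            found.append(("WAR", i, j))
    return found

for kind, i, j in dependencies(PROGRAM):
    print(f"{kind}: instruction {i} -> instruction {j}")
```

Because the destination register is also a source in this two-operand form, any pair that has an output (WAW) dependency under this model is also reported as RAW and WAR on the same register.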