Computer Architecture Homework #3

Due: 9/24/04 (Friday)

You are to assume the same 5-stage pipeline as discussed in class (see http://www.cs.uni.edu/~fienup/cs142f04/lectures/lec6_9-9-04.htm) when answering these questions.

Assume that the first register in an arithmetic operation is the destination register, e.g., in "ADD R3, R2, R1" register R3 receives the result of adding registers R2 and R1.

1. What would the timing be without bypass-signal paths/forwarding (use "stalls" to solve the data hazard)?

(This code might require more than 22 cycles.)

                 Time
Instructions      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
ADD R3, R2, R1   IF ID EX ME WB
STORE R3, 4(R4)     IF  -  - ID EX ME WB
SUB R3, R4, R5               IF ID EX ME WB
LOAD R4, 16(R3)                 IF  -  - ID EX ME WB
SUB R6, R4, R8                           IF  -  - ID EX ME WB
ADD R5, R3, R6                                    IF  -  - ID EX ME WB

2. What would the timing be with bypass-signal paths? (This code might require more than 22 cycles.)

                 Time
Instructions      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
ADD R3, R2, R1   IF ID EX ME WB
STORE R3, 4(R4)     IF ID EX ME WB
SUB R3, R4, R5         IF ID EX ME WB
LOAD R4, 16(R3)           IF ID EX ME WB
SUB R6, R4, R8               IF ID  - EX ME WB
ADD R5, R3, R6                  IF  - ID EX ME WB
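
The two timing diagrams can be cross-checked with a small scheduling sketch. The Python below is not part of the assignment: the PROGRAM encoding and the schedule function are illustrative names, and the model only assumes what the diagrams themselves show. A register written in WB can be read by ID in the same cycle, an instruction is fetched in the cycle its predecessor enters ID, without forwarding a dependent instruction waits before ID, and with forwarding it waits before EX (ALU results usable one cycle after EX, loaded values one cycle after ME).

```python
# Hypothetical encoding of the six instructions: (text, destination, sources, is_load).
PROGRAM = [
    ("ADD R3, R2, R1",  "R3", ("R2", "R1"), False),
    ("STORE R3, 4(R4)", None, ("R3", "R4"), False),  # a store writes no register
    ("SUB R3, R4, R5",  "R3", ("R4", "R5"), False),
    ("LOAD R4, 16(R3)", "R4", ("R3",),      True),
    ("SUB R6, R4, R8",  "R6", ("R4", "R8"), False),
    ("ADD R5, R3, R6",  "R5", ("R3", "R6"), False),
]


def schedule(program, forwarding):
    """Return [(text, {stage: cycle})] for an in-order 5-stage pipeline."""
    ready = {}   # register -> earliest cycle a consumer may use its new value
    rows = []
    prev = None  # stage cycles of the previous instruction
    for text, dest, srcs, is_load in program:
        need = max((ready.get(r, 0) for r in srcs), default=0)
        # The IF stage frees up when the predecessor moves on to ID.
        fetch = 1 if prev is None else prev["ID"]
        if forwarding:
            # ID frees up when the predecessor enters EX; operands arrive via bypass.
            decode = max(fetch + 1, prev["EX"] if prev else 0)
            execute = max(decode + 1, need)
        else:
            # No bypass: ID cannot happen before the producer's WB cycle.
            decode = max(fetch + 1, need)
            execute = decode + 1
        stages = {"IF": fetch, "ID": decode, "EX": execute,
                  "ME": execute + 1, "WB": execute + 2}
        if dest is not None:
            if forwarding:
                producer = stages["ME"] if is_load else stages["EX"]
                ready[dest] = producer + 1
            else:
                ready[dest] = stages["WB"]
        rows.append((text, stages))
        prev = stages
    return rows


if __name__ == "__main__":
    for forwarding in (False, True):
        print("with forwarding" if forwarding else "without forwarding")
        for text, stages in schedule(PROGRAM, forwarding):
            print(f"  {text:<16}", " ".join(f"{s}={c}" for s, c in stages.items()))
```

Under these assumptions the last instruction reaches WB in cycle 18 without forwarding and in cycle 11 with forwarding, which agrees with the diagrams for questions 1 and 2.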

3. Draw ALL the bypass-signal paths needed for the above example.

4. Consider the following insertion sort algorithm, which sorts the array numbers:

InsertionSort(numbers - address to integer array, length - integer)
    integer firstUnsortedIndex, testIndex, elementToInsert;

    for firstUnsortedIndex = 1 to (length-1) do
        testIndex = firstUnsortedIndex - 1;
        elementToInsert = numbers[firstUnsortedIndex];
        while (testIndex >= 0) AND (numbers[testIndex] > elementToInsert) do
            numbers[testIndex + 1] = numbers[testIndex];
            testIndex = testIndex - 1;
        end while
        numbers[testIndex + 1] = elementToInsert;
    end for
end InsertionSort

a) Where in the code would unconditional branches be used and where would conditional branches be used?

b) If the compiler could predict by opcode for the conditional branches (i.e., select whether to use machine language statements like: "BRANCH_LE_PREDICT_NOT_TAKEN" or "BRANCH_LE_PREDICT_TAKEN"), then which conditional branches would be "PREDICT_NOT_TAKEN" and which would be "PREDICT_TAKEN"?

c) Assumptions:

Under the above assumptions, answer the following questions:

i) If fixed predict-never-taken is used by the hardware, then what will be the total branch penalty (# cycles wasted) for the algorithm? (Here assume NO branch-history table) For partial credit, explain your answer.

Branch      Penalty   Explanation
for         1         when you drop out of the loop
while       99        one for each time you drop out of the loop, but you execute the while 99 times
end while   4,950     1+2+3+...+99 = 99x100/2 = 4,950
end for     99        one for each branch to the beginning of the for
Total       5,149
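
The per-branch counts for part i) follow a closed form in the array length. The sketch below is an illustration, not part of the answer key: the function name never_taken_penalty is made up here, and it assumes a worst-case (descending) input so that every while loop runs until testIndex < 0, one wasted cycle per taken branch, and length = 100 as suggested by the sum 1+2+...+99.

```python
def never_taken_penalty(length, cycles_per_miss=1):
    """Taken-branch counts under fixed predict-never-taken (worst-case input assumed)."""
    outer = length - 1                 # the for body runs length-1 times
    inner = outer * (outer + 1) // 2   # 1 + 2 + ... + (length-1) while-body passes
    taken = {
        "for": 1,            # conditional, taken once when the loop exits
        "while": outer,      # conditional, taken once per outer pass, on exit
        "end while": inner,  # unconditional jump back, once per inner pass
        "end for": outer,    # unconditional jump back, once per outer pass
    }
    return taken, cycles_per_miss * sum(taken.values())


if __name__ == "__main__":
    counts, total = never_taken_penalty(100)
    print(counts, total)
```

With predict-never-taken every taken branch is a misprediction, so the total is simply the number of taken branches.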

ii) If a branch-history table with one history bit per entry is used, then what will be the total branch penalty (# cycles wasted) for the algorithm? (Assume predict-not-taken is used if there is no match in the branch-history table.) For partial credit, explain your answer.

Branch      Penalty   Explanation
for         1         when you drop out of the loop
while       197       1+98x2: one for the first while, and 2 for each of the remaining whiles
end while   1         the first time you execute it, when it's not in the BHT
end for     1         the first time you execute it, when it's not in the BHT
Total       200

iii) If a branch-history table with two history bits per entry is used as in Figure 8.12, then what will be the total branch penalty (# cycles wasted) for the algorithm? (Assume predict-not-taken is used if there is no match in the branch-history table.) For partial credit, explain your answer.

Branch      Penalty   Explanation
for         1         when you drop out of the loop
while       99        one for each time you drop out of the while loop
end while   1         the first time you execute it, when it's not in the BHT
end for     1         the first time you execute it, when it's not in the BHT
Total       102
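
The three totals can be checked by replaying the branches of the sort through a small predictor model. This is a sketch under stated assumptions, not the course's reference solution. The names branch_trace and wasted_cycles are invented for illustration. The input is assumed to be 100 integers in descending order, and each misprediction is assumed to cost one cycle. The compound while test is treated as a single conditional branch, as in the answer tables, and each branch is assumed to get its own BHT entry. For two history bits a plain saturating counter is assumed, initialized from the branch's first outcome, rather than the exact state diagram of Figure 8.12.

```python
def branch_trace(numbers):
    """Insertion-sort a copy of `numbers`; return (trace, sorted_copy).

    The trace lists (branch_name, taken) in execution order for the four
    branch points named in the answer tables.
    """
    a = list(numbers)
    trace = []
    first = 1
    while True:
        leave_for = first > len(a) - 1
        trace.append(("for", leave_for))           # conditional, taken on loop exit
        if leave_for:
            break
        test, element = first - 1, a[first]
        while True:
            leave_while = not (test >= 0 and a[test] > element)
            trace.append(("while", leave_while))   # conditional, taken on loop exit
            if leave_while:
                break
            a[test + 1] = a[test]
            test -= 1
            trace.append(("end while", True))      # unconditional jump back
        a[test + 1] = element
        first += 1
        trace.append(("end for", True))            # unconditional jump back
    return trace, a


def wasted_cycles(trace, history_bits):
    """Count mispredictions per branch; history_bits=0 means no BHT."""
    top = (1 << history_bits) - 1   # highest counter value (1 or 3)
    table = {}                      # branch name -> saturating counter
    wasted = {}
    for name, taken in trace:
        if history_bits == 0 or name not in table:
            predicted = False       # no BHT, or no matching entry: predict not taken
        else:
            predicted = table[name] > top // 2
        wasted[name] = wasted.get(name, 0) + (predicted != taken)
        if history_bits:
            if name not in table:
                table[name] = top if taken else 0   # assumed initial state
            elif taken:
                table[name] = min(top, table[name] + 1)
            else:
                table[name] = max(0, table[name] - 1)
    return wasted


if __name__ == "__main__":
    trace, result = branch_trace(range(100, 0, -1))
    for bits in (0, 1, 2):
        per_branch = wasted_cycles(trace, bits)
        print(bits, per_branch, sum(per_branch.values()))
```

With these assumptions the model gives 5,149, 200, and 102 wasted cycles for zero, one, and two history bits. Passing a different list to branch_trace shows how strongly the totals depend on the initial order of the array.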