## Computer Architecture Test 1

Question 1. (10 points) Consider the high-level assignment statement X = A - B / C + A\*B\*C.

a) As in homework #1, write the LOAD and STORE assembly language instructions for this statement.

Answer:

LOAD R1, A
LOAD R2, B
LOAD R3, C
DIV R4, R2, R3
MUL R5, R1, R2
MUL R5, R5, R3
SUB R6, R1, R4
ADD R6, R6, R5
STORE R6, X

b) As in homework #1, write the 0-address (stack machine) assembly language instructions for this statement.

Answer (postfix form: A B C / - A B \* C \* +):

PUSH A
PUSH B
PUSH C
DIV
SUB
PUSH A
PUSH B
MUL
PUSH C
MUL
ADD
POP X

Question 2. (13 points) Characteristics of CISC (complex instruction set computer) machines are:

- variable-length instruction format
- both simple and complex instructions that require a variable number of cycles to execute
- large number of addressing modes, with some complex addressing modes

Why do these characteristics make CISC computers hard to pipeline?

Answer: Time to fetch an instruction varies with variable-length instructions. Time to execute varies between simple and complex instructions. Complex addressing modes take longer to calculate operand addresses. Pipeline stages should be short and similar in length.

Question 3. (20 points)

Note that:

- The first register is the destination register, e.g., "ADD R2, R6, R7" performs R2 ← R6 + R7
- LOAD R1, 16(R2) loads register R1 with the memory value from the address 16 + (address in R2)
- STORE R2, 8(R6) stores register R2 to memory at the address 8 + (address in R6)

a) For the five-stage pipeline discussed in class (see above), complete the following timing diagram assuming NO by-pass signal paths.

Time →

| Instructions | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|------------------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|
| ADD R6, R8, R5 | F | D | E | M | W | | | | | | | | | | | | | |
| ADD R2, R3, R4 | | F | D | E | M | W | | | | | | | | | | | | |
| LOAD R5, 8(R3) | | | F | D | E | M | W | | | | | | | | | | | |
| MUL R6, R7, R2 | | | | F | - | - | D | E | M | W | | | | | | | | |
| LOAD R7, 4(R6) | | | | | | | F | - | - | - | D | E | M | W | | | | |
| STORE R7, 4(R6) | | | | | | | | | | | F | - | - | - | D | E | M | W |

(A dash marks a stall cycle.)

b) Complete the following timing diagram assuming by-pass signal paths.

Time →

| Instructions | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|------------------|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|
| ADD R6, R8, R5 | F | D | E | M | W | | | | | | | | | | | | | |
| ADD R2, R3, R4 | | F | D | E | M | W | | | | | | | | | | | | |
| LOAD R5, 8(R3) | | | F | D | E | M | W | | | | | | | | | | | |
| MUL R6, R7, R2 | | | | F | D | E | M | W | | | | | | | | | | |
| LOAD R7, 4(R6) | | | | | F | D | E | M | W | | | | | | | | | |
| STORE R7, 4(R6) | | | | | | F | D | E | M | W | | | | | | | | |

c) In the diagram at the top of the page, add all by-pass signal paths used in part (b).

Question 4. (12 points) Consider the conditional and unconditional branch instructions of a five-stage (F, D, E, M, W) pipelined RISC computer with 32-bit addresses.

| Assembly-language Example | Machine-language Format | Description of Example Semantics |
|---------------------------|-------------------------|----------------------------------|
| bgt R4, R5, ELSE | opcode, reg. #, reg. #, PC-relative displacement to label | If R4 > R5, then branch to ELSE label |
| jump END_IF | opcode, PC-relative displacement to label | Unconditionally branch to END_IF label |

a) What advantage(s) does a PC-relative displacement to a label have over an absolute (32-bit) address?

Answer:

- fewer # of bits
- relocatable: code can be moved in memory without being modified

b) Why is the branch penalty of a conditional branch instruction 2 cycles?

Answer: The outcome is known only after the E stage. By then the next two sequential instructions have already entered the pipeline, so if the branch is taken, both need to be discarded.

| Instruction | 1 | 2 | 3 |
|-------------|---|---|---|
| BGT | F | D | E |
| next instruction | | F | D |
| instruction after that | | | F |

c) Why is the branch penalty of an unconditional branch instruction 1 cycle?

Answer: JMP goes through F, D, E, but the target of the branch is known after D, so only one instruction (the one fetched behind it) needs to be thrown away.

Question 5. (10 points) Consider the memory hierarchy of a five-stage (F, D, E, M, W) pipelined RISC computer with 32-bit addresses below:

[Figure: memory hierarchy. CPU with registers R0 to R31; 32 KB L1 instruction cache and 32 KB L1 data cache; 1 MB L2 cache containing both instructions and data; 4 GB main memory.]

a) In a pipelined CPU, why is the L1 cache usually split into two independent caches: an instruction cache and a data cache?

Answer: The pipeline can fetch from the instruction cache at the same time as it accesses data in the data cache.

b) A hit ratio of 90% in the L1 caches is common. How is this possible even though the program is much bigger than the L1 caches?

Answer: Locality of reference.

Question 6. (15 points) Consider the following partial program that takes a two-dimensional array M that is 100 rows x 100 columns and forms two sums:

- positiveSum: sum of all the positive values, and
- negativeSum: sum of all the negative values.
```
positiveSum = 0
negativeSum = 0
for row = 0 to 99 do
    for col = 0 to 99 do
        if M[row][col] > 0 then
            positiveSum = positiveSum + M[row][col]
        else
            negativeSum = negativeSum + M[row][col]
        end if
    end for
end for
```

a) Where in the code would unconditional branches be used and where would conditional branches be used?

b) If the compiler could statically predict by opcode for the conditional branches (i.e., select whether to use machine language statements like "BRANCH LE PREDICT NOT TAKEN" or "BRANCH LE PREDICT TAKEN"), then which conditional branches would be "PREDICT NOT TAKEN" and which would be "PREDICT TAKEN"?

c) Under the assumptions below, answer the following questions.

- all the values in M are negative
- the five-stage pipeline from class (F, D, E, M, W)
- the target address (i.e., address of label) of all branches is known at the end of the D stage
- the outcome of conditional branches is known at the end of the E stage

i) If static predict-never-taken is used by the hardware, then what will be the total branch penalty (# cycles wasted) for the algorithm? (Here assume NO branch-prediction buffer.) For partial credit, explain your answer.

ii) If a branch-prediction buffer with one history bit per entry is used, then what will be the total branch penalty (# cycles wasted) for the algorithm? (Assume predict-not-taken is used if there is no match in the branch-prediction buffer.) For partial credit, explain your answer.

iii) If a branch-prediction buffer with two history bits per entry is used, then what will be the total branch penalty (# cycles wasted) for the algorithm? (Assume predict-not-taken is used if there is no match in the branch-prediction buffer.) For partial credit, explain your answer.

Question 7.
(10 points) On a 32-bit computer, suppose we have a 1 GB ($2^{30}$ bytes) memory that is byte addressable, and has a 4 MB ($2^{22}$ bytes) cache with 16 ($2^{4}$) bytes per block.

a) How many total lines are in the cache?

Answer: $$\frac{2^{22}}{2^{4}} = 2^{18} \text{ lines}$$

b) If the cache is direct-mapped, how many cache lines could a specific memory block be mapped to?

c) If the cache is direct-mapped, what would be the format (tag bits, cache line bits, block offset bits) of the address? (Clearly indicate the number of bits in each.)

Answer (handwritten bit counts only partly legible in the scan; the offset field is 4 bits):

| tag | line | offset |
|-----|------|--------|

d) If the cache is 4-way set associative, how many cache lines could a specific memory block be mapped to?

e) If the cache is 4-way set associative, how many sets would there be?

Question 8. (5 points) Why are fully associative caches limited to a small number (8 to 64) of cache lines?

Question 9. (5 points) The format of a memory address using a direct-mapped cache is:

| tag bits | line # | offset in block |
|----------|--------|-----------------|

Why would swapping the order of the tag bits and line # fields:

| line # | tag bits | offset in block |
|--------|----------|-----------------|

give really bad performance of the cache?

Answer: Since the line # would be the most significant bits, sequential blocks would map to the same cache line, causing constant misses with sequential access.
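
The address-format reasoning in Questions 7 and 9 can be checked with a short Python sketch. It is an illustration only, not part of the exam: the function names `direct_mapped_fields` and `line_index` are hypothetical, and it assumes the full 32-bit address is split into tag, line, and offset fields (if only the 30 bits needed for 1 GB were used, the tag would be 2 bits narrower).

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a positive power of two; raise otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def direct_mapped_fields(address_bits: int, cache_bytes: int, block_bytes: int):
    """Return (tag_bits, line_bits, offset_bits) for a direct-mapped cache.

    Assumption: every one of the address_bits is assigned to one field.
    """
    offset_bits = log2_exact(block_bytes)
    line_bits = log2_exact(cache_bytes // block_bytes)
    tag_bits = address_bits - line_bits - offset_bits
    return tag_bits, line_bits, offset_bits


def line_index(address: int, tag_bits: int, line_bits: int, offset_bits: int,
               swapped: bool = False) -> int:
    """Return the cache line an address maps to.

    Normal layout:  [ tag | line # | offset ]  (line # just above the offset)
    Swapped layout: [ line # | tag | offset ]  (line # in the top bits)
    """
    block = address >> offset_bits          # drop the offset-in-block bits
    if swapped:
        return block >> tag_bits            # line # sits above the tag
    return block & ((1 << line_bits) - 1)   # line # is the low bits of the block number


# Question 7 numbers: 4 MB cache, 16-byte blocks, 32-bit address (assumed).
tag_bits, line_bits, offset_bits = direct_mapped_fields(32, 2**22, 16)
print(tag_bits, line_bits, offset_bits)     # 10 18 4

# Question 9: walk 64 sequential blocks and count the distinct lines touched.
addresses = range(0, 64 * 16, 16)
normal = {line_index(a, tag_bits, line_bits, offset_bits) for a in addresses}
swapped = {line_index(a, tag_bits, line_bits, offset_bits, swapped=True)
           for a in addresses}
print(len(normal), len(swapped))            # 64 1
```

With the normal layout, the 64 sequential blocks land in 64 different lines. With the swapped layout they all compete for a single line, which is the constant-miss behavior described in the answer to Question 9.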