Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

Example: Two possible "streams" of instructions

SUB R3, R2, R1 ; After what stage is the value of R3 known?

BEQZ R3, ELSE

ADD R4, R5, R6 ; Should not be executed if the branch is taken

.

.

.

ELSE: ADD R3, R3, R2

Assume the branch is taken:

                      Time
Instructions          1   2   3   4   5   6   7   8   9   10  11  12
SUB R3, R2, R1        FI  DI  CO  FO  EI  WO
BEQZ R3, ELSE             FI  DI  CO  FO  EI  WO
ADD R4, R5, R6                FI  DI  CO  FO
.                                 FI  DI  CO
.                                     FI  DI
.                                         FI
ELSE: ADD R3, R3, R2                          FI  DI  CO  FO  EI  WO

If the branch is taken, then there is a branch penalty of 4 cycles.

If the branch is not taken and we continue to fetch instructions sequentially, then there is no branch penalty.
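The 4-cycle figure can be read off the diagram: the branch outcome is known only after BEQZ completes EI, so every instruction fetched while the branch moves from DI through EI is wasted. A minimal Python sketch of that count follows. It assumes, as the diagram shows, that one instruction is fetched per cycle and that the target fetch starts in the cycle after the branch's EI; the function name and stage list are illustrative only.

```python
# Stages of the six-stage pipeline used in the timing diagram.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def taken_branch_penalty(stages, resolve_stage):
    """Cycles lost when a branch is taken, assuming sequential fetching
    continues until the branch has completed resolve_stage."""
    # One instruction is fetched per cycle, so the number of wrongly
    # fetched instructions equals the number of stages the branch passes
    # through after FI, up to and including resolve_stage.
    return stages.index(resolve_stage)

penalty = taken_branch_penalty(STAGES, "EI")
print(penalty)  # 4
```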

How could we reduce the branch penalty in the pipeline above?

Delayed Branching - define the branch such that one (or two) instruction(s) after the branch will always be executed.

The compiler automatically rearranges code to fill the delayed-branch slot(s) with instructions that can always be executed. Instructions in the delayed-branch slot(s) do not need to be flushed after branching. If no instruction can be found to fill the delayed-branch slot(s), then a NOOP instruction is inserted.

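Without Delayed Branching      With Delayed Branching
SUB R3, R2, R1                 SUB R3, R2, R1
BEQZ R3, ELSE                  BEQZ R3, ELSE
ADD R4, R5, R6                 NOOP ; delayed-branch slot, always executed
.                              ADD R4, R5, R6
.                              .
ELSE: ADD R3, R3, R2           .
                               ELSE: ADD R3, R3, R2

SUB cannot be moved into the delayed-branch slot here, because BEQZ tests the value of R3 that SUB computes. No other independent instruction precedes the branch, so the slot is filled with a NOOP.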
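The compiler's search for an instruction to place in the delayed-branch slot can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a real compiler pass: the Instr record, the independent and fill_delay_slot helpers, and the extra ADD R7, R8, R9 instruction are all hypothetical, and only one delay slot and register dependences are modeled.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str    # assembly text, for display only
    dest: str    # register written ("" if none)
    srcs: tuple  # registers read

NOOP = Instr("NOOP", "", ())

def independent(cand, later):
    """True if cand can be moved after every instruction in later."""
    for ins in later:
        if cand.dest and cand.dest in ins.srcs:  # later reads cand's result
            return False
        if ins.dest and ins.dest in cand.srcs:   # later overwrites cand's input
            return False
        if cand.dest and cand.dest == ins.dest:  # both write the same register
            return False
    return True

def fill_delay_slot(block):
    """block is a list of Instr ending with the branch.
    Returns a new list with one delay slot filled after the branch."""
    *body, branch = block
    # Search backward for an instruction that can safely move past the branch.
    for i in range(len(body) - 1, -1, -1):
        if independent(body[i], body[i + 1:] + [branch]):
            return body[:i] + body[i + 1:] + [branch, body[i]]
    # Nothing can be moved, so the slot gets a NOOP.
    return body + [branch, NOOP]

sub = Instr("SUB R3, R2, R1", "R3", ("R2", "R1"))
add = Instr("ADD R7, R8, R9", "R7", ("R8", "R9"))  # hypothetical extra instruction
beqz = Instr("BEQZ R3, ELSE", "", ("R3",))

with_filler = [i.text for i in fill_delay_slot([add, sub, beqz])]
with_noop = [i.text for i in fill_delay_slot([sub, beqz])]
print(with_filler)  # ['SUB R3, R2, R1', 'BEQZ R3, ELSE', 'ADD R7, R8, R9']
print(with_noop)    # ['SUB R3, R2, R1', 'BEQZ R3, ELSE', 'NOOP']
```

In the first call the hypothetical ADD R7 touches no register used by SUB or BEQZ, so it moves into the slot. In the second call the only candidate, SUB, produces the R3 value that BEQZ tests, so the sketch falls back to a NOOP.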

Branch Prediction to reduce the branch penalty

Main idea: predict whether the branch will be taken and fetch accordingly

Fixed Techniques:

a) Predict never taken - continue to fetch sequentially. If the branch is not taken, then there are no wasted fetches.

b) Predict always taken - fetch from branch target as soon as possible

(From analyzing program behavior, > 50% of branches are taken.)
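To see how the two fixed techniques compare on a given branch, the following Python sketch scores each against a recorded sequence of outcomes. The trace is hypothetical (a loop-closing branch taken nine times and then falling through once), and the accuracy helper is illustrative only.

```python
def accuracy(outcomes, predict_taken):
    """Fraction of branch executions whose outcome matches a fixed prediction."""
    hits = sum(1 for taken in outcomes if taken == predict_taken)
    return hits / len(outcomes)

# Hypothetical trace: True means the branch was taken.
trace = [True] * 9 + [False]

never_taken = accuracy(trace, predict_taken=False)
always_taken = accuracy(trace, predict_taken=True)
print(never_taken)   # 0.1
print(always_taken)  # 0.9
```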

Static Techniques: Predict by opcode - the compiler helps by choosing among different branch opcodes based on the likely outcome of the branch

Consider the HLL constructs:

HLL                    AL
                       CMP x, #0
While (x > 0) do       BR_LE_PREDICT_NOT_TAKEN END_WHILE
  {loop body}
end while              END_WHILE:

Studies have found about a 75-82% successful prediction rate using this technique.

Dynamic Techniques: try to improve prediction by recording the history of each conditional branch

We need to store one or more history bits to reflect whether the most recent executions of the branch were taken or not.
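One common way to use two history bits is a saturating counter, sketched below in Python. This particular scheme is an assumption chosen for illustration, since the notes do not fix how the history bits are interpreted; the class name, the trace, and the initial state are hypothetical.

```python
class TwoBitPredictor:
    """Two history bits for one branch, kept as a saturating counter 0..3.
    States 0-1 predict not taken; states 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move toward 3 on a taken branch and toward 0 otherwise,
        # without leaving the range 0..3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def count_mispredictions(outcomes, predictor):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses

# Hypothetical trace: a loop branch taken 4 times, then not taken, run twice.
trace = ([True] * 4 + [False]) * 2
misses = count_mispredictions(trace, TwoBitPredictor())
print(misses)  # 4
```

Two of the four misses occur while the counter warms up from state 0; the other two are the loop exits. With two bits, a single not-taken outcome does not flip the prediction, so the first iteration of the second pass is still predicted correctly.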

Problem: How do we avoid always fetching the instruction after the branch?

Solution:

Branch-History Table (BHT) - a small, fully-associative cache that stores information about the most recently executed branch instructions. (Figure 8.13b)

<Valid bit, Branch address (tag), Target address, Taken / Not Taken prediction bit(s)>

During the FI stage, the Branch-History Table is checked to see whether the instruction being fetched is a branch instruction (that is, whether its address matches a branch address stored in the table).

If the instruction is a branch instruction and it is in the Branch-History Table, then the target address can be supplied by the BHT.
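The lookup described above can be sketched in Python as follows. This is a simplified model under several assumptions that are not in the notes: a single prediction bit per entry, least-recently-used replacement, a 4-byte instruction size, a capacity of four entries, and sequential fetching when the lookup misses. Presence of a key in the table stands in for the valid bit.

```python
from collections import OrderedDict

class BranchHistoryTable:
    """Small fully associative table: branch address -> [target, taken bit]."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # insertion order tracks recency of use

    def next_fetch_address(self, pc, instr_size=4):
        """Called during FI: choose the address to fetch after pc."""
        entry = self.entries.get(pc)
        if entry is not None:
            self.entries.move_to_end(pc)  # mark as most recently used
            target, predict_taken = entry
            if predict_taken:
                return target
        # Miss, or predicted not taken: keep fetching sequentially.
        return pc + instr_size

    def update(self, pc, target, taken):
        """Called once the branch at pc has actually been resolved."""
        self.entries[pc] = [target, taken]
        self.entries.move_to_end(pc)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

bht = BranchHistoryTable()
first = bht.next_fetch_address(0x100)   # miss: falls through to 0x104
bht.update(0x100, target=0x200, taken=True)
second = bht.next_fetch_address(0x100)  # hit: predicted target 0x200
print(first, second)  # 260 512
```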

If the branch instruction is in the Branch-History Table, will the target address supplied correspond to the correct instruction to be executed next?

What if the instruction is a branch instruction and it is not in the Branch-History Table?

Should the Branch-History Table contain entries for unconditional as well as conditional branch instructions?