
Instruction Pipelining - assembly-line idea used to speed instruction completion rate

Assume that an automobile assembly process takes 4 hours.

If you divide the process into four equal stages, then ideally

time between completions = 4 hours / 4 stages = 1 hour

Problems:

stages might not be balanced
overhead of moving cars between stages
two stages need same specialized tool (structural hazard)
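The assembly-line arithmetic above can be sketched in a few lines. This is only an illustration under simple assumptions: the function names, the `overhead` parameter, and the unbalanced stage times are made up for the example, and the model assumes every stage advances in lock step at the pace of the slowest stage.

```python
def serial_time(n_items, stage_times):
    """Total time when each item finishes every stage before the next one starts."""
    return n_items * sum(stage_times)


def pipelined_time(n_items, stage_times, overhead=0.0):
    """Total time when stages overlap; the slowest stage (plus any
    per-stage transfer overhead) sets the time between completions."""
    cycle = max(stage_times) + overhead
    # The first item needs len(stage_times) cycles; each later item adds one cycle.
    return (len(stage_times) + n_items - 1) * cycle


balanced = [1.0, 1.0, 1.0, 1.0]    # four equal 1-hour stages (4 hours total)
unbalanced = [0.5, 1.5, 1.0, 1.0]  # same 4 hours, uneven split (hypothetical numbers)

print(serial_time(100, balanced))       # 400.0
print(pipelined_time(100, balanced))    # 103.0
print(pipelined_time(100, unbalanced))  # 154.5
```

With balanced stages a car completes every hour once the line is full; with the unbalanced split the 1.5-hour stage limits the completion rate even though the total work per car is unchanged.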

Serial Execution

Pipelined Execution

Instruction Pipelining Example: One possible breakdown of instruction execution. (Different from the text's)

Stage	Abbreviation	Actions
Fetch Instruction	FI	Read next instruction into CPU
Decode Instruction	DI	Determine opcode and operand specifiers
Calculate Operands	CO	Calculate the effective addresses of all operands
Fetch Operands	FO	Fetch operands from memory or register file
Execute Instruction	EI	Perform the indicated operation
Write Operand	WO	Write operand to memory or register file

Pipeline latches/registers between each stage. Hold temporary results and act like an IR. Some of the hardware components used (e.g., Memory and Register File) are shown as if they are duplicated, but they are not.

Problems that delay/stall the pipeline:

structural hazard - a piece of hardware is needed by several stages at the same time, e.g., Memory in FI, FO, and WO. This might require stages to sequentially access the hardware.
data hazard - an instruction depends on the result of a previous instruction that has not been calculated yet. Read-after-write (RAW) example:

ADD R3, R2, R1 ; R3 ← R2 + R1
SUB R4, R3, R5 ; R4 ← R3 - R5

In what stage does the ADD instruction update R3?

In what stage does the SUB instruction read R3?

control hazard - branching makes it difficult to fetch the "correct" instructions to be executed
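The RAW dependency described above can be checked mechanically by comparing the destination register of the earlier instruction with the source registers of the later one. The sketch below is illustrative only: the `Instr` record and `raw_hazard_regs` helper are hypothetical names, and the check finds the register dependency, not whether a stall actually results (that depends on stage timing).

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def raw_hazard_regs(earlier, later):
    """Registers that `later` reads and `earlier` writes (read-after-write)."""
    return [r for r in later.srcs if r == earlier.dest]


add = Instr("ADD", "R3", ("R2", "R1"))
sub = Instr("SUB", "R4", ("R3", "R5"))

print(raw_hazard_regs(add, sub))  # ['R3']
print(raw_hazard_regs(sub, add))  # []
```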

Data Hazards

	Time
Instructions	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
ADD R3, R2, R1	FI	DI	CO	FO	EI	WO
SUB R4, R3, R5		FI	DI	CO	FO	EI	WO

Alternatives:

1) Introduce stalls

	Time
Instructions	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
ADD R3, R2, R1	FI	DI	CO	FO	EI	WO
SUB R4, R3, R5		FI	DI	CO	stall	stall	FO	EI	WO
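The stalled timing above can be reproduced with a small scheduling sketch. It assumes, as the timing does, that SUB reads R3 in FO and that a register cannot be read in the same cycle ADD writes it in WO. The `schedule` and `row` helpers are hypothetical names introduced for illustration.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def schedule(start_cycle, read_stage=None, ready_cycle=0):
    """Map cycle -> stage label for one instruction.

    If `read_stage` is given, that stage may not occur before
    `ready_cycle`; stall cycles are inserted in front of it.
    """
    timeline = {}
    cycle = start_cycle
    for stage in STAGES:
        if stage == read_stage:
            while cycle < ready_cycle:
                timeline[cycle] = "stall"
                cycle += 1
        timeline[cycle] = stage
        cycle += 1
    return timeline


def row(name, timeline, n_cycles=9):
    """Format one instruction's timeline as a fixed-width text row."""
    cells = [timeline.get(c, "") for c in range(1, n_cycles + 1)]
    return name.ljust(16) + " ".join(cell.ljust(5) for cell in cells)


add = schedule(1)
wo_cycle = max(add)  # an unstalled ADD occupies WO in its last cycle (cycle 6)
sub = schedule(2, read_stage="FO", ready_cycle=wo_cycle + 1)

print(row("ADD R3, R2, R1", add))
print(row("SUB R4, R3, R5", sub))
```

SUB reaches FO in cycle 7, the first cycle after ADD's WO, which accounts for the two stall cycles.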

2) Add additional hardware (bypass-signal paths) to "forward" R3's new value to the SUB instruction:

No stalls needed in this case.

	Time
Instructions	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
ADD R3, R2, R1	FI	DI	CO	FO	EI	WO
SUB R4, R3, R5		FI	DI	CO	FO	EI	WO

What would control the MUX?

MUX Operation:
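One common way to drive a forwarding MUX, offered here as a hedged sketch rather than the design in these notes, is to compare the destination register held in the latch after EI with the source register of the instruction about to execute. The names `forward_select` and `alu_input`, the "BYPASS"/"REGFILE" labels, and the register values are all assumptions made for illustration.

```python
def forward_select(src_reg, latch_dest, latch_valid):
    """MUX select: 'BYPASS' when the previous instruction's result
    (still in the EI output latch) targets the register being read."""
    if latch_valid and latch_dest == src_reg:
        return "BYPASS"
    return "REGFILE"


def alu_input(src_reg, regfile, latch_dest, latch_value, latch_valid=True):
    """Value the MUX passes to the ALU for one source operand."""
    if forward_select(src_reg, latch_dest, latch_valid) == "BYPASS":
        return latch_value
    return regfile[src_reg]


# Register file before ADD has written back: R3 still holds a stale value.
regfile = {"R1": 10, "R2": 5, "R3": 0, "R5": 4}
add_result = regfile["R2"] + regfile["R1"]  # ADD R3, R2, R1 -> 15, waiting in the latch

# SUB R4, R3, R5 executes in the next cycle.
a = alu_input("R3", regfile, latch_dest="R3", latch_value=add_result)
b = alu_input("R5", regfile, latch_dest="R3", latch_value=add_result)
print(a - b)  # 11 (using the stale R3 would have given -4)
```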

Consider the following code:

ADD R3, R2, R1
LOAD R4, 4(R3)

What would the timing be without bypass-signal paths/forwarding?

	Time
Instructions	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
ADD R3, R2, R1	FI	DI	CO	FO	EI	WO
LOAD R4, 4(R3)		FI	DI	DI	DI	DI	DI	CO	FO	EI	WO

This assumes that R3 cannot be written and the new value read in the same stage.

If we assume that R3 can be written in the first half of the WO stage and its new value read in the last half of the DI stage, then we get:

	Time
Instructions	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
ADD R3, R2, R1	FI	DI	CO	FO	EI	WO
LOAD R4, 4(R3)		FI	DI	DI	DI	DI	CO	FO	EI	WO

What would the timing be with bypass-signal paths?

	Time
Instructions	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
ADD R3, R2, R1	FI	DI	CO	FO	EI	WO
LOAD R4, 4(R3)		FI	DI	stall	stall	CO	FO	EI	WO

Draw the bypass-signal paths needed for the above example.
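The three LOAD timings above differ only in which ADD stage makes R3 available and which LOAD stage needs it. A minimal sketch of that arithmetic follows. It assumes the producer (ADD) is not itself stalled, and the function names and the `same_cycle_ok` flag are hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def stage_cycle(start_cycle, stage):
    """Cycle in which an unstalled instruction occupies `stage`."""
    return start_cycle + STAGES.index(stage)


def extra_cycles(producer_start, consumer_start, ready_stage, need_stage,
                 same_cycle_ok=False):
    """Delay cycles the consumer needs before `need_stage`.

    The value becomes available in the producer's `ready_stage`; the
    consumer may use it in that same cycle only if `same_cycle_ok`.
    """
    ready = stage_cycle(producer_start, ready_stage)
    earliest = ready if same_cycle_ok else ready + 1
    nominal = stage_cycle(consumer_start, need_stage)
    return max(0, earliest - nominal)


# ADD starts in cycle 1, LOAD in cycle 2.
no_forwarding = extra_cycles(1, 2, "WO", "DI")                   # R3 read at end of DI
split_cycle = extra_cycles(1, 2, "WO", "DI", same_cycle_ok=True)
bypass = extra_cycles(1, 2, "EI", "CO")                          # ALU result forwarded to CO

print(no_forwarding, split_cycle, bypass)  # 4 3 2
```

The counts match the repeated DI entries (four and three extra) and the two stalls in the timings above. The same function gives 0 for the earlier ADD/SUB pair with forwarding, `extra_cycles(1, 2, "EI", "EI")`, because SUB does not need R3 until its own EI stage.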
