Hazards in Pipelining CPU
Pipelining lets a processor overlap the execution of several instructions, which saves a lot of time, but it also introduces hazards.
Hazards are situations that prevent the next instruction from executing in its intended clock cycle, usually forcing the pipeline to stall. They're broadly of three kinds:
Structural Hazards
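A structural hazard occurs when two instructions need the same hardware resource in the same cycle. Take a unified L1 cache, where instruction fetch (IF) and data access (DA) share one path to memory: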
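| clock cycle 1 | clock cycle 2 | clock cycle 3 | clock cycle 4 | clock cycle 5 |
|---|---|---|---|---|
| IF1 | ID1 | EX1 | DA1 | WB1 |
| | IF2 | ID2 | EX2 | DA2 |
| | | IF3 | ID3 | EX3 |
| | | | IF4 | ID4 |
| | | | | X |
The X denotes that IF5 cannot happen in cycle 5 because DA2 is already occupying the bus to the cache in that cycle. (IF4 collides with DA1 in cycle 4 in the same way if instruction 1 accesses memory.) A split L1 cache, with separate instruction and data caches, removes this conflict, so split L1 cache == better.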
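The collision can be checked mechanically. The following is a minimal sketch, assuming an ideal five-stage pipeline in which instruction i enters IF in cycle i and every instruction uses the cache during DA. The names `stage_in_cycle` and `port_conflicts` are hypothetical helpers made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "DA", "WB"]

def stage_in_cycle(instr, cycle):
    """Stage occupied by instruction `instr` in `cycle` (both 1-based), or None."""
    idx = cycle - instr  # assumption: instruction i enters IF in cycle i, no stalls
    return STAGES[idx] if 0 <= idx < len(STAGES) else None

def port_conflicts(n_instrs, unified=True):
    """List (cycle, fetching_instr, data_instr) for each IF/DA clash on the cache."""
    if not unified:
        return []  # split caches: IF and DA never compete for the same port
    conflicts = []
    instrs = range(1, n_instrs + 1)
    for cycle in range(1, n_instrs + len(STAGES)):
        fetching = [i for i in instrs if stage_in_cycle(i, cycle) == "IF"]
        accessing = [i for i in instrs if stage_in_cycle(i, cycle) == "DA"]
        for f in fetching:
            for d in accessing:
                conflicts.append((cycle, f, d))
    return conflicts

print(port_conflicts(5))                 # [(4, 4, 1), (5, 5, 2)]
print(port_conflicts(5, unified=False))  # []
```

The tuple `(5, 5, 2)` is the X in the table: in cycle 5, the fetch of instruction 5 clashes with the data access of instruction 2.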
Data Hazards
| instruction | cycle 1 | cycle 2 | cycle 3 | cycle 4 | cycle 5 |
|---|---|---|---|---|---|
| add t2, t0, t1 | IF1 | ID1 | EX1 | DA1 | WB1 |
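| add t4, t2, t1 | | IF2 | X | X | ID2 |
The second add reads t2, which the first add only writes in WB1. Without forwarding, its decode stalls (X) until cycle 5, assuming the register file is written in the first half of a cycle and read in the second half. Stalls like this occur because of dependencies in data.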
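Data hazards are classified by the order of the conflicting accesses:
- Write after read (WAR): a later instruction writes a register before an earlier instruction has read it.
- Write after write (WAW): a later instruction writes a register before an earlier instruction's write, so the older value ends up in the register.
- Read after write (RAW): a later instruction reads a register before an earlier instruction has written it, so it sees the stale value. The example above is a RAW hazard.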
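The three cases can be sketched as a small check on the registers of two instructions. This is an illustrative sketch only: `Instr` and `classify` are hypothetical names, and the function reports the dependencies that could become hazards, not whether a particular pipeline actually stalls on them.

```python
from typing import NamedTuple, Set, Tuple

class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read

def classify(earlier: Instr, later: Instr) -> Set[str]:
    """Return the kinds of data dependency between two instructions in program order."""
    kinds = set()
    if earlier.dest in later.srcs:
        kinds.add("RAW")  # later reads what earlier writes
    if later.dest in earlier.srcs:
        kinds.add("WAR")  # later writes what earlier reads
    if later.dest == earlier.dest:
        kinds.add("WAW")  # both write the same register
    return kinds

first = Instr("t2", ("t0", "t1"))   # add t2, t0, t1
second = Instr("t4", ("t2", "t1"))  # add t4, t2, t1
print(classify(first, second))      # {'RAW'}
```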
Mitigations include:
- Instruction reordering: usually done by the compiler, which schedules independent instructions between a producer and its consumer. The processor can also do it at run time, although that complicates the hardware.
- Operand forwarding: once a result is in the pipeline buffers, forward it from there directly to the instructions that need it, before it is written to the registers.
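To see what forwarding buys, here is a rough sketch that counts stall cycles for a short in-order program. It assumes ALU-only instructions, a five-stage pipeline, and a register file that can be written and read in the same cycle; loads would need extra handling. `total_stalls` is a hypothetical helper, not part of any real tool.

```python
def total_stalls(program, forwarding):
    """Count stall cycles for a list of (dest, srcs) ALU instructions.

    Assumed gap between a producer's ID cycle and its consumer's ID cycle:
      without forwarding: 3 (consumer decodes in the producer's WB cycle)
      with forwarding:    1 (result is forwarded to the consumer's EX stage)
    """
    gap = 1 if forwarding else 3
    producer_id = {}  # register -> ID cycle of the latest instruction writing it
    prev_id = 1       # so that the first instruction decodes in cycle 2
    stalls = 0
    for dest, srcs in program:
        earliest = prev_id + 1  # in-order: one cycle after the previous decode
        needed = max(
            [producer_id[r] + gap for r in srcs if r in producer_id],
            default=earliest,
        )
        this_id = max(earliest, needed)
        stalls += this_id - earliest
        producer_id[dest] = this_id
        prev_id = this_id
    return stalls

program = [
    ("t2", ("t0", "t1")),  # add t2, t0, t1
    ("t4", ("t2", "t1")),  # add t4, t2, t1
]
print(total_stalls(program, forwarding=False))  # 2
print(total_stalls(program, forwarding=True))   # 0
```

The two stall cycles without forwarding correspond to the two X cells in the data hazard table. Reordering helps in the same model: placing independent instructions between the two adds shrinks the count even without forwarding.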
Control Hazards
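Branch instructions: the outcome of a branch condition is only known at the execute stage, so the processor does not yet know which instruction to fetch next.
Simple solution: stall until the branch has executed.
Slightly better but more complicated solution: move the check to the decode stage, e.g. XOR the two operands and test for zero to check equality.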
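If the processor keeps fetching as though the branch were not taken, this removes the stall when the branch is indeed not taken. If it was taken, the buffer has to be flushed and the instruction at the branch target fetched instead.
The guess itself can be improved using branch predictors, which in their simplest form are small counter structures that track the recent outcomes of a branch.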
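One common counter-based scheme is the 2-bit saturating counter. The notes above do not say which scheme is meant, so treat the following as an assumed example: `TwoBitPredictor` and `count_mispredictions` are hypothetical names, and the initial state is an arbitrary choice.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=1):  # assumption: start at "weakly not taken"
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def count_mispredictions(outcomes, predictor):
    """Run the predictor over a sequence of actual branch outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # in hardware this is where the pipeline would be flushed
        predictor.update(taken)
    return misses

# A loop branch that is taken 9 times and then falls through once.
loop_branch = [True] * 9 + [False]
print(count_mispredictions(loop_branch, TwoBitPredictor()))  # 2
```

The two misses are the first iteration, before the counter has warmed up, and the final loop exit. Because the counter only moves one step per outcome, that single exit does not flip the prediction for the next time the loop runs.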
