The problems start in cycle 3. We need to check whether register r1 is not equal to zero, but the add instruction is still in the EX phase: the value of r1 is still being calculated! As with the other hazards we have seen, we can ignore this problem and call it a feature.
We could also stall the pipeline, but we would have to stall it for two cycles (until the add reaches the WB phase). Even then, the result would only be available towards the end of the cycle (the register file is updated halfway through the cycle, and we still have to do the comparison against zero), but that is acceptable. That is quite a long stall, however, and this situation is very common (think of the add as a compare instruction and you'll understand why).
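The two-cycle figure can be checked with a small sketch in Python. It assumes a five-stage pipeline whose stages run in the order IF, ID, EX, MA, WB, that the branch tests its register in ID, and that a register written in WB can be read in the same cycle. The function name and the `distance` parameter are hypothetical, introduced only for this illustration.

```python
def interlock_stall_cycles(distance: int) -> int:
    """Stall cycles for a branch that tests a register written `distance`
    instructions earlier, when nothing is forwarded to the comparator.

    Assumptions (not taken from the simulator itself): the producer is
    fetched in cycle 1, the stages are IF, ID, EX, MA, WB, the branch
    compares in ID, and a value written in WB is readable in that cycle.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    producer_wb_cycle = 5               # IF=1, ID=2, EX=3, MA=4, WB=5
    branch_id_cycle = (1 + distance) + 1  # fetched in cycle 1+distance, ID one cycle later
    return max(0, producer_wb_cycle - branch_id_cycle)


# The branch directly after the add (distance 1) waits two cycles;
# each unrelated instruction placed in between removes one stall cycle.
for d in (1, 2, 3):
    print(d, interlock_stall_cycles(d))
```

Under these assumptions, two unrelated instructions between the add and the branch would be enough to hide the stall completely.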
Finally, we could try to forward the result, as we did for data hazards. That means we would have to forward the contents of OUT0 (connection c) and OUT1 (d) to the comparator. But that is not enough! We also need to forward the result of the ALU to the comparator as it is being calculated (b). Remember, this result will only be written to OUT0 on the next clock cycle, which is too late. This adds an asynchronous (read: dodgy) element to the design, but it is necessary to avoid a stall in this very common situation.
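The choice between the forwarding paths can be sketched as a priority selection in Python. This is only an illustration under stated assumptions: OUT0 is taken to hold the result of the instruction currently in MA and OUT1 that of the instruction in WB, only ALU instructions are considered (loads are ignored), and the names `InFlight` and `comparator_input` are hypothetical. The labels b, c and d follow the connections named in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InFlight:
    """An instruction ahead of the branch (hypothetical helper type)."""
    dest: Optional[int]  # register number it writes, or None
    value: int           # its result: ALU output, OUT0 or OUT1 contents


def comparator_input(reg: int,
                     regfile: List[int],
                     in_ex: Optional[InFlight],
                     in_ma: Optional[InFlight],
                     in_wb: Optional[InFlight]) -> Tuple[str, int]:
    """Return (source, value) fed to the zero comparator for register `reg`.

    The youngest matching producer wins, because it holds the most recent
    value of the register. Assumption: OUT0 belongs to the instruction in
    MA and OUT1 to the instruction in WB.
    """
    for source, instr in (("b: ALU result", in_ex),
                          ("c: OUT0", in_ma),
                          ("d: OUT1", in_wb)):
        if instr is not None and instr.dest == reg:
            return source, instr.value
    return "register file", regfile[reg]


# Example: the add writing r1 is still in EX, so the register file is stale.
regfile = [0] * 32
regfile[1] = 7
source, value = comparator_input(1, regfile, InFlight(dest=1, value=0), None, None)
print(source, value)
```

In the example the comparator sees the freshly computed 0 through path b rather than the stale 7 in the register file, which is exactly the case that would otherwise need a stall.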
Try it out. See all three possible solutions (no zero forwarding, zero interlock and zero forwarding) in the animation.
Summary of the Pipeline Stalls
The following table lists all possible stalls exhaustively, with the conditions for each stall. For each individual stall, you can run an animation that shows a program in which the stall occurs. Note that the PC register is updated during the stall only for control hazards. For all other stalls, the PC register is not updated (see the section on control hazards for more information).
The sample programs for each of these hazards allow you to change the circuit configuration and the contents of the instruction memory, if you wish. This lets you try to remove the hazard by changing the program or the circuit configuration.
| Hazard | Condition | Demo |
|---|---|---|
| RAW | No ALU forwarding | Run |
| | No ALU forwarding | Run |
| RAW (store) | No forwarding to SMDR | Run |
| | No forwarding to SMDR | Run |
| RAW (branch) | No zero-forwarding | Run |
| | No zero-forwarding | Run |
| | (Always) | Run |
| | (Always) | Run |
| Load | Load interlock | Run |
| Control | Calculated PC ≠ pc | Run |
| | Calculated PC ≠ pc | Run |
| | Calculated PC ≠ pc | Run |
Questions
Question 1. Why is the MA phase idle for all instructions except loads and stores?
Question 2. In the section introducing pipelining, the same program is executed on the non-pipelined and on the pipelined processor. Confirm that the speedup is 300%, and explain why this is less than the expected 500%.
Question 3. Draw a pipeline diagram to show the hazard mentioned in the section on forwarding to the SMDR. Then write a program to show this hazard, and observe the effects of the different circuit configurations (Store operand forwarding, Store interlock, and No store interlock).
Question 4. The table with all pipeline stalls lists two situations that always cause a stall. What extra hardware would you need to avoid these stalls?
Question 5. The diagram shows four control lines for multiplexer MUX3 (see the section on conditional branches). Are all of these four lines strictly necessary?
Question 6. There are quite a number of stalls during the execution of the shift-and-add multiplier. Rewrite the program to avoid nearly all of these stalls.
Bibliography
| [1] | Computer Architecture: A Quantitative Approach, 3rd edition, John L. Hennessy & David A. Patterson, Morgan Kaufmann Publishers [now Elsevier] |
| [2] | Logic and Computer Design Fundamentals, 2nd edition, M. Morris Mano and Charles R. Kime, Prentice Hall |