ALU Forwarding
We want to be able to use the result of the ALU directly, without having to wait for the result to be written back to the register file. For this, we need to forward the result of the ALU directly back into the ALU.

Before the result gets written to the destination register, it is written into the intermediate registers OUT0 and OUT1 first (see diagram). We need to forward the contents of OUT0 and OUT1 back into the input of the ALU (connections marked 1a and 1b connect OUT0 to the ALU, and connections marked 2a and 2b connect OUT1 to the ALU). Of course, we then also need a multiplexer to select the right operand (MUX6 and MUX7).
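
The selection performed by MUX6 and MUX7 can be sketched in a few lines of Python. This is only a model, not the actual control logic of the diagram: the names Latch and select_operand are made up for illustration, and we assume that OUT0 and OUT1 each carry the destination register number alongside the value, and that OUT0 (the more recent result) takes priority over OUT1.

```python
from typing import NamedTuple, Optional


class Latch(NamedTuple):
    """Assumed contents of an intermediate result register (OUT0 or OUT1)."""
    dest: Optional[int]  # destination register number, None if the slot holds a nop
    value: int


def select_operand(src: int, regfile: dict, out0: Latch, out1: Latch) -> int:
    """Model of one operand multiplexer (MUX6 or MUX7).

    Picks the newest available value for source register `src`.
    The special case of a hard-wired zero register is left out of this sketch.
    """
    if out0.dest == src:   # result of the instruction directly ahead of us
        return out0.value
    if out1.dest == src:   # result of the instruction two ahead of us
        return out1.value
    return regfile[src]    # no pending result: use the register file value


# r1 is stale in the register file, but a newer value sits in OUT0.
regs = {1: 10, 2: 20}
operand_a = select_operand(1, regs, Latch(dest=1, value=99), Latch(dest=None, value=0))
operand_b = select_operand(2, regs, Latch(dest=1, value=99), Latch(dest=None, value=0))
```

Here operand_a comes from OUT0 and operand_b from the register file, which is exactly the choice each multiplexer has to make for its own operand.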

With this solution, dependent ALU instructions no longer cause stalls, and the code is executed the way you would expect from reading the source code. If this is not clear yet, try the animation; it should make the mechanism clear.

Load Hazards
Consider the following code fragment.

lb r1, 1(r2) // Load the byte at address 1+r2 into r1
add r3, r1, r2

How is this executed in the pipeline?

Cycle  IF   ID   EX   MA   WB
1      lb
2      add  lb
3           add  lb
4                add  lb
5                     add  lb

Again, the lb will only write the value from memory into the register file in the WB phase, whereas the add retrieves its operands in the ID phase. Thus, the add will actually use an old value of r1. As before, we could ignore this problem and call it a feature, and that would be fine. We could also stall the pipeline for two cycles:

Cycle  IF   ID   EX   MA   WB   Note
1      lb
2      add  lb
3           add  lb             We detect a stall, issue a nop instead of the add
4           add  nop  lb        And another one
5           add  nop  nop  lb
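
To see where each instruction sits when the add is held back, we can generate such a table with a small Python sketch. The function name load_use_schedule and its stalls parameter are hypothetical; the timing (lb fetched in cycle 1, add fetched in cycle 2 and held in ID, one nop sent into EX per stalled cycle) is taken from the table above.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def load_use_schedule(stalls, cycles=5):
    """Stage occupancy per cycle for `lb` followed by a dependent `add`.

    The add is held in ID for `stalls` extra cycles; every held cycle
    sends a nop into EX in the following cycle.
    """
    rows = []
    for cycle in range(1, cycles + 1):
        row = {stage: "" for stage in STAGES}

        def place(name, index):
            # Skip instructions that are not fetched yet or have left the pipeline.
            if 0 <= index < len(STAGES):
                row[STAGES[index]] = name

        # lb is fetched in cycle 1 and advances one stage per cycle.
        place("lb", cycle - 1)

        # The k-th nop enters EX (stage index 2) in cycle 3 + k.
        for k in range(1, stalls + 1):
            if cycle >= 3 + k:
                place("nop", cycle - 1 - k)

        # add is fetched in cycle 2 and stays in ID from cycle 3 to cycle 3 + stalls.
        if cycle <= 3:
            place("add", cycle - 2)
        else:
            place("add", 1 + max(0, cycle - 3 - stalls))

        rows.append(row)
    return rows


# Print the schedule for a two-cycle stall.
two_cycle = load_use_schedule(stalls=2)
for cycle, row in enumerate(two_cycle, start=1):
    print(cycle, " ".join(f"{row[stage]:<4}" for stage in STAGES))
```

Calling load_use_schedule with stalls=0 gives the schedule without any interlock, so the two cases can be compared side by side.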

However, the lb writes its result into OUT1 before it is written to the register file. If we use pipeline forwarding and forward the contents of OUT1 directly back to the ALU, we can reduce this to a one-cycle stall.

Cycle  IF   ID   EX   MA   WB   Note
1      lb
2      add  lb
3           add  lb             We detect a stall, issue a nop instead of the add
4           add  nop  lb
5                add  nop  lb

We cannot do better than this, simply because the value is not loaded from memory until the MA phase. Note that the load calculates the effective address of the memory operand in the EX phase; in the above example, it would calculate 1+r2.
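
The stall decision for both variants can be written down as a small Python sketch. The names must_stall and count_load_use_stalls are hypothetical, and this is a model of the idea rather than the actual interlock logic of the pipeline. We assume that a value written in WB can be read by the instruction in ID during the same cycle, and that with forwarding only a load still in EX is too late to be forwarded.

```python
def must_stall(id_sources, ex_dest, ex_is_load, ma_dest, forwarding):
    """Decide whether the instruction in ID has to be held back this cycle.

    id_sources: set of register numbers read by the instruction in ID
    ex_dest, ma_dest: destination register of the instruction in EX / MA,
                      or None if that stage holds a nop
    ex_is_load: True if the instruction in EX is a load
    """
    if forwarding:
        # Only a load in EX is a problem: its value exists after MA at the earliest.
        return ex_is_load and ex_dest in id_sources
    # Without forwarding we wait until the producer has reached WB.
    return ex_dest in id_sources or ma_dest in id_sources


def count_load_use_stalls(forwarding):
    """Count stall cycles for `lb r1, 1(r2)` followed by `add r3, r1, r2`."""
    sources = {1, 2}                              # add reads r1 and r2
    ex_dest, ex_is_load, ma_dest = 1, True, None  # cycle 3: lb is in EX, add in ID
    stalls = 0
    while must_stall(sources, ex_dest, ex_is_load, ma_dest, forwarding):
        stalls += 1
        # The instruction in EX moves on to MA, and a nop takes its place in EX.
        ma_dest, ex_dest, ex_is_load = ex_dest, None, False
    return stalls


print(count_load_use_stalls(forwarding=False))  # prints 2
print(count_load_use_stalls(forwarding=True))   # prints 1
```

The two results correspond to the two-cycle stall of the plain interlock and the one-cycle stall of the interlock with forwarding.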

Try it out. Consider each solution (ignoring the problem, interlock and interlock with forwarding) in the MIPS pipeline.
