There are two observations to make here. First, instruction i takes 1 (long) clock cycle in the non-pipelined processor and 5 (short) clock cycles in the pipelined processor. Because a "short" cycle is 1/5 of a "long" cycle, a single instruction actually takes exactly as long to execute in the pipelined version as in the non-pipelined version.
The point, however, is that even though a single instruction takes just as long, both processors complete one instruction per clock cycle once the pipeline is full: instruction i+1 finishes one clock cycle after instruction i in either processor. BUT the clock cycle of the pipelined processor is one fifth of the clock cycle of the non-pipelined processor. Hence the throughput, the number of instructions executed per unit time, is 5 times higher for the pipelined processor than for the non-pipelined processor. In other words, for a long stream of instructions the pipelined processor is (ideally) 5 times faster than the non-pipelined processor.
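The throughput argument can be checked with a little arithmetic. The sketch below assumes an idealized 5-stage pipeline whose short cycle is exactly one fifth of the long cycle, with no pipeline-register overhead and no stalls; the function names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

STAGES = 5                         # IF, ID, EX, MA, WB
LONG_CYCLE = Fraction(1)           # one long cycle, in arbitrary time units
SHORT_CYCLE = LONG_CYCLE / STAGES  # idealized: exactly 1/5 of the long cycle


def non_pipelined_time(n: int) -> Fraction:
    """Each instruction occupies the whole processor for one long cycle."""
    return n * LONG_CYCLE


def pipelined_time(n: int) -> Fraction:
    """The first instruction needs STAGES short cycles to pass through the
    pipeline; after that, one instruction completes every short cycle."""
    if n == 0:
        return Fraction(0)
    return (STAGES + n - 1) * SHORT_CYCLE


def speedup(n: int) -> Fraction:
    """Ratio of non-pipelined to pipelined execution time for n instructions."""
    return non_pipelined_time(n) / pipelined_time(n)


for n in (1, 6, 100, 10_000):
    print(n, float(speedup(n)))
```

Under these assumptions a single instruction gains nothing (speedup 1.0), six instructions run 3.0 times faster, and the speedup only approaches the factor of 5 as the instruction stream grows, because the cost of filling the pipeline is paid once.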
Try it out. To illustrate this point, watch the execution of a single instruction in the non-pipelined and the pipelined processor. Notice that the pipelined processor has a much shorter clock period and that every stage now has its own set of pipeline registers. Note also that, despite this, the instruction takes the same total time to execute in either processor.
Try it out. Then consider the following simple program, watch its execution in both the non-pipelined and the pipelined processor, and compare the execution speed. You should notice that the program takes 6 long cycles in the non-pipelined processor and 10 short cycles in the pipelined processor (5 short cycles for the first instruction to pass through all stages, plus one more for each of the remaining 5 instructions).
```
addi r1, r0, 4 // r1 := 4
addi r2, r0, 5 // r2 := 5
addi r3, r0, 6 // r3 := 6
slli r1, r1, 2 // r1 := r1 << 2 (r1 := r1 * 4)
subi r2, r2, 3 // r2 := r2 - 3
add r3, r3, r3 // r3 := r3 + r3
```
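The cycle counts quoted above can be reproduced with a small schedule generator. This is a sketch under stated assumptions: the program runs without any stalls (as the text says it does), instruction k enters IF in cycle k + 1 and advances one stage per short cycle, and the names `pipeline_schedule` and `PROGRAM` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]

# The six-instruction program from the text (comments omitted).
PROGRAM = [
    "addi r1, r0, 4",
    "addi r2, r0, 5",
    "addi r3, r0, 6",
    "slli r1, r1, 2",
    "subi r2, r2, 3",
    "add r3, r3, r3",
]


def pipeline_schedule(program):
    """Return one dict per short cycle mapping stage name -> mnemonic.

    Assumes no stalls: instruction k (0-based) is in stage (cycle - 1 - k)
    during a given 1-based cycle, if that index is a valid stage."""
    n_cycles = len(STAGES) + len(program) - 1
    rows = []
    for cycle in range(1, n_cycles + 1):
        row = {}
        for k, instr in enumerate(program):
            stage_index = cycle - 1 - k
            if 0 <= stage_index < len(STAGES):
                row[STAGES[stage_index]] = instr.split()[0]  # mnemonic only
        rows.append(row)
    return rows


schedule = pipeline_schedule(PROGRAM)
print(f"pipelined: {len(schedule)} short cycles")
print(f"non-pipelined: {len(PROGRAM)} long cycles = "
      f"{len(PROGRAM) * len(STAGES)} short cycles")
for cycle, row in enumerate(schedule, start=1):
    print(cycle, "  ".join(f"{s}:{row.get(s, '-'):<4}" for s in STAGES))
```

The generator reports 10 short cycles for the pipelined processor against 30 short cycles (6 long ones) for the non-pipelined processor, matching the counts observed in the simulator.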
Data Hazards
Of course, it is not as simple as it seems (is anything ever?). The program above was carefully chosen so that pipelining works: no instruction reads a register written by either of the two instructions immediately before it. Consider the following code fragment:
```
add r1, r2, r3 // r1 := r2 + r3
sub r3, r1, r2 // r3 := r1 - r2
xor r3, r1, r2 // r3 := r1 xor r2
srl r3, r1, r2 // r3 := r1 >> r2 (logical shift)
ori r3, r1, 10 // r3 := r1 or 10
```
If we look at the pipeline during the execution of these instructions, we see:
| Cycle | IF | ID | EX | MA | WB |
|---|---|---|---|---|---|
| 1 | add | | | | |
| 2 | sub | add | | | |
| 3 | xor | sub | add | | |
| 4 | srl | xor | sub | add | |
| 5 | ori | srl | xor | sub | add |
| 6 | | ori | srl | xor | sub |
In cycle 3, the sub instruction reads the value of register r1 in its ID stage. The add instruction, however, whose destination register is r1, is still in the EX stage and will not write its result back until cycle 5. Hence, the sub will read the wrong (old) value of r1!
This situation is called a data hazard. There is another data hazard in cycle 4 (which instructions?). The srl in cycle 5 is OK, because the add writes its result back to the register file in the first half of the clock cycle, and the srl reads its operands in the second half (two-phase register file access). It should be clear that the ori in cycle 6 is OK too: by this cycle the add has completely finished execution.
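If you want to check your answer to the question above, the timing rule can be turned into a small read-after-write hazard detector. This is a sketch under assumptions of our own: no forwarding and no stalls, operands read in ID and results written in WB with the split-cycle register file described above, every instruction of the form `op rd, rs1, rs2_or_imm` writing its first operand (true for all instructions in this section), and r0 hardwired to zero as in DLX. The helper names `parse` and `find_raw_hazards` are hypothetical.

```python
import re


def parse(line):
    """Split 'op rd, rs1, rs2_or_imm' into (mnemonic, dest, source registers).

    Text after '//' is ignored; immediates such as '10' are not sources."""
    code = line.split("//")[0].strip()
    mnemonic, _, rest = code.partition(" ")
    operands = [op.strip() for op in rest.split(",")]
    dest = operands[0]
    sources = [op for op in operands[1:] if re.fullmatch(r"r\d+", op)]
    return mnemonic, dest, sources


def find_raw_hazards(program):
    """Return (cycle, writer, reader, register) for each read-after-write hazard.

    Instruction k (0-based) reads its operands in ID in cycle k + 2 and writes
    its result in WB in cycle k + 5. A read in the writer's WB cycle is safe
    (write in the first half, read in the second), so reader j conflicts with
    writer i only if j + 2 < i + 5, i.e. only the two preceding instructions."""
    decoded = [parse(line) for line in program]
    hazards = []
    for j, (reader, _, sources) in enumerate(decoded):
        for i in range(max(0, j - 2), j):
            writer, dest, _ = decoded[i]
            if dest != "r0" and dest in sources:  # writes to r0 are discarded
                hazards.append((j + 2, writer, reader, dest))
    return hazards


FRAGMENT = [
    "add r1, r2, r3",
    "sub r3, r1, r2",
    "xor r3, r1, r2",
    "srl r3, r1, r2",
    "ori r3, r1, 10",
]

FIRST_PROGRAM = [
    "addi r1, r0, 4",
    "addi r2, r0, 5",
    "addi r3, r0, 6",
    "slli r1, r1, 2",
    "subi r2, r2, 3",
    "add r3, r3, r3",
]

print(find_raw_hazards(FRAGMENT))
print(find_raw_hazards(FIRST_PROGRAM))
```

Run on the first program, the detector finds no hazards, which is why that program pipelines cleanly; run on the fragment, it flags exactly the two reads of r1 that happen before the add reaches WB.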