From Multi-Cycle to Pipelining
The last lecture introduced the multi-cycle design to fix the inefficiency of the single-cycle model. Instead of one long clock cycle for all instructions, it uses multiple short cycles and lets each instruction take only as many as it needs.

The major benefits of the multi-cycle design are:
- Critical path design
- Recap: In a single-cycle processor, the clock speed for all instructions is limited by the single slowest instruction.
- If loading from memory takes a long time, even a simple, fast add instruction is forced to take that same long time. The "critical path" is this longest possible operation.
- In a multi-cycle design, the clock cycle only needs to be long enough for the slowest step, not the slowest entire instruction. This allows the clock to run much faster. A complex instruction will simply take more of these short clock cycles to finish, but it doesn't slow down every other instruction in the system.
- Bread and butter (common case) design
- Not all instructions are created equal or used with the same frequency. Some instructions (like add) are the "bread and butter" of many programs; they are very common. Others are less frequent.
- The design can optimize the number of states it takes to execute the “important” instructions that make up much of the execution time (a rough worked example follows this list).
- Balanced design
- In a single-cycle design, every piece of hardware needed to execute an instruction must be available simultaneously. If an instruction needs to both calculate PC+4 and perform an arithmetic operation, it might require two separate adders because everything happens in one go.
- Because each instruction executes over several cycles in the multi-cycle design, a single piece of hardware can be reused for different steps.
- For example, the processor can use the main Arithmetic Logic Unit (ALU) in the first cycle to increment the PC and then use that same ALU in a later cycle to perform the instruction's main calculation. This avoids the need for duplicate hardware, which makes the processor less expensive and more power-efficient.
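To put rough numbers on the first two benefits, here is a small back-of-the-envelope sketch in Python. The step latencies, cycle counts, and instruction mix below are illustrative assumptions, not figures from the lecture; the point is that the multi-cycle clock is set by the slowest step, while the average cycle count is weighted toward the common instructions.

```python
# Illustrative per-step latencies in picoseconds (assumed values, not from the lecture).
step_ps = {"fetch": 130, "decode": 110, "alu": 120, "mem": 130, "writeback": 110}

# Single-cycle design: one long clock period that must cover the slowest whole
# instruction (a load, which needs every step above).
single_cycle_ps = sum(step_ps.values())        # 600 ps per instruction

# Multi-cycle design: the clock only has to cover the slowest single step.
multi_cycle_clock_ps = max(step_ps.values())   # 130 ps per cycle

# Assumed cycle counts and dynamic frequencies ("bread and butter" mix:
# adds dominate, loads are rarer).
instr_mix = {
    "add":    {"cycles": 4, "freq": 0.50},
    "load":   {"cycles": 5, "freq": 0.25},
    "store":  {"cycles": 4, "freq": 0.15},
    "branch": {"cycles": 3, "freq": 0.10},
}

avg_cycles = sum(v["cycles"] * v["freq"] for v in instr_mix.values())   # 4.15
avg_multi_cycle_ps = avg_cycles * multi_cycle_clock_ps

print(f"single-cycle: {single_cycle_ps} ps per instruction")
print(f"multi-cycle:  {avg_multi_cycle_ps:.0f} ps per instruction "
      f"({avg_cycles:.2f} cycles x {multi_cycle_clock_ps} ps)")
```

With reasonably balanced steps, the multi-cycle average comes out ahead; if one step (say, a slow memory access) dominates, that step sets the clock and the advantage shrinks or disappears.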
The major downsides of the multi-cycle approach are that
- it needs to store the intermediate results at the end of each clock cycle,
- which leads to register setup/hold overhead on every cycle (a rough calculation follows this list)
- most of the processor's hardware is idle at any given time.
- For example, while one instruction is executing, the hardware for fetching the next instruction is doing nothing.
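The register setup/hold cost can be folded into the same sketch. The 15 ps figure below is an assumption chosen only to show the shape of the effect: the overhead is paid once per clock cycle, so a design that takes more cycles pays it more often.

```python
# Continuing the sketch above. Each multi-cycle clock period must also absorb
# the setup/hold time of the registers that hold intermediate results.
step_ps = 130            # slowest step from the sketch above
reg_overhead_ps = 15     # assumed register setup/hold overhead per cycle
avg_cycles = 4.15        # average cycles per instruction from the sketch above

multi_with_overhead = avg_cycles * (step_ps + reg_overhead_ps)
single_with_overhead = 600 + reg_overhead_ps   # the single-cycle design pays it once

print(f"multi-cycle:  {multi_with_overhead:.0f} ps per instruction")
print(f"single-cycle: {single_with_overhead} ps per instruction")
```

With these assumed numbers, the per-cycle overhead eats most of the multi-cycle advantage, which is exactly the cost the first bullet points at.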
Pipelining is the fundamental technique to solve this problem. It works like an assembly line for instructions, allowing the processor to work on different stages of multiple instructions at the same time, which dramatically increases throughput, the number of instructions completed per unit of time.
The Core Idea of Pipelining
An instruction is broken down into a sequence of steps or "stages" (e.g., Fetch, Decode, Execute, Memory access, Write back).
In a pipelined processor, each stage has its own dedicated hardware. As one instruction moves from the Fetch stage to the Decode stage, the next instruction in the program can enter the now-vacant Fetch stage.
In an ideal scenario, once the pipeline is full, one instruction finishes every single clock cycle.
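The assembly-line behavior is easy to see in a small timeline. The sketch below assumes the classic five stages and an ideal pipeline with no stalls; the stage names and instruction count are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_timeline(n_instructions):
    """Print which stage each instruction occupies in each cycle of an ideal
    pipeline (no stalls, no hazards), and return the total cycle count."""
    total_cycles = n_instructions + len(STAGES) - 1   # fill time + one finish per cycle
    for i in range(n_instructions):
        row = []
        for cycle in range(total_cycles):
            stage_idx = cycle - i                     # instruction i enters IF in cycle i
            row.append(STAGES[stage_idx] if 0 <= stage_idx < len(STAGES) else "  .")
        print(f"instr {i}: " + " ".join(f"{s:>3}" for s in row))
    return total_cycles

cycles = pipeline_timeline(6)
print(f"6 instructions finish in {cycles} cycles "
      f"(vs roughly {6 * len(STAGES)} if each one had to finish before the next began)")
```

Here 6 instructions finish in 10 cycles instead of roughly 30: once the pipeline is full, one instruction retires every cycle.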
Why Pipelining Isn't Perfectly Efficient: The "Need to Know" Problems