CPU Pipelining
One key performance metric of a CPU is its clock frequency. Frequency by itself does not tell the whole story, but it is still a very important indicator of CPU performance. All else being equal, a CPU running at 800 MHz completes twice as many cycles, and so roughly twice as much work, in the same time as the same CPU running at 400 MHz. But what does it mean to run at 400 MHz?
When we discussed the execution of an instruction, we broke it down into multiple steps. The amount of time it takes to finish all the steps limits how fast your CPU can run. In the simple case, you finish one instruction per cycle, so the longer it takes for all the steps to finish, the longer your cycle time is. A clock speed of 400 MHz simply means that the CPU runs at a rate of 400 million cycles per second, so each cycle lasts 2.5 nanoseconds. A clock speed of 1.0 GHz means the CPU runs at a rate of 1 billion cycles per second, or equivalently, each cycle lasts only 1 nanosecond.
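The relationship between clock frequency and cycle time can be checked with a few lines of Python. This is only a small sketch; the helper name cycle_time_ns is made up for this illustration.

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Return the duration of one clock cycle in nanoseconds."""
    # One second is 1e9 nanoseconds, and the clock ticks frequency_hz times per second.
    return 1e9 / frequency_hz


for label, hz in [("400 MHz", 400e6), ("800 MHz", 800e6), ("1.0 GHz", 1e9)]:
    print(f"{label}: {cycle_time_ns(hz):.2f} ns per cycle")
```

Doubling the frequency halves the cycle time, which is why the time needed for the slowest piece of work in a cycle sets the upper limit on the clock.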
If we look at the earlier breakdown, is it possible to finish all five steps (fetch, decode, issue, execute, and writeback) in 1 nanosecond? The answer is no. The key to increasing clock frequency is a technique called pipelining.
Pipelining breaks the task of executing a single instruction into multiple steps and overlaps those steps across consecutive instructions. Once the work is split up this way, the cycle time is limited not by the total time for all the steps, but by the longest single step. In fact, pipelining is something that comes naturally to people in other parts of life.
Take the example of doing your laundry. Laundry can be broken down into 3 steps: Washing, Drying, and Folding. If we assume that each of those 3 tasks takes 1 hour, then it should take somebody 3 hours to do one load of laundry, and another 3 hours to do their second load of laundry. The total time it takes to perform two loads of laundry will be 6 hours:

However, if we overlap the tasks, we can finish both loads much faster. All we have to do is start the Washer for the second load (blue load) while the first load (yellow) is in the Dryer:
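With the tasks overlapped, the two loads finish in 4 hours instead of 6. In fact, each additional load of laundry takes only one additional hour. Notice how the pink load finishes 1 hour after the blue load. Our "cycle" time is now 1 hour instead of 3 hours, so once the pipeline is full we finish loads 3 times as fast, even though each individual load still takes 3 hours from start to finish.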
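The laundry arithmetic can be sketched in Python as follows. The function names are invented for this example, and the model assumes, as in the text, that every task takes exactly 1 hour and that a load can move to the next machine as soon as it is free.

```python
STAGES = ["wash", "dry", "fold"]
HOURS_PER_STAGE = 1


def sequential_hours(loads: int) -> int:
    """Total hours when each load is finished before the next one starts."""
    return loads * len(STAGES) * HOURS_PER_STAGE


def pipelined_hours(loads: int) -> int:
    """Total hours when a new load enters the washer every hour."""
    if loads == 0:
        return 0
    # The first load needs all three stages; each later load adds one more hour.
    return (len(STAGES) + loads - 1) * HOURS_PER_STAGE


for loads in (1, 2, 3, 10):
    print(f"{loads} load(s): sequential {sequential_hours(loads)} h, "
          f"pipelined {pipelined_hours(loads)} h")
```

For a single load there is no gain, since nothing can be overlapped. The benefit shows up as more loads are queued, and the ratio between the two totals approaches 3 as the number of loads grows.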
The same idea applies to our CPU: we simply have to break the execution of instructions down into multiple steps, and we can use pipelining. From our earlier discussion, we have already broken the task of a CPU down into multiple steps: Fetch, Decode, Issue, Execute, Writeback. Each step is known as a stage. A CPU broken down into 5 steps results in a 5-stage pipeline.
Rather than the total sum of the 5 steps determining your cycle time (and therefore your clock frequency), the cycle time is determined only by the longest of the 5 steps.
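To make the difference between "sum of the steps" and "longest step" concrete, here is a small Python sketch. The per-stage delays are hypothetical numbers chosen only for illustration; they do not describe any real CPU.

```python
# Hypothetical stage delays in picoseconds (illustrative values, not measured data).
STAGE_DELAY_PS = {
    "fetch": 800,
    "decode": 600,
    "issue": 500,
    "execute": 1000,
    "writeback": 600,
}


def frequency_mhz(cycle_time_ps: float) -> float:
    """Convert a cycle time in picoseconds to a clock frequency in MHz."""
    return 1e6 / cycle_time_ps


# Without pipelining, one cycle must cover every step in turn.
unpipelined_cycle_ps = sum(STAGE_DELAY_PS.values())
# With pipelining, one cycle only has to cover the slowest stage.
pipelined_cycle_ps = max(STAGE_DELAY_PS.values())

print(f"unpipelined: {unpipelined_cycle_ps} ps, "
      f"{frequency_mhz(unpipelined_cycle_ps):.1f} MHz")
print(f"pipelined:   {pipelined_cycle_ps} ps, "
      f"{frequency_mhz(pipelined_cycle_ps):.1f} MHz")
```

With these made-up delays the pipelined clock is 3.5 times faster rather than 5 times faster, because the stages are not perfectly balanced: the execute stage takes longer than the others and sets the cycle time for the whole pipeline.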
If breaking CPU execution down into 5 equally long steps can give up to 5 times the clock frequency, you may be asking whether we can break it down into 10 steps and get 10 times the frequency. The answer is that theoretically you can, but you quickly run into other issues that limit the maximum speedup you can achieve. Choosing the pipeline length and balancing the stages is a key part of CPU design.
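The text does not say which issues limit deeper pipelines. One commonly cited factor is that every stage adds a small fixed delay for the register that holds its result between cycles. The Python sketch below models only that one factor, under the assumption of perfectly balanced stages; the 5000 ps of logic and the 100 ps of per-stage overhead are hypothetical values, and the function name is invented for this example.

```python
TOTAL_LOGIC_PS = 5000      # assumed total work per instruction (hypothetical)
STAGE_OVERHEAD_PS = 100    # assumed fixed register delay per stage (hypothetical)


def cycle_time_ps(stages: int) -> float:
    """Cycle time for a perfectly balanced pipeline with fixed per-stage overhead."""
    return TOTAL_LOGIC_PS / stages + STAGE_OVERHEAD_PS


baseline = cycle_time_ps(1)
for stages in (1, 5, 10, 20, 50):
    speedup = baseline / cycle_time_ps(stages)
    print(f"{stages:2d} stages: cycle {cycle_time_ps(stages):6.1f} ps, "
          f"clock speedup {speedup:.2f}x")
```

In this simplified model, going from 5 to 10 stages raises the clock by less than a factor of 2, and the gain keeps shrinking as stages are added, because the fixed overhead becomes a larger share of each cycle. Real designs face further limits that this sketch leaves out.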
Conclusion
Now you've had a chance to learn about the inner workings of a CPU. From mobile phones to set-top boxes and even cars, embedded CPUs deliver a wide variety of capabilities, enhancing the digital experience. Marvell's Sheeva CPU core is a scalable solution that can be tuned to power a vast range of digital devices. Check back later for even more information on CPUs.