Bill Dally, the Chief Scientist at Nvidia, has a very interesting way of classifying processors into throughput and latency designs.

However you package it, the PC of the future is going to be a heterogeneous machine. It could have a small number of cores (processing units) optimized for delivering performance on a single thread (or one operating program). You can think of these as latency processors. They are optimized for latency (the time it takes to go back and forth in an interaction). Then there will be a lot of cores optimized to deliver throughput (how many tasks can be done in a given time). Today, these throughput processors are the GPU. Over time, the GPU is evolving to be a more general-purpose throughput computing engine that is used in places beyond where it is used today. (Source)

It makes a lot of sense to think about chips this way. General-purpose CPUs (like Intel x86 chips) have been optimized to reduce latency. They have all sorts of fancy predictive logic to guess which instructions will be executed next, big caches to keep data close, and designs built around keeping the pipeline full (superscalar execution, out-of-order execution, instruction pipelining, and so on). But at some point, this breaks down. Your cache may have a high hit rate, but it isn't 100%. You may guess pretty well at future operations, but eventually the logic required to support all that guessing hits diminishing returns.
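To make the contrast concrete, here is a minimal sketch of the kind of workload a latency core is built for: a serial, data-dependent traversal where each step has to wait on the one before it, so the core's caches and prediction machinery are doing all the heavy lifting. This is my own illustration, not something from Dally's remarks; it is plain host code (no GPU involved), and the structure and sizes are arbitrary.

```cuda
#include <cstdio>
#include <cstdlib>

// A data-dependent chain: each step needs the pointer loaded in the previous
// step, so there is no parallelism to exploit. Performance depends almost
// entirely on how well the core's caches and prediction logic hide the
// latency of each load -- exactly what a latency-optimized CPU is built for.
struct Node { Node *next; int value; };

int main() {
    const int n = 1 << 20;                      // arbitrary size for illustration
    Node *nodes = (Node *)malloc(n * sizeof(Node));
    for (int i = 0; i < n; ++i) {
        nodes[i].value = i & 0xFF;
        nodes[i].next  = (i + 1 < n) ? &nodes[i + 1] : nullptr;
    }

    long long sum = 0;
    for (Node *p = nodes; p != nullptr; p = p->next)
        sum += p->value;                        // serial chain of loads: latency-bound work

    printf("sum = %lld\n", sum);
    free(nodes);
    return 0;
}
```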

Throughput processors (e.g., GPUs) are different. They don't have huge caches or complex management logic to fill the instruction pipe. But if you can feed them, they can crank. That's why a current Intel x86 CPU has 4 cores, and a current Nvidia GPU has 240. They spend their transistors on different things: GPUs devote theirs to a lot of very simple processors that can't do much individually, but do what they do very quickly. The challenge is keeping these throughput processors full and busy, so you can take advantage of their potential speed.
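As a rough sketch of what "feeding" a throughput processor looks like, here is a small CUDA example (my own, not from Dally's remarks; the sizes and launch parameters are arbitrary): a SAXPY kernel where each thread does one trivial multiply-add, and the speed comes entirely from launching enough threads to keep all of those simple cores busy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread does one trivially simple operation; throughput comes from
// running tens of thousands of such threads at once.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 24;                      // ~16M elements: enough work to keep the GPU busy
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch far more threads than there are cores; the hardware hides
    // memory latency by switching to threads that are ready to run.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    saxpy<<<blocks, threadsPerBlock>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);                // expect 4.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The point of the sketch is the shape of the workload, not the arithmetic: instead of hiding latency with caches and speculation, the GPU hides it by oversubscription, always having other threads ready to run while some wait on memory.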

