Why use GPUs ?

Ion Thruster · Nov 16, 2019

While many of us may fall under the “Oh yeah, of course GPUs are awesome for performance !” category, there may be some who still continue to say “CPUs still rock !”. As companies continually “market” to prove their worthiness/strengths, let's just step back here and take a neutral view of the basic question - why GPUs ?

How did it start ?
GPUs evolved organically out of a fundamental limitation of CPUs: rendering/drawing real-time graphics on screens. Now, if you are familiar with the math behind Computer Graphics, you probably know that it involves “heavy” use of vector algebra, and often has a lot of parallel and independent calculations. Computer scientists call these kinds of problems “Embarrassingly Parallel”, and it turns out there are tons of embarrassingly parallel problems other than just graphics - i.e. GPUs can potentially be very effective in accelerating those kinds of problems as well.
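
To make “embarrassingly parallel” concrete, here is a tiny made-up example (plain C++, nothing GPU-specific yet): scaling every pixel of an image by a constant gain. Each iteration touches only its own element, so the iterations could be handed to any number of workers with zero coordination.

```cpp
#include <vector>

// Scale every pixel of an image by a constant gain.
// Each iteration reads and writes only pixels[i], so no iteration depends
// on any other - that independence is what "embarrassingly parallel" means.
void scale_pixels(std::vector<float>& pixels, float gain) {
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        pixels[i] *= gain;
    }
}
```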

Building accelerator hardware is only part of the problem - programming it is usually the real bottleneck for widespread adoption. This programmability was “kind-of-solved” in graphics via DirectX/OpenGL shaders, but expressing a general-purpose algorithm in terms of “triangles” and “pixels” is often very cumbersome. Hence NVIDIA invented CUDA C++, which emerged out of the need to “express” general-purpose algorithms to run on GPUs, and this has been instrumental in the GPU's meteoric rise to fame.
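
To give a feel for what that “expression” looks like, here is the same pixel-scaling example rewritten as a minimal CUDA C++ sketch. The function names are mine, and the pixel buffer is assumed to already live in GPU memory (e.g. allocated with cudaMalloc). The key idea: instead of mapping the problem onto triangles and pixels, each GPU thread simply handles one array element.

```cpp
#include <cuda_runtime.h>

// One GPU thread per element: thread i scales pixels[i].
__global__ void scale_pixels_kernel(float* pixels, float gain, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {              // the grid may be slightly larger than n
        pixels[i] *= gain;
    }
}

// Host-side launch (d_pixels must point to device memory).
void scale_pixels_gpu(float* d_pixels, float gain, int n) {
    int threads_per_block = 256;
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    // For a large image this launches far more threads than the GPU has
    // cores; the hardware scheduler keeps them all in flight.
    scale_pixels_kernel<<<blocks, threads_per_block>>>(d_pixels, gain, n);
    cudaDeviceSynchronize();
}
```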

There are probably 2 questions in your mind right now :

  1. How many problems are “really” embarrassingly parallel ?
  2. So are GPUs bad at everything else other than the problems which fall under category #1 ?

Answers :

  1. More than you probably imagined. While several problems in today’s world aren’t easily parallelizable, a good chunk of them are (at least partially). One of the biggest driving factors for GPUs in recent times has been Deep Learning / AI (which falls in this category), and the growth of AI doesn’t seem to be slowing down anytime soon.
  2. Partly true ! While GPUs in general are “throughput” oriented machines and CPUs are more latency oriented, there are other benefits to using a GPU which turn out to be VERY helpful too. One of them is “memory bandwidth”, and incidentally a LOT of problems today are still memory-bandwidth bound - i.e. the memory “read” and “write” times are so large compared to the computation time that Read/Write is the primary bottleneck, and having faster CPU/GPU cores isn’t really going to speed up the solution (see the sketch right after this list).
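
Here is the promised sketch of a memory-bandwidth-bound problem, using SAXPY (y = a*x + y) as a stand-in example of my own choosing. Per element it moves 12 bytes to and from memory but performs only 2 floating-point operations, so the “read”/“write” traffic, not the arithmetic, sets the speed limit on both CPUs and GPUs. It would be launched exactly like the pixel-scaling kernel above.

```cpp
// SAXPY: y[i] = a * x[i] + y[i]
// Per element: read x[i] and y[i] (8 bytes), write y[i] (4 bytes) -> 12 bytes
// of DRAM traffic for just 2 FLOPs (one multiply, one add). Modern chips can
// do thousands of FLOPs in the time it takes to move those bytes, so this
// kernel is limited purely by memory bandwidth, not by compute.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
```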

Now to give you ACTUAL numbers / some perspective : let's look at the GPU strong points first (the sketch right after the list shows roughly how these peak figures are derived) :

  1. Peak theoretical compute throughput :
    Intel® Core™ i9-9900KS : 640 GFLOPS [ shaky estimate; 8 cores, AVX2, @5GHz ]
    GeForce RTX 2060 Super : 6900 GFLOPS [2176 cores @1.6GHz] ; excludes Tensor Cores / RT Cores
    GFLOPS = Giga Floating Point Operations per Second.
  2. Peak theoretical memory bandwidth :
    Intel® Core™ i9-9900KS : 42 GB/sec [ DDR4-2666 ]
    GeForce RTX 2060 Super : 450 GB/sec
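
For transparency, here is roughly how such peak figures fall out of the spec sheets. This is only a back-of-envelope sketch: the 640 GFLOPS figure above effectively assumes one 8-wide FP32 FMA per core per cycle, the CPU bandwidth assumes dual-channel memory, and the GPU bandwidth assumes a 256-bit GDDR6 bus at 14 Gbit/s per pin (which lands at 448 GB/s, i.e. the ~450 GB/sec quoted above).

```cpp
#include <cstdio>

int main() {
    // CPU compute: 8 cores x 8 FP32 lanes (AVX2) x 2 FLOPs per FMA x 5 GHz.
    // Reproduces the article's "shaky" 640 GFLOPS estimate.
    double cpu_gflops = 8 * 8 * 2 * 5.0;          // = 640 GFLOPS

    // GPU compute: 2176 CUDA cores x 2 FLOPs per FMA x 1.6 GHz.
    double gpu_gflops = 2176 * 2 * 1.6;           // ~= 6963 GFLOPS

    // CPU bandwidth: dual-channel DDR4-2666 -> 2 channels x 8 bytes x 2.666 GT/s.
    double cpu_gbs = 2 * 8 * 2.666;               // ~= 42.7 GB/s

    // GPU bandwidth: assumed 256-bit GDDR6 bus x 14 Gbit/s per pin / 8 bits.
    double gpu_gbs = 256 * 14.0 / 8;              // = 448 GB/s

    std::printf("CPU : ~%.0f GFLOPS, ~%.1f GB/s\n", cpu_gflops, cpu_gbs);
    std::printf("GPU : ~%.0f GFLOPS, ~%.0f GB/s\n", gpu_gflops, gpu_gbs);
    return 0;
}
```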

So if you were to build a top-notch desktop using the above CPU + DRAM costing over $700, plus a low-end GPU like the RTX 2060 Super ($400), you would theoretically get ~10X the memory bandwidth + ~10X the compute throughput out of the GPU compared to the CPU - just wow !! (No wonder they were used for Bitcoin mining 😐). And we haven’t even mentioned a word about Tensor Cores and Ray Tracing cores, so just think about the value add when you put it all together - simply amazing !

[Chart: NVIDIA RTX 2060 Super vs Intel® Core™ i9-9900KS, theoretical peak FMA throughput]
[Chart: NVIDIA RTX 2060 Super vs Intel® Core™ i9-9900KS, theoretical peak memory bandwidth]

While this makes GPUs seem “magical”, the gap exists fundamentally because of the differences in the way the two are designed/optimized: CPUs are optimized to handle random memory accesses with minimal latency, while GPUs are optimized to handle large, well-behaved memory accesses with very high throughput. Even with that reasoning, a 10X perf gap in favor of the GPU, while the CPU setup costs ~1.75X more, seems a bit too much IMHO ! This is one of the reasons why we need AMD to compete better with Intel, as it might force Intel to reduce CPU prices [their profit margin on CPUs is reportedly ~60%+].

Now to give some strong points for the CPU (latency) :

  1. CPU latency [rough numbers] :
    L1 hit : ~4 cycles
    L2 hit : ~10 cycles
    L3 hit : ~40 cycles
    DRAM : ~80–100 ns
  2. GPU latency [rough numbers] :
    L1 hit : ~30 cycles
    L2 hit : ~190 cycles
    DRAM : 300+ cycles [depends]

It's obvious from the above numbers that, for applications which are bound by latency, the CPU will perform much better than the GPU. Apart from this, the CPU also has dedicated hardware prefetchers, branch predictors, and very deep instruction pipelines with forwarding schemes, all of which aim to reduce latency even further. So overall, for general applications like web browsers, word processors, etc., latency plays a very critical role in creating “responsive” applications.
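
As a (hypothetical) illustration of a latency-bound workload, consider summing a linked list. Each load of the next node depends on the previous load completing, so there is nothing to parallelize and every step pays the full cache or DRAM latency; the CPU's much lower latencies (plus its prefetchers and branch predictors) make it the clear winner here.

```cpp
struct Node {
    Node* next;
    int   value;
};

// Sum a linked list by chasing next pointers.
// Each load of n->next depends on the previous load, so the traversal is a
// chain of serialized memory accesses: total time is roughly
// (number of nodes) * (cache or DRAM latency). Lower latency wins here, and
// no amount of extra cores, threads, or bandwidth helps a single chain.
long long sum_list(const Node* n) {
    long long sum = 0;
    while (n != nullptr) {
        sum += n->value;
        n = n->next;
    }
    return sum;
}
```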

But if your application can “hide” this latency via the ~200X more GPU threads available, or can saturate the memory bandwidth, then GPUs will offer tremendous benefits.

So given all this computation capability and memory bandwidth — what application would you build ? :)

Sources :

[1] https://ark.intel.com/content/www/us/en/ark/products/192943/intel-core-i9-9900ks-processor-16m-cache-up-to-5-00-ghz.html
[2] https://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-haswell-ep-cores-/11
[3] https://arxiv.org/pdf/1804.06826.pdf
[4] https://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory

PS : I was paid by no one to write this article, and any opinions expressed here are solely my own. Comments and feedback are welcome; feel free to let me know if there are any errata as well.
