If you’ve seen my other posts, you’ve probably realized that CUDA code these days is syntactically very similar to CPU code, because CUDA embraces (supports) most modern C++ constructs and syntax [up to C++14]. So it’s no surprise that libraries have been built on top of it to enhance the productivity of CUDA developers (just as on the CPU).
In this part I’ll introduce “Thrust”, one of the iconic (and oldest) CUDA libraries, which embodies the STL principles but runs on GPUs. While some of you still might continue to use C++ 98/03, either…
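To give a flavour of that STL-like style, here is a minimal sketch (not the author's code) that copies data into a thrust::device_vector, sorts it with thrust::sort and sums it with thrust::reduce. The helper name sort_and_sum and the build line are illustrative assumptions; it assumes a CUDA toolkit that bundles Thrust.

```cuda
// Minimal Thrust sketch: copy data to the GPU, sort it there, then reduce it.
// Build (assumption): nvcc thrust_demo.cu -o thrust_demo
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cstdio>

// Hypothetical helper: sorts a copy of the input on the GPU, writes the sorted
// values back to h_sorted, and returns the sum of all elements.
int sort_and_sum(const thrust::host_vector<int>& h_in, thrust::host_vector<int>& h_sorted)
{
    // Copy host data into device memory (like std::vector, but on the GPU).
    thrust::device_vector<int> d_vec = h_in;

    // STL-style algorithm that executes on the device.
    thrust::sort(d_vec.begin(), d_vec.end());

    // Copy the sorted result back to the host.
    h_sorted = d_vec;

    // Parallel reduction on the device: initial value 0, binary op plus.
    return thrust::reduce(d_vec.begin(), d_vec.end(), 0, thrust::plus<int>());
}

int main()
{
    thrust::host_vector<int> h_in(4);
    h_in[0] = 3; h_in[1] = 1; h_in[2] = 4; h_in[3] = 2;

    thrust::host_vector<int> h_sorted;
    int sum = sort_and_sum(h_in, h_sorted);

    for (size_t i = 0; i < h_sorted.size(); ++i)
        printf("%d ", h_sorted[i]);     // 1 2 3 4
    printf("\nsum = %d\n", sum);        // sum = 10
    return 0;
}
```

Note how this reads almost like STL code with std::vector, std::sort and std::accumulate; the assignments between host_vector and device_vector are the explicit host/device transfers.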
In this section/article I would like to introduce the popular cuBLAS library. “BLAS” here stands for “Basic Linear Algebra Subprograms”, a specification for a set of commonly used linear algebra operations such as vector scaling, dot products and matrix multiplication.
You might be thinking: why use a library for “basic” linear algebra computations like matrix multiplication? To give you some perspective, this is the performance difference you’ll observe between a naive implementation on the CPU and a library like cuBLAS!
>>> /usr/local/cuda/bin/nvcc cublas_matmul.cu -o matmul -lcublas -lcurand -std=c++11
Naive CPU Time : 8777.51 ms
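For comparison, a minimal sketch of a single-precision matrix multiply through cuBLAS might look like the following. This is not the author's cublas_matmul.cu (which also links cuRAND, presumably to generate random inputs); the helper gpu_matmul, the tiny 2 x 2 inputs and the build line are assumptions for illustration. The key detail is that cublasSgemm follows the Fortran BLAS convention of column-major storage.

```cuda
// Minimal cuBLAS SGEMM sketch: C = alpha * A * B + beta * C for n x n matrices.
// Build (assumption): nvcc cublas_sgemm_demo.cu -o sgemm_demo -lcublas
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Hypothetical helper: multiplies two n x n column-major matrices on the GPU.
// Returns false if any CUDA / cuBLAS call fails.
bool gpu_matmul(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, int n)
{
    const size_t bytes = sizeof(float) * n * n;
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cublasHandle_t handle = nullptr;
    bool ok = false;

    // Allocate device buffers, copy the inputs over, and create a cuBLAS handle.
    if (cudaMalloc(&dA, bytes) == cudaSuccess &&
        cudaMalloc(&dB, bytes) == cudaSuccess &&
        cudaMalloc(&dC, bytes) == cudaSuccess &&
        cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS)
    {
        const float alpha = 1.0f, beta = 0.0f;
        // No transpose on A or B; m = n = k = n; all leading dimensions are n.
        if (cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                        &alpha, dA, n, dB, n, &beta, dC, n) == CUBLAS_STATUS_SUCCESS)
        {
            C.resize(static_cast<size_t>(n) * n);
            // cudaMemcpy waits for prior work on the default stream, including the SGEMM.
            ok = cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
        }
    }

    // Clean up (cudaFree on a null pointer is a no-op).
    if (handle) cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return ok;
}

int main()
{
    const int n = 2;
    // Column-major: A = [[1, 2], [3, 4]] is stored as {1, 3, 2, 4}.
    std::vector<float> A = {1, 3, 2, 4};
    std::vector<float> B = {1, 0, 0, 1};   // 2 x 2 identity
    std::vector<float> C;
    if (!gpu_matmul(A, B, C, n)) { printf("GPU matmul failed\n"); return 1; }

    // A * I == A, so C holds {1, 3, 2, 4} in column-major order.
    for (float v : C) printf("%.1f ", v);
    printf("\n");
    return 0;
}
```

For a real benchmark you would use much larger matrices and time the cublasSgemm call itself (for example with CUDA events), since for tiny inputs the allocations and copies dominate.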
Welcome to the first part of a series of short posts which I’d like to call “The Learning CUDA” series. Here we start by installing the driver and SDK required to start learning and playing with CUDA.
This is for folks with a system that has a CUDA-capable NVIDIA GPU and who are trying to get started with CUDA development.
If you prefer an “official” document for installing the CUDA SDK, please use the links below:
But if you are like me and prefer watching videos to reading long documents, check out these links…
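Once the driver and SDK are installed, a quick way to confirm that everything works is to compile and run a tiny program that queries the CUDA runtime. This is a minimal sketch, assuming nvcc is on your PATH; the file name and the helper visible_cuda_devices are hypothetical. The CUDA samples also ship a more complete deviceQuery utility.

```cuda
// Minimal post-install sanity check: can the runtime see a CUDA-capable GPU?
// Build (assumption): nvcc check_cuda.cu -o check_cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: returns the number of visible CUDA devices, or -1 on error.
int visible_cuda_devices()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // Typical causes: no NVIDIA driver, or a driver too old for this toolkit.
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return count;
}

int main()
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // reports 0 if no driver is installed
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("Driver version: %d, Runtime version: %d\n", driverVersion, runtimeVersion);

    int count = visible_cuda_devices();
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    // Exit code 0 only if at least one GPU is visible.
    return count > 0 ? 0 : 1;
}
```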
While many of you may fall under the “Oh yeah, of course GPUs are awesome for performance!” category, there may be some who still continue to say “CPUs still rock!”. As companies continually “market” their worthiness and strengths, let’s just step back and take a neutral view of a basic question: why GPUs?
How did it start?
GPUs evolved organically due to a fundamental limitation of CPUs in rendering real-time graphics on screens. Now, if you are familiar with the math behind computer graphics, you probably know that it involves “heavy” use of vector algebra, and often…
A beginner’s first-hand experience.
Since my college days (CUDA’s early years, ~2008), I’ve always been fascinated with CUDA and how to use it effectively to speed up some of the most interesting and challenging problems of the day. But I never got the time to learn CUDA “formally”, and I’d definitely not consider myself an expert. Having said that, I do aspire to gain mastery over the subject, and this is me documenting and sharing my journey as it unfolds.
Back in 2008, it was still mainly about game physics, and maybe Photoshop, which was starting to…
FP16 is an IEEE format (half precision) that uses fewer bits than the traditional floating-point format: 16 bits instead of the 32 bits of the “float” keyword we use in C/C++. The main reason for using this reduced-precision FP16 is that hardware speedup is available for it (if you are okay with the precision loss, that is), and it gives a 2X space savings.
Specifically, certain GPUs offer anywhere between 2X and 8X speedup on FP16 compared to FP32. Despite this, we (as beginners) often stick to FP32, because getting started with FP16 can be a bit tricky, mainly due to:
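A minimal sketch of what “reduced precision” and “2X space savings” mean in practice, using the __half type and the __float2half / __half2float conversions from cuda_fp16.h. It assumes a recent CUDA version in which these conversions are callable from host code; the helper fp16_round_trip and the sample values are illustrative assumptions.

```cuda
// Minimal FP16 vs FP32 sketch: storage size and precision of a round trip.
// Build (assumption): nvcc fp16_demo.cu -o fp16_demo
#include <cuda_fp16.h>
#include <cstdio>

// Hypothetical helper: rounds a float to FP16 (round-to-nearest-even) and back.
float fp16_round_trip(float x)
{
    __half h = __float2half(x);   // FP32 -> FP16: 1 sign, 5 exponent, 10 mantissa bits
    return __half2float(h);       // FP16 -> FP32 is exact
}

int main()
{
    // 4 bytes vs 2 bytes: the 2X space savings.
    printf("sizeof(float) = %zu bytes, sizeof(__half) = %zu bytes\n",
           sizeof(float), sizeof(__half));

    const float values[] = {1.0f, 0.1f, 3.14159265f, 2049.0f, 70000.0f};
    for (float v : values)
        printf("%14.7f -> %14.7f\n", v, fp16_round_trip(v));

    // 2049 rounds to 2048 (FP16 spacing is 2 between 2048 and 4096), and
    // 70000 exceeds the FP16 maximum of 65504, so it overflows to infinity.
    return 0;
}
```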