A primer on GPU architecture and computing
Thank you for writing this post and for the useful GPU overview.
However, please consider dropping the explanation of latency tolerance based on Little's law and all the subsequent references to it, or consult a queuing theory specialist and update the text as needed. In the way it is currently used, I believe it provides a confusing explanation for something simple: you can have relatively large individual instruction latencies and very high throughput by executing lots of things in parallel, which is what GPUs do.
As stated, the explanation based on Little's law moves from an arrival throughput (what appears in the equation) to a target throughput, which in practice would be the measured average *output* throughput. This implicitly introduces a conservation-of-flow constraint that the underlying queuing system might not satisfy when you simply plug numbers into the equation, as opposed to measuring a real system.
As I read it, the current explanation essentially says that for a fixed average latency, larger average queue sizes always lead to better average throughput. However, this is clearly not always true: having millions of items waiting to be processed does not increase your throughput by magic. On the other hand, having lots of processing units (SPs in this case) and processing more things in parallel does.
In other words, the magic making things work is not in the queuing but in the parallelism. So going to Little's law is both tricky to get right and unnecessary.
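The distinction can be made concrete with a toy steady-state model (my own sketch, not from the post; `LATENCY` and `units` are illustrative numbers, not real GPU parameters):

```python
# Toy model: each operation takes LATENCY cycles, and `units`
# parallel processing units each run one operation at a time.
# Illustrates that throughput is set by parallelism, not queue depth.

LATENCY = 4  # cycles per operation (illustrative assumption)

def steady_state_throughput(units, queued):
    """Operations completed per cycle once the pipeline is full.

    Each unit finishes one operation every LATENCY cycles, so
    throughput is capped at units / LATENCY regardless of how
    many items are queued; extra queued work just waits.
    """
    in_flight = min(units, queued)  # only `units` items execute at once
    return in_flight / LATENCY

# A huge backlog with few units gains nothing:
print(steady_state_throughput(units=8, queued=1_000_000))   # 2.0 ops/cycle
# Adding parallel units raises throughput directly:
print(steady_state_throughput(units=32, queued=64))         # 8.0 ops/cycle
```

In this model the queue only needs to be deep enough to keep all units busy; beyond that point, growing it has no effect on throughput.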
Ciao Abhinav, greetings from Italy. I really enjoy and admire your posts. I have written to you via Linkedin, hope that's okay.
Needless to say, this is a fantastic article. Great job and thank you for going so in-depth.
It's interesting that in my time as a software engineer, I never had to really learn how GPUs work in depth. I wish I had. I have a friend working on deep learning over at Nvidia, and he seemingly operates at a different level of technicality than I do. At the same time, I try to remind myself that my expertise and experience are mostly in hyperscale distributed systems, and what I work on is probably foreign to him.
Regardless, I feel like at least GPU basics should be common knowledge for ambitious software engineers, especially as the world moves toward GPU-powered computing thanks to AI.
Nice article, refreshed my 2016 memory of CUDA programming.
Keep up the great job 👏
Abhinav, good article! I’m wondering if we can translate your blog into Chinese and post it in the Chinese community. We will highlight your name and keep the original link at the top of the translated version. Thank you!