Weekly Code Confessions Digest: Exploring Intriguing Articles, Books, and Courses
Curated resources on topics around AI, Data Structures, Networking, and Performance Engineering
Welcome to Confessions of a Code Addict, the place where I delve into the world of coding, computer science, and everything in between. I'm thrilled to have you here, and I appreciate your support and subscription.
As I continue working on a comprehensive long-form article that requires extensive research, I wanted to share some intriguing resources I discovered last week. This post will become a regular feature in my newsletter, providing you with a weekly dose of fascinating content. So, let's dive right in!
In this article, the folks at GitHub share the story behind building GitHub Copilot: how they started experimenting with the GPT-3 APIs as a static question-and-answer system and iterated on it to ultimately create an interactive plugin for IDEs. They talk about how this led to their collaboration with OpenAI, share some interesting insights into the prompt engineering they did for Copilot, and outline their plans for its future. Check it out here.
If you are a Java programmer interested in performance tuning and measurement, you will find this article very enlightening. We tend to use calls like System.nanoTime() to measure the time taken to execute a block of code. However, these calls themselves can have unreliable latency, which can throw off our performance-tuning efforts. This article dives deep, starting from the implementation of these APIs in Java and going all the way into the Linux kernel code to track down the source of the unreliability. Check out the full article here.
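The same concern applies in any language: the timer call itself takes time. As a quick illustration (in Python rather than Java, so this is an analogy to the article's subject, not its code), here is a minimal sketch that estimates the latency of the timer call by taking back-to-back timestamps:

```python
import time

def timer_overhead(samples=100_000):
    """Estimate the latency of the timer call itself by taking
    back-to-back timestamps and measuring their deltas."""
    deltas = []
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        deltas.append(t1 - t0)
    deltas.sort()
    # The median is more robust than the mean here: occasional
    # scheduler preemptions produce huge outlier deltas.
    return deltas[len(deltas) // 2]

overhead_ns = timer_overhead()
print(f"median timer overhead: ~{overhead_ns} ns")
```

If the block you are measuring runs for a comparable number of nanoseconds, the timer's own overhead dominates your measurement, which is exactly the kind of distortion the article investigates.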
Now, some news for the Python devs out there. If you are a seasoned Python developer, you may know that Python does not have true concurrency. Even though Python supports threads, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at any point in time. This means that even if your hardware has multiple CPU cores, a multithreaded Python program cannot utilize them for CPU-bound work. This has been a long-standing pain point for Python developers, and there are now efforts to mitigate the effects of the GIL, one of which is discussed in this article. In Python 3.12 (under development), the scope of the GIL has been reduced to the per-sub-interpreter level. A sub-interpreter is an instance of the Python interpreter that runs within the main Python process. With this change, each sub-interpreter has its own GIL, which means multiple sub-interpreters can run concurrently and utilize multiple CPU cores. This is pretty cool work; read the full details in the article here.
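To see why this matters, here is a minimal sketch of the problem the per-interpreter GIL addresses: a CPU-bound task split across two threads runs no faster than the sequential version on standard CPython, because the GIL serializes bytecode execution.

```python
import threading
import time

def countdown(n):
    """A purely CPU-bound task: count down from n."""
    while n > 0:
        n -= 1

N = 2_000_000

# Sequential baseline.
start = time.perf_counter()
countdown(N)
sequential = time.perf_counter() - start

# Same total work split across two threads. On standard CPython
# (with the GIL), this typically takes about as long as the
# sequential run, no matter how many cores the machine has.
start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.3f}s, two threads: {threaded:.3f}s")
```

With a per-sub-interpreter GIL, each sub-interpreter could run a task like this on its own core concurrently, which is the speedup the article is after.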
Compiler optimization is one of the most fascinating topics for anyone who is into compilers. One technique is called superoptimization, whose goal is to transform a given code snippet into its optimal form. This is a very hard problem and not something real-world compilers can usually afford to do. This article explains what superoptimization is and builds a toy implementation. Check out the full article here to learn more.
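To give a flavor of the idea (this is my own toy sketch, not the article's implementation), a brute-force superoptimizer exhaustively searches short programs over a tiny instruction set for the shortest one that matches a target function. Real superoptimizers also prove equivalence, e.g. with an SMT solver; here we only check a set of test inputs:

```python
from itertools import product

# A tiny instruction set operating on a single value.
OPS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "shl": lambda x: x * 2,  # shift left = multiply by 2
    "neg": lambda x: -x,
}

def run(program, x):
    """Execute a program (a sequence of op names) on input x."""
    for op in program:
        x = OPS[op](x)
    return x

def superoptimize(target, inputs, max_len=4):
    """Return the shortest program matching `target` on all test inputs,
    trying lengths in increasing order so the first hit is optimal."""
    for length in range(1, max_len + 1):
        for program in product(OPS, repeat=length):
            if all(run(program, x) == target(x) for x in inputs):
                return program
    return None

# Target: f(x) = 2x + 2. The search finds the two-instruction form
# "increment, then shift left", since 2 * (x + 1) == 2x + 2.
prog = superoptimize(lambda x: 2 * x + 2, inputs=range(-10, 11))
print(prog)  # → ('inc', 'shl')
```

The exponential blow-up is visible even here: the search space grows as 4^n with program length, which is why superoptimization is reserved for very short code sequences.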
I also have something for the data structure and algorithm fans. This one covers a fascinating data structure called the Bloom filter. It is a probabilistic data structure commonly used in large-scale big data systems to quickly check whether an item exists in a collection. Being probabilistic, it is not 100% accurate: if the Bloom filter says an item does not exist in the collection, that is guaranteed to be true; however, if it says an item might exist, that may or may not be the case (a false positive).
The advantage of this data structure is that it takes only a fraction of the memory needed to index a large amount of data: instead of storing the items themselves, it stores just a few bits per item, set using hash functions.
You might use it in situations such as checking whether a password appears in a large set of compromised passwords. This article shows a typical Bloom filter implementation and then presents optimized implementations that take advantage of the hardware, making it 22x faster. These techniques were new to me, and I found them intriguing. Check out the article here.
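For readers who have not met Bloom filters before, here is a minimal sketch of the basic (non-optimized) version, using the compromised-password scenario; the hash scheme and parameters are my own illustrative choices, not the article's:

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: k hash functions set k bits per item.
    Lookups can yield false positives but never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int doubles as an arbitrary-size bit array

    def _positions(self, item):
        # Derive k independent positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits & (1 << pos) for pos in self._positions(item))

compromised = BloomFilter()
for pw in ["123456", "password", "qwerty"]:
    compromised.add(pw)

print("password" in compromised)       # an added item: always True
print("correct-horse" in compromised)  # usually False; True would be a false positive
```

Note that three passwords are indexed in a 1024-bit filter regardless of how long the passwords are, which is the memory advantage described above.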
The news about AI experts raising concerns about the possibility of extinction because of AI has been trending this week. However, there is another set of experts who believe that these extinction claims are overhyped, and we have much more important issues to deal with. Seth Lazar, Jeremy Howard, and Arvind Narayanan share their views on this topic in this article. Check it out here.
Courses & Books
If you are interested in learning about computer networks and protocols in a hands-on fashion, you will love this book. It is an open-source book that teaches computer networking using the systems approach. The authors define the systems approach as studying how the components of a system interact with each other, rather than studying each component in isolation. This is particularly important for computer networks, where each layer works closely with the others, so you have to consider issues that span multiple layers; congestion control is a prime example. The approach also prioritizes real-world implementation, with the Internet as its running example of a complex, widely used network system. Check out the book here.
If you are a fan of functional programming, I have something interesting for you. This book discusses the design and implementation of purely functional data structures. These data structures maintain functional purity by never mutating their internal state: every operation produces a new version of the structure, which shares its unchanged parts with the old one. The main benefit is that, thanks to their immutability, they are well-suited for concurrent and parallel programming and are very easy to reason about. The trade-off is that allocating a new version on every operation can add time and memory overhead compared to mutable counterparts. You can find the PDF version of the book here.
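The simplest example of this style (my own sketch, not one from the book) is a persistent singly linked list: "updating" it means building a new head node whose tail is the existing list, so old versions survive unchanged and the tail is shared rather than copied:

```python
from collections import namedtuple

# An immutable list node: (head, tail). tail is another Node or None.
Node = namedtuple("Node", ["head", "tail"])

def prepend(lst, value):
    """Return a new list with `value` in front; `lst` is untouched."""
    return Node(value, lst)

def to_list(lst):
    """Convert to a plain Python list, for inspection."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

base = prepend(prepend(None, 2), 1)  # the list [1, 2]
extended = prepend(base, 0)          # the list [0, 1, 2]

print(to_list(base))          # [1, 2] -- unchanged by the "update"
print(to_list(extended))      # [0, 1, 2]
print(extended.tail is base)  # True: the tail is shared, not copied
```

That last line is the key point: structural sharing is what keeps the cost of persistence manageable, and the book's more advanced structures (trees, queues, heaps) are largely about maximizing it.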
This is a course from MIT OpenCourseWare on performance engineering. If you are interested in building high-performance, scalable software systems that extract every ounce of performance from the hardware, you should take this course.
I hope you enjoy some of these articles, books, and courses. Let me know in the comments if you found something particularly interesting in them and also if you have some interesting resources to share with me.
In other news, I’m travelling this week due to a family emergency, so my next article, which I’m very excited about, might be a little delayed. But it will reach your inbox soon. Thank you for reading Confessions of a Code Addict and supporting me.
If you have not subscribed yet, please subscribe so that you get future posts from me directly in your inbox!