Workshop on Performance Optimizations in Code: The One Billion Row Challenge
Notice: This session was held live on January 28th. I will announce future sessions soon!
Are you participating in the One Billion Row Challenge (1BRC)? The challenge is to read one billion rows from a file, where each row consists of a location name and a temperature value (in plain-text form), and to output the min, max, and mean temperature for each location as fast as possible. The official rules limit entries to Java 21.
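As a point of reference, a naive baseline for this task might look like the sketch below. It assumes each row has the form `name;temperature` (the separator is an assumption here, since it is not spelled out above), and the class and method names are made up for illustration. It is a starting point to measure against, not a competitive solution.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class BaselineAggregator {
    // Running statistics for a single location.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        long count = 0;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        double mean() {
            return sum / count;
        }

        @Override
        public String toString() {
            return String.format(Locale.ROOT, "min=%.1f mean=%.1f max=%.1f", min, mean(), max);
        }
    }

    // Assumes every line looks like "name;temperature" (hypothetical row format).
    static Map<String, Stats> aggregate(Iterable<String> lines) {
        Map<String, Stats> result = new TreeMap<>(); // sorted by location name
        for (String line : lines) {
            int sep = line.indexOf(';');
            String name = line.substring(0, sep);
            double temperature = Double.parseDouble(line.substring(sep + 1));
            result.computeIfAbsent(name, k -> new Stats()).add(temperature);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("Hamburg;12.0", "Bulawayo;8.9", "Hamburg;-2.3");
        aggregate(sample).forEach((name, stats) -> System.out.println(name + ": " + stats));
    }
}
```

Every row here allocates a substring, parses a double, and performs a map lookup on a `String` key. Multiplied by a billion rows, those per-row costs are what the fast solutions work to remove.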
Challenges like the 1BRC brilliantly highlight the importance of code efficiency at scale. The critical lesson is that performance depends not only on the choice of data structures and algorithms, but also on the execution cost of your code: the CPU instructions it runs, cache misses, branches, and more.
Participating in the 1BRC isn't just an intellectual pursuit; it's a practical springboard into the world of performance engineering, fostering skills crucial for architecting scalable solutions.
A Workshop on Learning Optimization Techniques Behind Solving 1BRC
Are you interested in learning the optimization techniques that the top solutions use to climb the leaderboard? If so, I am running a live online workshop on them.
During this workshop, we will:
Employ tools such as flamegraphs and profiling to identify performance bottlenecks.
Dive into various optimization methods, applying them iteratively to enhance the performance of our program.
Have a live Q&A.
Our tentative agenda includes:
Topics to be covered
Introduction to performance profiling with flamegraphs.
I/O strategies: unbuffered vs buffered I/O, memory-mapped I/O.
Understanding the cost of system calls and data copying.
Techniques for fast text parsing, including single-byte scanning and multi-byte SWAR techniques.
Instruction costs: weighing integer vs floating-point arithmetic.
Leveraging SIMD instructions for computational optimization.
Cache-friendly and branch-free coding techniques.
Enhancing program throughput using parallelism.
Prerequisite: a basic understanding of CPU and OS concepts such as stack vs heap memory and virtual memory. We will review basic concepts as needed.
While most examples will be in Java, the principles apply across languages, and we will clarify language-specific details.
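To give a flavor of two of the topics listed above, here is a minimal sketch of a multi-byte SWAR scan for a field separator and of parsing a temperature with integer arithmetic instead of floating point. The `;` separator, the assumption that temperatures have exactly one fractional digit, and all class and method names are illustrative choices of mine, not taken from any particular 1BRC entry.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class FastParsing {
    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL; // ';' repeated in all 8 bytes
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    // SWAR: checks 8 bytes at once. Returns the index (0..7) of the first ';'
    // in a little-endian word, or -1 if the word contains none.
    static int firstSemicolon(long word) {
        long x = word ^ SEMICOLONS;                // bytes equal to ';' become 0x00
        long mask = (x - ONES) & ~x & HIGH_BITS;   // lowest set bit marks the first zero byte
        return mask == 0 ? -1 : Long.numberOfTrailingZeros(mask) >>> 3;
    }

    // Scans 8 bytes per step, then finishes the tail one byte at a time.
    static int indexOfSemicolon(byte[] buf, int from) {
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        int i = from;
        for (; i + 8 <= buf.length; i += 8) {
            int hit = firstSemicolon(bb.getLong(i));
            if (hit >= 0) {
                return i + hit;
            }
        }
        for (; i < buf.length; i++) {
            if (buf[i] == ';') {
                return i;
            }
        }
        return -1;
    }

    // Parses text such as "-12.3" into tenths of a degree (-123) using only
    // integer arithmetic. Assumes well-formed input with one fractional digit.
    static int parseTenths(byte[] buf, int start, int end) {
        int i = start;
        boolean negative = buf[i] == '-';
        if (negative) {
            i++;
        }
        int value = 0;
        for (; i < end; i++) {
            byte b = buf[i];
            if (b != '.') {
                value = value * 10 + (b - '0');
            }
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        byte[] row = "Hamburg;-12.3".getBytes(StandardCharsets.UTF_8);
        int sep = indexOfSemicolon(row, 0);
        int tenths = parseTenths(row, sep + 1, row.length);
        System.out.println("separator at " + sep + ", temperature in tenths: " + tenths);
    }
}
```

Keeping temperatures as integer tenths means min, max, and sum can all be tracked with integer operations, with a single conversion to a decimal value at output time.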
The workshop will be capped at the first 100 individuals to RSVP (due to Zoom's attendee limit for meetings).
The workshop is free for paid subscribers, so to attend you simply need to upgrade to a paid membership. I plan to offer such workshops and sessions every month going forward, so this will be money well spent!
The recording will be available to the attendees for replay afterwards.
Date & Timing
28th January 2024 (Sunday)
16:30 to 18:30 UTC
Please be prompt in RSVPing to the event to secure your spot, as space is limited to 100 participants. You can RSVP on the event page at the following link: