Reservoir sampling

#626 – May 18, 2025

How to select a fair random sample from a set of unknown size

Reservoir sampling
11 minutes by Sam Rose

Reservoir sampling is a technique for selecting a fair random sample from a set whose size isn't known in advance. Sam explains the mathematics behind it, showing how deciding whether to keep each new item with the right probability gives every item an equal chance of ending up in the sample. The algorithm has practical applications in systems like log collection services, where it maintains a representative sample without exceeding processing thresholds or using unpredictable amounts of memory.
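Here is a minimal Python sketch of the general k-item version, commonly known as Algorithm R. The function name and signature are our own and are not taken from the article. The first k items fill the reservoir. After that, the n-th item replaces a randomly chosen slot with probability k/n. Once the whole stream has been read, every item has the same k/n chance of being in the sample.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Pick k items uniformly at random from a stream of unknown length in one pass.

    Uses O(k) memory however long the stream is. If the stream has
    fewer than k items, all of them are returned in their original order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for n, item in enumerate(stream, start=1):  # n = number of items seen so far
        if n <= k:
            reservoir.append(item)        # fill the reservoir with the first k items
        else:
            j = rng.randrange(n)          # uniform integer in [0, n)
            if j < k:                     # true with probability k / n
                reservoir[j] = item       # replace a uniformly chosen slot
    return reservoir
```

For the log-collection case, one could run this once per time window. Memory then stays bounded at k entries no matter how many log lines arrive.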

How Top Engineering Teams Slash Microservices Testing Costs While Shipping Faster
sponsored by Signadot

Engineering teams are slashing testing costs while catching more bugs early with Signadot's ephemeral sandboxes. Instead of duplicating expensive environments, our solution uses request-based isolation to cut infrastructure spending by 90% while accelerating testing cycles by 10x. Companies like DoorDash and Brex report dramatic ROI: $4M+ in annual savings and 70% fewer production incidents. Try it free at signadot.com today.

Writing that changed how I think about programming languages
5 minutes by Max Bernstein

Here's a curated list of programming language and compiler resources that fundamentally changed Max's understanding of technical concepts. He includes papers and blog posts covering topics like garbage collection, optimization techniques, regular expressions, neural networks, bytecode interpreters, and compiler design.

Multiplexing
7 minutes by Justin Jaffray

Multiplexing is a common pattern in programming, well beyond its traditional electronics context. Justin presents multiplexing as a three-step process: associating objects with a key, sending them through a single channel, and extracting them using the key. He also identifies several non-obvious applications of this pattern, including bitwise operations, request batching, relational database design, and query decorrelation.
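The three steps can be sketched in a few lines of Python. This is an illustration under our own assumptions, not code from the article. The function names are hypothetical, a plain list stands in for the single channel, and round-robin interleaving is just one possible scheduling choice. Items are tagged with a key and interleaved onto one channel, then split back out by that key on the receiving side.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Message = Tuple[Hashable, object]  # (key, payload)


def multiplex(sources: Dict[Hashable, Iterable[object]]) -> List[Message]:
    """Steps 1 and 2: tag each item with its source's key and interleave everything onto one channel."""
    iterators = {key: iter(items) for key, items in sources.items()}
    channel: List[Message] = []
    while iterators:
        for key in list(iterators):  # round-robin over sources that still have items
            try:
                channel.append((key, next(iterators[key])))
            except StopIteration:
                del iterators[key]   # this source is exhausted
    return channel


def demultiplex(channel: Iterable[Message]) -> Dict[Hashable, List[object]]:
    """Step 3: use the key to route each payload back to the right destination."""
    outputs: Dict[Hashable, List[object]] = defaultdict(list)
    for key, payload in channel:
        outputs[key].append(payload)
    return dict(outputs)
```

Order within each key is preserved, so `demultiplex(multiplex(sources))` returns each source's items unchanged, even though they were mixed together in transit.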

How Cursor indexes codebases fast
7 minutes by Engineer's Codex

Cursor uses Merkle trees to index code efficiently. A Merkle tree is a hierarchical hash structure, and it lets Cursor detect changes in code files, synchronize only the modified files with its servers, and power codebase-aware AI features. The system provides efficient incremental updates, data integrity verification, and optimized caching, while addressing privacy concerns through path obfuscation and specialized embedding techniques.
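The following Python sketch shows how a Merkle tree finds changed files without comparing every file. It is not Cursor's actual scheme. It assumes a tree that mirrors the directory layout and uses SHA-256, and the in-memory `Tree` model and the functions `build` and `changed_paths` are hypothetical. Each file's hash is a leaf. Each directory's hash covers its children's names and hashes, so any edit changes every hash on the path up to the root.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical in-memory model of a codebase: each directory maps a name to
# either file contents (bytes) or a nested directory (dict).
Tree = Dict[str, Union[bytes, "Tree"]]


@dataclass
class MerkleNode:
    digest: str
    is_file: bool = False
    children: Dict[str, "MerkleNode"] = field(default_factory=dict)  # empty for files


def build(tree: Tree) -> MerkleNode:
    """Hash files at the leaves; each directory hash covers its children's names and hashes."""
    h = hashlib.sha256(b"dir\0")
    children: Dict[str, MerkleNode] = {}
    for name in sorted(tree):  # sorted so the hash doesn't depend on dict order
        value = tree[name]
        if isinstance(value, bytes):
            child = MerkleNode(hashlib.sha256(b"file\0" + value).hexdigest(), is_file=True)
        else:
            child = build(value)
        children[name] = child
        h.update(name.encode() + b"\0" + child.digest.encode() + b"\0")
    return MerkleNode(h.hexdigest(), children=children)


def changed_paths(old: MerkleNode, new: MerkleNode, prefix: str = "") -> List[str]:
    """List paths that were added, removed, or modified, skipping identical subtrees."""
    if old.digest == new.digest:
        return []  # same hash: nothing below this node needs to be re-synced
    changed: List[str] = []
    for name in sorted(set(old.children) | set(new.children)):
        a, b = old.children.get(name), new.children.get(name)
        path = prefix + name
        if a is not None and b is not None and not a.is_file and not b.is_file:
            changed.extend(changed_paths(a, b, path + "/"))  # both directories: descend
        elif a is None or b is None or a.digest != b.digest:
            changed.append(path)  # added, removed, or changed file (or file/dir swap)
    return changed
```

An unchanged directory keeps an unchanged hash, so the comparison skips whole subtrees at once. That is what keeps incremental syncing cheap when only a few files change.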

The magic of software
13 minutes by Moxie Marlinspike

Moxie argues that while abstraction layers have made software development more accessible, treating these abstractions as black boxes without understanding what lies beneath them limits innovation and quality. This principle extends to engineering organizations, where siloed teams functioning as black boxes prevent the bidirectional relationship between vision and engineering that drives truly innovative products.

And the most popular article from the last issue was:
