Compilers

The Road to Boost.SML 1.2.0: New API and a Type-Name Heisenbug

A warts-and-all account of cutting the Boost.SML 1.2.0 release — four new public APIs, a behavior change rooted in undefined behavior, a type-name Heisenbug that only fired on GCC and MSVC, a dead “Run” button, and the surprisingly deep rabbit hole of making 30 Compiler Explorer links that actually work.

C++ State Machine benchmarked — switch/case vs GoF unique_ptr vs C++23 variant/visit

C++ State Machine, Three Ways: Switch/Case, GoF unique_ptr, and C++23 Variant — Benchmarked

Three implementations of the same aircraft lifecycle FSM — a switch/case flat machine, the classic GoF State pattern with unique_ptr, and a C++23 compile-time variant/visit design — compiled with GCC 14 at -O2 -std=c++23, measured with nanobench, and linked to four live Godbolt sessions. The headline numbers: for a full seven-event mission cycle, the variant approach is 6× faster than OOP and switch/case is essentially free (the optimizer evaluates the constant-input trace at compile time). On the steady-state telemetry hot path, switch runs at 0.56 ns/event, variant at 1.07 ns/event, OOP at 1.20 ns/event. The culprit is not virtual dispatch — it is heap allocation. A follow-up section adds get_if chains and the [[likely]] attribute: on single-variant dispatch all four strategies land within 0.11 ns.

C++ low-latency patterns benchmarked — HFT, cache-friendly layouts, and the tricks that don't reproduce

C++ Low-Latency Patterns, Benchmarked: 15 Tricks from HFT and CppCon 2025 (and Which Claims Don't Reproduce)

Fifteen C++ performance patterns from three sources — Bilokon & Gunduz’s HFT paper at Imperial, Jonathan Müller’s Cache-Friendly C++ deck at CppCon 2025, and Okade & Baker’s C++ Performance Tips at CppCon 2025 — implemented as single-file nanobench programs, run under Docker on GCC 14 with -O2 -std=c++23, every one of them linked to a working Godbolt session. The surprising half: three of the ’textbook’ speedups do not reproduce on a modern CPU. Cache warming is 7x slower than cold. Bitmask branch reduction is slower than the cascade it replaces. This post is the full runbook, numbers included, with the nuance that explains when each pattern actually earns its keep.

C++ tools that enforce low-latency patterns — builtins, compiler flags, clang-tidy checks

C++ Low-Latency, Enforced: __builtin_*, Compiler Flags, and clang-tidy, Benchmarked

Follow-up to the 15-pattern HFT post. The first post asked ‘which patterns work?’ This post answers ‘how do you force your team to keep using them?’ Seven new nanobench programs, each with a working Godbolt link, covering __builtin_popcountll (24x faster than a bit-count loop), __builtin_unreachable in switch defaults (1.55x), __builtin_bswap64 (same speed as the portable idiom — GCC already folds it), [[likely]] and [[unlikely]] (no measurable effect on tight loops — an honest null result), and a flags matrix showing -ffast-math + -march=x86-64-v3 giving 6.9x over -O2. Then a .clang-tidy config that fails CI on every common perf regression, with the performance-for-range-copy warning demonstrated at 41x real runtime cost.

Six Optiver-Style C++ Problems: Order Books, Dijkstra, DP, and Price Alert Router

Six Optiver-Style C++ Problems: Order Books, Dijkstra, DP, and an AoS→SoA Rewrite

Working through six C++ problems modeled on Optiver’s Senior SWE assessment: supermarket checkout simulation, order book matching, dividend pricing, lattice path DP, Dijkstra with K free edges, and a price-alert router built twice — AoS first, then SoA. Every bug, wrong turn, and data structure tradeoff included.

Fixing GCC False-Positive Warnings in Eigen: A Deep Dive into -Warray-bounds at -O3

How a GCC 13 false-positive -Warray-bounds warning in Eigen’s TensorContraction led to a lesson about if constexpr, C++14 portability, and the right way to suppress compiler warnings in a large codebase.