
C++ Low-Latency, Enforced: __builtin_*, Compiler Flags, and clang-tidy, Benchmarked
Follow-up to the 15-pattern HFT post. The first post asked ‘which patterns work?’ This post answers ‘how do you force your team to keep using them?’ Seven new nanobench programs, each with a working Godbolt link, covering __builtin_popcountll (24x faster than a bit-count loop), __builtin_unreachable in switch defaults (1.55x), __builtin_bswap64 (same speed as the portable idiom — GCC already folds it), [[likely]] and [[unlikely]] (no measurable effect on tight loops — an honest null result), and a flags matrix showing -ffast-math + -march=x86-64-v3 giving 6.9x over -O2. Then a .clang-tidy config that fails CI on every common perf regression, with the performance-for-range-copy warning demonstrated at 41x real runtime cost.


