Cross-process zero-copy NVMM IPC on Jetson — dma-buf fd passing, NvBufSurfaceImport, lock-free pool

Cross-Process Zero-Copy on Jetson: dma-buf fds, NvBufSurfaceImport, and a Cache-Line-Padded Pool

Two processes on a Jetson, one camera frame in NVMM (GPU memory), no copies. The kernel does the heavy lifting via dma-buf fds; SCM_RIGHTS carries the fd across the process boundary; NvBufSurfaceImport reconstructs the surface on the consumer side; a cache-line-padded ring of atomic ref-counts keeps fan-out coherent without locks. With benchmark numbers and a Godbolt-runnable demo of the SCM_RIGHTS pattern.

April 25, 2026 · 22 min · Pavel Guzenfeld
C++ low-latency patterns benchmarked — HFT, cache-friendly layouts, and the tricks that don't reproduce

C++ Low-Latency Patterns, Benchmarked: 15 Tricks from HFT and CppCon 2025 (and Which Claims Don't Reproduce)

Fifteen C++ performance patterns from three sources — Bilokon & Gunduz’s HFT paper at Imperial, Jonathan Müller’s Cache-Friendly C++ deck at CppCon 2025, and Okade & Baker’s C++ Performance Tips at CppCon 2025 — implemented as single-file nanobench programs, run under Docker on GCC 14 with -O2 -std=c++23, every one of them linked to a working Godbolt session. The surprising half: three of the ’textbook’ speedups do not reproduce on a modern CPU. Cache warming is 7x slower than cold. Bitmask branch reduction is slower than the cascade it replaces. This post is the full runbook, numbers included, with the nuance that explains when each pattern actually earns its keep.

April 23, 2026 · 28 min · Pavel Guzenfeld
Anatomy of Four GStreamer Shared Memory Bugs

Anatomy of Four GStreamer Shared Memory Bugs

Four bugs in GStreamer’s shmsink/shmsrc elements — a race condition, a use-after-free, a wrong-pointer dereference, and a page alignment mismatch. What they have in common, how to find them, and what they teach about writing correct GStreamer elements.

March 24, 2026 · 11 min · Pavel Guzenfeld