Zero-copy GStreamer tracking pipeline on Jetson Orin

From a PyTorch Tracker to a Zero-Copy GStreamer Pipeline: Rebuilding SAM2.1/SAMURAI on Jetson, Step by Step

A long, hands-on account of turning a research-grade PyTorch visual tracker (SAM2.1 / SAMURAI) into a real-time, zero-copy GStreamer + CUDA + TensorRT pipeline on a Jetson Orin NX: decomposing the model into engines, exporting and parity-validating each one, porting the per-frame math to CUDA, packaging it as a GStreamer element, and raising throughput from 8 fps to 24 fps with queues and frame-skipping. Every stage is validated against a golden reference.

June 15, 2026 · 20 min · Pavel Guzenfeld
A cropped sensor view handing pan authority off to a physical gimbal

Two Pans, One Stick: Blending a Digital Crop Pan with a Physical Gimbal

A long-range zoom camera pans two ways at once: by sliding a crop window across the sensor, and by physically rotating the gimbal underneath it. This post covers how to make one joystick drive both so the operator never feels the seam, the maths of the handoff, and the per-axis bug that kept the gimbal rolling after the stick was released.

June 7, 2026 · 11 min · Pavel Guzenfeld
H.264 vs H.265 vs AV1 end-to-end latency on Jetson Orin — grouped bar chart

H.264, H.265, and AV1 on Jetson Orin: A Real Hardware Latency Benchmark

A rigorous per-stage latency benchmark across H.264, H.265, and AV1 hardware codecs on NVIDIA Jetson Orin (JetPack 6), measuring encode, wire, and decode separately at FHD and HD resolutions. AV1 wins end-to-end at 104 ms FHD / 86 ms HD. H.264 is the worst choice despite being the oldest: on the H.264 path, nvv4l2decoder holds ~4 frames in its internal decoded picture buffer (DPB), adding 130–170 ms of hidden latency. Wire latency is governed by parse-element lookahead, not byte volume. Clock sync via chrony holds to within ±234 µs. Full pipeline source, CSVs, and reproduction steps included.

May 12, 2026 · 33 min · Pavel Guzenfeld
Cross-process zero-copy NVMM IPC on Jetson — dma-buf fd passing, NvBufSurfaceImport, lock-free pool

Cross-Process Zero-Copy on Jetson: dma-buf fds, NvBufSurfaceImport, and a Cache-Line-Padded Pool

Two processes on a Jetson, one camera frame in NVMM (GPU memory), no copies. The kernel does the heavy lifting via dma-buf fds; SCM_RIGHTS carries the fd across the process boundary; NvBufSurfaceImport reconstructs the surface on the consumer side; a cache-line-padded ring of atomic ref-counts keeps fan-out coherent without locks. With benchmark numbers and a Godbolt-runnable demo of the SCM_RIGHTS pattern.

April 25, 2026 · 22 min · Pavel Guzenfeld
Zero-Copy Video on Jetson: Building gst-nvmm-cpp

Zero-Copy Video on Jetson: Building gst-nvmm-cpp and Contributing to GStreamer

How we built a GStreamer plugin suite for zero-copy NVMM video on NVIDIA Jetson, the bugs we hit along the way, and what it takes to contribute to the GStreamer project — from filing issues to getting an MR merged.

March 23, 2026 · 13 min · Pavel Guzenfeld