Zero-copy GStreamer tracking pipeline on Jetson Orin

From a PyTorch Tracker to a Zero-Copy GStreamer Pipeline: Rebuilding SAM2.1/SAMURAI on Jetson, Step by Step

A long, hands-on account of turning a research-grade PyTorch visual tracker (SAM2.1 / SAMURAI) into a real-time, zero-copy GStreamer + CUDA + TensorRT pipeline on a Jetson Orin NX: decomposing the model into engines, exporting and parity-validating each one, porting the per-frame math to CUDA, packaging it as a GStreamer element, and raising throughput from 8 fps to 24 fps with queues and frame-skipping. Every stage is validated against a golden reference.

June 15, 2026 · 20 min · Pavel Guzenfeld
A cropped sensor view handing pan authority off to a physical gimbal

Two Pans, One Stick: Blending a Digital Crop Pan with a Physical Gimbal

A long-range zoom camera pans two ways at once — by sliding a crop window across the sensor, and by physically rotating the gimbal underneath it. Here’s how to make one joystick drive both so the operator never feels the seam, the maths of the handoff, and the per-axis bug that kept the gimbal rolling after the stick let go.

June 7, 2026 · 11 min · Pavel Guzenfeld
H.264 vs H.265 vs AV1 end-to-end latency on Jetson Orin — grouped bar chart

H.264, H.265, and AV1 on Jetson Orin: A Real Hardware Latency Benchmark

A rigorous per-stage latency benchmark across H.264, H.265, and AV1 hardware codecs on NVIDIA Jetson Orin (JetPack 6), measuring encode, wire, and decode separately at FHD and HD resolutions. AV1 wins end-to-end at 104 ms FHD / 86 ms HD. H.264 is the worst choice despite being the oldest: its nvv4l2decoder holds ~4 frames in its internal decoded picture buffer (DPB), adding 130–170 ms of hidden latency. Wire latency is governed by parse-element lookahead, not byte volume. Clock sync achieves ±234 µs via chrony. Full pipeline source, CSVs, and reproduction steps included.

May 12, 2026 · 33 min · Pavel Guzenfeld
Cross-process zero-copy NVMM IPC on Jetson — dma-buf fd passing, NvBufSurfaceImport, lock-free pool

Cross-Process Zero-Copy on Jetson: dma-buf fds, NvBufSurfaceImport, and a Cache-Line-Padded Pool

Two processes on a Jetson, one camera frame in NVMM (GPU memory), no copies. The kernel does the heavy lifting via dma-buf fds; SCM_RIGHTS carries the fd across the process boundary; NvBufSurfaceImport reconstructs the surface on the consumer side; a cache-line-padded ring of atomic ref-counts keeps fan-out coherent without locks. With benchmark numbers and a Godbolt-runnable demo of the SCM_RIGHTS pattern.

April 25, 2026 · 22 min · Pavel Guzenfeld
O3DE multi-camera rendering performance analysis

Chasing 18 Milliseconds: A Performance Deep Dive into O3DE's Render Readback Pipeline

We spent a full session systematically profiling O3DE’s multi-camera streaming pipeline, testing eight different optimization approaches, and pinpointed the exact bottleneck: 18 ms of fixed overhead in the AttachmentReadback scope system. Here’s what we tried, what we measured, and what it means for the engine.

April 17, 2026 · 7 min · Pavel Guzenfeld
Three live Godot camera streams over RTP/UDP rendered by GStreamer clients

From Unity to Godot: Multi-Camera Streaming at 50 FPS with Async GPU Readback

After O3DE’s 18 ms frame-graph readback made 30 FPS streaming impossible, we tried Godot. It got us there — eventually. This is the full path from 105 FPS on nothing to 50 FPS per camera with three live RTP streams, including every wrong turn and every underdocumented Godot behavior we hit on the way.

April 17, 2026 · 12 min · Pavel Guzenfeld
O3DE rendering a ground plane from a camera spawned programmatically inside a headless Docker container

From Unity to O3DE: Multi-Camera Streaming at 1080p in a Headless Docker Container

Exploring whether O3DE can replace Unity as the render engine for a drone simulation that streams multiple 1080p camera feeds via GStreamer. From first scaffold to three live RenderToTexture pipelines in a single session.

April 16, 2026 · 6 min · Pavel Guzenfeld

Anatomy of Four GStreamer Shared Memory Bugs

Four bugs in GStreamer’s shmsink/shmsrc elements — a race condition, a use-after-free, a wrong-pointer dereference, and a page alignment mismatch. What they have in common, how to find them, and what they teach about writing correct GStreamer elements.

March 24, 2026 · 11 min · Pavel Guzenfeld
Why GStreamer shmsink Always Exits with Code 1

Fixing a GStreamer Bug: Why shmsink Always Exits with Code 1

A two-line fix for a race condition in GStreamer’s shmsink that causes every pipeline using shared memory to exit with an error. How I found it, proved it, and verified the fix with sanitizers.

March 24, 2026 · 4 min · Pavel Guzenfeld
Zero-Copy Video on Jetson: Building gst-nvmm-cpp

Zero-Copy Video on Jetson: Building gst-nvmm-cpp and Contributing to GStreamer

How we built a GStreamer plugin suite for zero-copy NVMM video on NVIDIA Jetson, the bugs we hit along the way, and what it takes to contribute to the GStreamer project — from filing issues to getting an MR merged.

March 23, 2026 · 13 min · Pavel Guzenfeld