The Bug
ROS 2 issue #2942 reported a surprising result: creating 10,000 timers took 429 milliseconds. That’s 23,310 timers per second — not a number that screams “bottleneck” until you realize the work per timer should be constant. Adding N timers shouldn’t take O(N²) time. But it did.
The reproducer was simple:
auto node = std::make_shared<rclcpp::Node>("test");
auto cbg = node->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
for (int i = 0; i < 10000; ++i) {
node->create_wall_timer(1h, []() {}, cbg);
}
Profiling showed that 96% of wall time was spent inside CallbackGroup::add_timer().
The Root Cause
Here’s what add_timer() looked like before the fix (and every other add_* method had the same pattern):
void CallbackGroup::add_timer(const rclcpp::TimerBase::SharedPtr & timer_ptr)
{
std::lock_guard<std::mutex> lock(mutex_);
timer_ptrs_.push_back(timer_ptr);
timer_ptrs_.erase(
std::remove_if(
timer_ptrs_.begin(),
timer_ptrs_.end(),
[](const rclcpp::TimerBase::WeakPtr & x) { return x.expired(); }),
timer_ptrs_.end());
}
The push_back is O(1). But immediately after, the code runs a full linear scan of the entire vector to remove expired weak_ptrs. On the first call, that scans 1 element. On the second, 2. On the N-th, N. Total: 1 + 2 + … + N = N(N+1)/2 — classic quadratic.
The irony: during bulk addition of timers, none of them are expired. The cleanup pass does nothing useful. It’s just burning cycles scanning a growing vector.
All five add_* methods (add_timer, add_subscription, add_service, add_client, add_waitable) had the same pattern.
The Fix
The insight is that collect_all_ptrs() — called by the executor on every spin iteration — already iterates all entities. It already skips expired entries. It’s the natural place to also remove them.
Step 1: Make add_* methods O(1)
Strip out the cleanup. Just push:
void CallbackGroup::add_timer(const rclcpp::TimerBase::SharedPtr & timer_ptr)
{
std::lock_guard<std::mutex> lock(mutex_);
timer_ptrs_.push_back(timer_ptr);
}
Step 2: Clean up expired entries in collect_all_ptrs
After iterating each vector and calling the user-provided function for live entries, remove the dead ones with remove_if:
void CallbackGroup::collect_all_ptrs(/* ... */)
{
std::lock_guard<std::mutex> lock(mutex_);
for (const auto & weak_ptr : timer_ptrs_) {
auto ref_ptr = weak_ptr.lock();
if (ref_ptr) {
timer_func(ref_ptr);
}
}
timer_ptrs_.erase(
std::remove_if(
timer_ptrs_.begin(), timer_ptrs_.end(),
[](const auto & w) { return w.expired(); }),
timer_ptrs_.end());
// Same pattern for subscriptions, services, clients, waitables...
}
This is safe because:
collect_all_ptrsalready skipped expired entries; now it also removes them- The executor calls
collect_all_ptrson every spin iteration, so expired entries are cleaned up promptly - All
find_*_ptrs_ifmethods already handle expiredweak_ptrs gracefully (they call.lock()and skip nulls)
No public API or ABI change — only the internal cleanup scheduling moved.
The Result
| Time | Throughput | |
|---|---|---|
| Before | 429 ms | 23,310 timers/sec |
| After | 6 ms | 1,666,667 timers/sec |
| Speedup | 71.5x |
The fix is trivially correct. The performance gain is massive. The diff is small. That’s the best kind of optimization.
The Review
Maintainer @jmachowinski had two clean requests:
1. Don’t use mutable on the entity vectors
My initial version marked the five entity vectors as mutable so the const-qualified collect_all_ptrs could modify them during cleanup. The reviewer correctly pointed out: just make collect_all_ptrs non-const instead. All callers lock from WeakPtr to get a non-const SharedPtr, so this is safe.
2. Use remove_if instead of manual index compaction
My first implementation used a hand-rolled collect_and_compact helper that iterated with read/write indices to compact in a single pass. It was correct and cache-friendly, but unnecessary — the standard erase(remove_if(...)) idiom does the same thing and is immediately recognizable to any C++ developer.
Both were good calls. The code got simpler and more idiomatic.
Testing
The PR includes integration tests that verify:
- Expired entries are compacted — Add 100 timers, destroy 50, trigger
collect_all_ptrs, assertsize() == 50 - Only live entities are yielded — Interleave live and expired entries, confirm collection count matches live count
- No quadratic regression — Add 5,000 timers, assert total time < 5 seconds (generous threshold for CI)
- Mixed entity types — Timers + subscriptions cleaned independently
- Interleaved add/remove cycles — Multiple rounds of add and destroy produce correct counts
All tests pass under AddressSanitizer and UndefinedBehaviorSanitizer.
Takeaways
1. Cleanup doesn’t have to happen at insertion time. The original code cleaned up expired entries on every add_* call because it seemed like the right place — “clean up as you go.” But when insertion is frequent and expiration is rare, this creates unnecessary work. Moving cleanup to a point where iteration already happens (the executor spin) makes it free.
2. O(N) per call doesn’t mean O(N) total. Each individual add_* call was O(N) — linear in the current size. That sounds fine until you realize N grows with each call. N calls of O(N) each = O(N²). Always think about the total cost across a sequence of operations, not just one.
3. Profile before guessing. The reproducer made this obvious — 96% of time in add_timer(). Without measurement, you might waste time optimizing the timer callback dispatch or the executor loop.
The PR is ros2/rclcpp#3109. The fix targets the rolling branch.