How GCC's std::fill_n Silently Regressed Eigen's AutoDiffScalar Performance

The Setup

Eigen’s commit c01ff453 (Dec 2024) introduced a std::fill_n optimization for filling dense matrices and arrays with a constant value. Instead of Eigen’s own coefficient-wise assignment loop, the new code delegated to std::fill_n — which, for trivial types like double or int, compiles down to an optimized memset-like loop.

The optimization was gated behind an eigen_fill_helper trait that checked whether the expression type structurally supported contiguous filling (Matrix, Array, contiguous Block, stride-compatible Map). What it did not check was whether the scalar type was suitable for std::fill_n.

// Before: unconditionally enabled for Matrix/Array
template <typename Scalar, int Rows, int Cols, int Options, int MaxRows, int MaxCols>
struct eigen_fill_helper<Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols>>
    : std::true_type {};

For Matrix<double, 3, 3>, this is great. For Matrix<AutoDiffScalar<Vector12d>, 12, 12>, it’s a regression.

Why AutoDiffScalar Is Different

AutoDiffScalar<DerType> stores a scalar value and a derivatives vector:

// unsupported/Eigen/src/AutoDiff/AutoDiffScalar.h
protected:
  Scalar m_value;
  DerType m_derivatives;  // e.g., VectorXd — dynamic allocation

Because DerType can be a dynamically allocated Eigen vector, AutoDiffScalar needs non-trivial copy and move operations. It is therefore not trivially copyable: std::is_trivially_copyable<AutoDiffScalar<...>>::value is false.
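The sketch below shows how the trait comes out false. It uses two hypothetical stand-ins, AutoDiffLikeDynamic and AutoDiffLikeFixed, which are not Eigen types:

- AutoDiffLikeDynamic owns heap memory through a std::vector member.
- AutoDiffLikeFixed stores its derivatives inline but declares its own copy operations.

Either property is enough to make std::is_trivially_copyable false. A plain aggregate of doubles stays trivially copyable.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT Eigen's AutoDiffScalar.

// Derivatives owned on the heap (like a VectorXd). std::vector has a non-trivial
// copy constructor, so the implicitly declared one here is non-trivial too.
struct AutoDiffLikeDynamic {
  double value = 0.0;
  std::vector<double> derivatives;
};

// Derivatives stored inline (like a Vector12d), but with user-provided copy
// operations, which by themselves rule out trivial copyability.
struct AutoDiffLikeFixed {
  double value = 0.0;
  double derivatives[12] = {};

  AutoDiffLikeFixed() = default;
  AutoDiffLikeFixed(const AutoDiffLikeFixed& other) : value(other.value) {
    for (int i = 0; i < 12; ++i) derivatives[i] = other.derivatives[i];
  }
  AutoDiffLikeFixed& operator=(const AutoDiffLikeFixed& other) {
    value = other.value;
    for (int i = 0; i < 12; ++i) derivatives[i] = other.derivatives[i];
    return *this;
  }
};

// An aggregate with only implicit special members stays trivially copyable.
struct PlainPair {
  double a;
  double b;
};

static_assert(std::is_trivially_copyable<double>::value, "scalars are trivially copyable");
static_assert(std::is_trivially_copyable<PlainPair>::value, "so are plain aggregates");
static_assert(!std::is_trivially_copyable<AutoDiffLikeDynamic>::value, "owns heap memory");
static_assert(!std::is_trivially_copyable<AutoDiffLikeFixed>::value, "user-provided copy ops");
```

Note that the inline-storage variant is not rescued by its fixed size. The user-provided copy constructor alone decides the answer.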

This matters because GCC’s libstdc++ std::fill_n has a subtle performance characteristic: for non-trivially-copyable types, it does not hoist the fill value out of the loop. Each iteration performs an extra move of the source value. For a type like AutoDiffScalar<Vector12d>, that means copying 12 doubles of derivative data on every single iteration — even though the value is the same each time.

Eigen’s own call_dense_assignment_loop doesn’t have this problem. It evaluates the source expression once per coefficient, with no redundant copies.

The Regression

The result: any code filling a matrix of AutoDiffScalar — including common operations like setZero(), setConstant(), or construction from a constant — got measurably slower after c01ff453. This was reported as Eigen issue #2956.

The Fix

The fix is a one-line conceptual change: gate eigen_fill_helper on std::is_trivially_copyable<Scalar> in addition to the structural checks:

// After: only enabled when the scalar type is trivially copyable
template <typename Scalar, int Rows, int Cols, int Options, int MaxRows, int MaxCols>
struct eigen_fill_helper<Matrix<Scalar, Rows, Cols, Options, MaxRows, MaxCols>>
    : std::is_trivially_copyable<Scalar> {};

When Scalar is trivially copyable, std::is_trivially_copyable<Scalar> derives from std::true_type, so the optimized std::fill_n path is taken. When it is not (AutoDiffScalar, or any custom type with non-trivial copy operations), the trait derives from std::false_type and Eigen falls back to its own coefficient-wise loop.
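Here is a minimal C++14 sketch of the guarded dispatch. It assumes three hypothetical pieces, none of which are Eigen's real API:

- a MiniMatrix container;
- a HeavyScalar type in place of AutoDiffScalar;
- a made-up set_constant entry point.

The trait is written the same way as the fix. Tag dispatch then picks std::fill_n or a coefficient-wise loop, depending on whether the trait's base is std::true_type or std::false_type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical fixed-size container standing in for an Eigen Matrix.
template <typename Scalar, std::size_t N>
struct MiniMatrix {
  Scalar data[N];
};

// Hypothetical non-trivially-copyable scalar standing in for AutoDiffScalar.
struct HeavyScalar {
  double value = 0.0;
  double derivatives[12] = {};
  HeavyScalar() = default;
  explicit HeavyScalar(double v) : value(v) {}
  HeavyScalar(const HeavyScalar& o) : value(o.value) {
    std::copy(o.derivatives, o.derivatives + 12, derivatives);
  }
  HeavyScalar& operator=(const HeavyScalar& o) {
    value = o.value;
    std::copy(o.derivatives, o.derivatives + 12, derivatives);
    return *this;
  }
};

// Guarded trait in the style of the fix: structural match AND trivially copyable.
template <typename Xpr>
struct fill_helper : std::false_type {};

template <typename Scalar, std::size_t N>
struct fill_helper<MiniMatrix<Scalar, N>> : std::is_trivially_copyable<Scalar> {};

enum class FillPath { FillN, CoeffLoop };

// Fast path: delegate to std::fill_n.
template <typename Scalar, std::size_t N>
FillPath fill_impl(MiniMatrix<Scalar, N>& m, const Scalar& v, std::true_type) {
  std::fill_n(m.data, N, v);
  return FillPath::FillN;
}

// Fallback: plain coefficient-wise assignment loop.
template <typename Scalar, std::size_t N>
FillPath fill_impl(MiniMatrix<Scalar, N>& m, const Scalar& v, std::false_type) {
  for (std::size_t i = 0; i < N; ++i) m.data[i] = v;
  return FillPath::CoeffLoop;
}

// fill_helper<...>{} converts to its std::true_type or std::false_type base,
// so overload resolution selects exactly one fill_impl.
template <typename Scalar, std::size_t N>
FillPath set_constant(MiniMatrix<Scalar, N>& m, const Scalar& v) {
  return fill_impl(m, v, fill_helper<MiniMatrix<Scalar, N>>{});
}
```

Both paths store the same values. The guard only decides which code runs, and that choice is where the GCC performance difference comes in.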

This mirrors what eigen_memset_helper (used for setZero()) already does:

template <typename Xpr>
struct eigen_memset_helper {
  using Scalar = typename Xpr::Scalar;
  static constexpr bool value = std::is_trivially_copyable<Scalar>::value &&
                                !static_cast<bool>(NumTraits<Scalar>::RequireInitialization) &&
                                eigen_fill_helper<Xpr>::value;
};

The memset path was already safe because it checked is_trivially_copyable. The fill_n path just needed the same guard.
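The way eigen_memset_helper stacks its conditions can be sketched in isolation. All names here are hypothetical:

- MiniArray is the container.
- structural_fill_helper is the structural check.
- require_initialization is a crude stand-in for NumTraits<Scalar>::RequireInitialization.
- memset_like_helper is the combined guard.

The structural trait alone accepts a non-trivially-copyable scalar. The combined guard rejects it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical fixed-size container exposing a Scalar typedef, as Eigen types do.
template <typename S, std::size_t N>
struct MiniArray {
  using Scalar = S;
  S data[N];
};

// Hypothetical scalar with a user-provided copy constructor (not trivially copyable).
struct NonTrivial {
  double v = 0.0;
  NonTrivial() = default;
  NonTrivial(const NonTrivial& o) : v(o.v) {}
};

// Structural check only, shaped like the pre-fix eigen_fill_helper.
template <typename Xpr>
struct structural_fill_helper : std::false_type {};
template <typename S, std::size_t N>
struct structural_fill_helper<MiniArray<S, N>> : std::true_type {};

// Hypothetical stand-in for NumTraits<Scalar>::RequireInitialization:
// here, anything that is not an arithmetic type "requires initialization".
template <typename Scalar>
struct require_initialization
    : std::integral_constant<bool, !std::is_arithmetic<Scalar>::value> {};

// Stacked guard in the style of eigen_memset_helper.
template <typename Xpr>
struct memset_like_helper {
  using Scalar = typename Xpr::Scalar;
  static constexpr bool value = std::is_trivially_copyable<Scalar>::value &&
                                !require_initialization<Scalar>::value &&
                                structural_fill_helper<Xpr>::value;
};
```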

The Review Iteration

Getting to the final one-liner took a few iterations, which are instructive:

Iteration 1: std::bool_constant
The initial implementation used std::bool_constant<std::is_trivially_copyable<Scalar>::value>. Clean, but std::bool_constant is a C++17 feature, and Eigen supports C++14.

Iteration 2: std::integral_constant
Switched to std::integral_constant<bool, std::is_trivially_copyable<Scalar>::value> for C++14 compatibility. Functionally correct, but verbose.

Iteration 3: Direct inheritance (reviewer suggestion)
Charles Schlosser, the author of the original fill optimization, pointed out that std::is_trivially_copyable<T> already inherits from std::integral_constant<bool, ...>. The wrapper is redundant; just inherit directly:

: std::is_trivially_copyable<Scalar> {};

This is the kind of simplification that’s easy to miss if you’re thinking about the value rather than the type hierarchy. std::is_trivially_copyable<T> isn’t just a compile-time boolean. It is a type that derives from std::true_type or std::false_type, which are aliases for std::integral_constant<bool, true> and std::integral_constant<bool, false>. Since eigen_fill_helper inherits from this trait, it gets ::value, the constexpr conversion to bool, and the rest of the integral_constant interface for free.
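The three review iterations can be checked side by side. This sketch uses hypothetical guard_* names. The std::bool_constant spelling compiles only when __cplusplus reports C++17 or later, which is the version boundary the review ran into. The static_asserts list what direct inheritance provides without any wrapper.

```cpp
#include <cassert>
#include <type_traits>

// Iteration 2: C++14-compatible, but verbose.
template <typename Scalar>
struct guard_integral_constant
    : std::integral_constant<bool, std::is_trivially_copyable<Scalar>::value> {};

// Iteration 3: inherit from the trait directly.
template <typename Scalar>
struct guard_direct : std::is_trivially_copyable<Scalar> {};

#if __cplusplus >= 201703L
// Iteration 1: std::bool_constant only exists from C++17 onward.
// (MSVC reports an old __cplusplus unless /Zc:__cplusplus is set, so this may be skipped there.)
template <typename Scalar>
struct guard_bool_constant
    : std::bool_constant<std::is_trivially_copyable<Scalar>::value> {};
static_assert(guard_bool_constant<double>::value == guard_direct<double>::value, "");
#endif

// Hypothetical type with a user-provided copy constructor.
struct NotTrivial {
  NotTrivial() = default;
  NotTrivial(const NotTrivial&) {}
};

// The spellings agree.
static_assert(guard_integral_constant<double>::value == guard_direct<double>::value, "");
static_assert(guard_integral_constant<NotTrivial>::value == guard_direct<NotTrivial>::value, "");

// What direct inheritance provides for free:
static_assert(guard_direct<double>::value, "::value");
static_assert(guard_direct<double>{}, "constexpr conversion to bool");
static_assert(guard_direct<double>{}(), "constexpr operator() (C++14)");
static_assert(std::is_same<guard_direct<double>::value_type, bool>::value, "value_type");
static_assert(std::is_base_of<std::true_type, guard_direct<double>>::value, "is-a true_type");
static_assert(std::is_base_of<std::false_type, guard_direct<NotTrivial>>::value, "is-a false_type");
```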

What We Learned

1. std::fill_n is not a universal optimization. Library implementations make trade-offs. GCC’s libstdc++ has fast paths for simple element types: memset for byte-sized types, and a hoisted copy of the value for scalar types that the compiler can vectorize. For non-trivially-copyable types, the generic fallback adds per-iteration overhead that a hand-written loop avoids. The lesson: profile before assuming a standard algorithm is faster than your domain-specific implementation.

2. Type traits compose through inheritance, not just ::value. When you write struct Foo : std::true_type {}, Foo doesn’t just carry a value — it is a std::integral_constant<bool, true>. You can inherit from type traits directly. This is a common pattern in Eigen’s trait system and throughout the STL.

3. Consistency within a codebase catches bugs. The memset path already had the is_trivially_copyable guard. The fill_n path didn’t. When two code paths have the same preconditions, they should have the same guards. The fix was essentially “make fill_n match memset.”

4. std::bool_constant vs std::integral_constant<bool, ...> is a C++ version boundary. If your library supports C++14, you can’t use std::bool_constant (C++17). But you often don’t need either — inheriting from the trait directly is both shorter and version-agnostic.

5. Code review from the original author is invaluable. The person who wrote the optimization knows the type hierarchy intimately. Their one-line suggestion eliminated unnecessary indirection and made the intent clearer.
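For lesson 1, a rough timing harness is enough to start the investigation. This sketch compares std::fill_n with a hand-written loop for a hypothetical HeavyScalar type, using std::chrono::steady_clock. It reports raw nanoseconds and makes no claim about which version is faster on any given toolchain. For real measurements, prefer a dedicated harness such as Google Benchmark, because an optimizer may drop repeated fills whose results are never read.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical non-trivially-copyable scalar standing in for AutoDiffScalar.
struct HeavyScalar {
  double value = 0.0;
  double derivatives[12] = {};
  HeavyScalar() = default;
  explicit HeavyScalar(double v) : value(v) { std::fill_n(derivatives, 12, v); }
  HeavyScalar(const HeavyScalar& o) : value(o.value) {
    std::copy(o.derivatives, o.derivatives + 12, derivatives);
  }
  HeavyScalar& operator=(const HeavyScalar& o) {
    value = o.value;
    std::copy(o.derivatives, o.derivatives + 12, derivatives);
    return *this;
  }
};

// Candidate 1: the standard algorithm.
template <typename T>
void fill_with_std(std::vector<T>& buf, const T& v) {
  std::fill_n(buf.data(), buf.size(), v);
}

// Candidate 2: a hand-written coefficient-wise loop.
template <typename T>
void fill_with_loop(std::vector<T>& buf, const T& v) {
  for (std::size_t i = 0; i < buf.size(); ++i) buf[i] = v;
}

// Wall-clock time for `reps` calls of `fill`, in nanoseconds.
template <typename Fill, typename T>
long long time_fill_ns(std::vector<T>& buf, const T& v, int reps, Fill fill) {
  const auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < reps; ++r) fill(buf, v);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A typical call is time_fill_ns(buf, HeavyScalar(1.5), 1000, fill_with_std<HeavyScalar>). Compare it against the same call with fill_with_loop, built with the exact compiler and flags you ship.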

The MR

!2313: Guard eigen_fill_helper on trivially copyable scalars — closes #2956.
