The Asymmetry
While working on Eigen’s Householder module, I noticed something odd. The HouseholderSequence class has two paths for applying a sequence of reflectors to a matrix:
// Left: H * M — has a blocked (BLAS-3) path
void applyThisOnTheLeft(Dest& dst, Workspace& workspace) const {
if (m_length >= BlockSize && dst.cols() > 1) {
// blocked path using compact WY representation
...
} else {
// scalar fallback: apply reflectors one at a time
}
}
// Right: M * H — scalar only
void applyThisOnTheRight(Dest& dst, Workspace& workspace) const {
for (Index k = 0; k < m_length; ++k) {
dst.rightCols(...)
.applyHouseholderOnTheRight(...); // one reflector at a time
}
}
The left-side path groups reflectors into panels of 48 and applies them as BLAS-3 matrix-matrix multiplies. The right-side path applies each reflector individually — a rank-1 update, which is a BLAS-2 operation.
This matters because M * Q (right-side application) shows up in eigensolvers, Schur decompositions, and any code that needs to apply Q to a matrix from the right.
The Math
A Householder reflector is H_k = I - tau_k * v_k * v_k^*. A sequence of n reflectors can be represented in compact WY form as:
P = I - V * T * V^*
where V is unit-lower-triangular (columns are the Householder vectors) and T is upper-triangular (built from the tau coefficients).
Left application is straightforward:
P * A = A - V * T * (V^* * A)
Eigen already implements this as three matrix-matrix operations: tmp = V^* * A, then tmp = T * tmp, then A -= V * tmp.
Right application follows the same pattern:
A * P = A - (A * V) * T * V^*
Three matrix-matrix operations again: tmp = A * V, then tmp = tmp * T, then A -= tmp * V^*.
The key insight: both directions have the same computational structure and the same BLAS-3 character. There’s no mathematical reason the right side should be slower.
The Implementation
The core function is apply_block_householder_on_the_right:
template <typename MatrixType, typename VectorsType, typename CoeffsType>
void apply_block_householder_on_the_right(
MatrixType& mat, const VectorsType& vectors,
const CoeffsType& hCoeffs, bool forward)
{
Index nbVecs = vectors.cols();
Matrix<Scalar, TFactorSize, TFactorSize, RowMajor> T(nbVecs, nbVecs);
if (forward)
make_block_householder_triangular_factor(T, vectors, hCoeffs);
else
make_block_householder_triangular_factor(T, vectors, hCoeffs.conjugate());
const TriangularView<const VectorsType, UnitLower> V(vectors);
// A -= A * V * T * V^*
auto tmp = mat * V;
if (forward)
tmp = (tmp * T.triangularView<Upper>()).eval();
else
tmp = (tmp * T.triangularView<Upper>().adjoint()).eval();
mat.noalias() -= tmp * V.adjoint();
}
Then applyThisOnTheRight uses it the same way the left side does — iterating over blocks of reflectors:
if (m_length >= BlockSize && dst.rows() > 1) {
Index blockSize = ...;
for (Index i = 0; i < m_length; i += blockSize) {
// extract sub-vectors and sub-destination for this block
auto sub_dst = dst.rightCols(dstCols);
apply_block_householder_on_the_right(
sub_dst, sub_vecs, m_coeffs.segment(k, bs), !m_reverse);
}
}
The Subtle Bug
Getting this right required understanding a non-obvious detail about the blocking direction.
The left-side blocked path iterates blocks in reverse order (last block first, working backward). This is because left-multiplication by H_0 * H_1 * ... * H_{n-1} applies H_{n-1} first to the matrix, then H_{n-2}, etc. The blocked version groups these and processes from right to left.
For right-multiplication, M * H_0 * H_1 * ... * H_{n-1}, the first reflector applied to M is H_0, then H_1, etc. So the blocks must iterate forward — the opposite direction from the left side.
I initially copied the left-side iteration order and spent a while debugging why matrices of size > 48 (the BlockSize threshold) produced garbage while smaller ones were fine. The fix was flipping the loop direction:
// Left-side loop (backward):
Index end = m_reverse ? min(m_length, i + blockSize) : m_length - i;
Index k = m_reverse ? i : max(0, end - blockSize);
// Right-side loop (forward):
Index end = m_reverse ? m_length - i : min(m_length, i + blockSize);
Index k = m_reverse ? max(0, end - blockSize) : i;
Bonus: Resolving Two FIXMEs
While in BlockHouseholder.h, I also resolved two long-standing FIXMEs. The original code had scalar loops working around Eigen’s triangular product not supporting in-place operation:
// FIXME: use .noalias() once triangular product supports in-place operation.
for (Index j = nbVecs - 1; j > i; --j) {
typename TriangularFactorType::Scalar z = triFactor(i, j);
triFactor(i, j) = z * triFactor(j, j);
if (nbVecs - j - 1 > 0)
triFactor.row(i).tail(nbVecs-j-1) += z * triFactor.row(j).tail(nbVecs-j-1);
}
The workaround is simpler than the FIXME suggests — just call .eval() to force evaluation into a temporary before assignment:
triFactor.row(i).tail(rt) =
(triFactor.row(i).tail(rt) *
triFactor.bottomRightCorner(rt, rt).triangularView<Upper>()).eval();
Same fix for the second FIXME in apply_block_householder_on_the_left.
Testing
All 62 tests across the Householder/QR/Schur/Eigensolver test suites pass, including the existing right-side application tests that verify M * hseq == M * hseq_mat for random matrices up to 320x320.
The MR is !2341, closing issue #3057.