A positive semidefinite matrix is a symmetric matrix where every possible “weighted combination” of its entries produces a value that’s zero or positive, never negative. More precisely, for any vector x, the quantity x^T A x is greater than or equal to zero. This property shows up constantly in statistics, optimization, and machine learning because it captures something fundamental: the matrix doesn’t introduce negative values where they shouldn’t exist. Think of it as a matrix that always “plays nice” with the data it transforms.
The Core Definition
A symmetric matrix A is positive semidefinite (often abbreviated PSD) if for every vector x, the quantity x^T A x is greater than or equal to zero. That expression, x^T A x, is called a quadratic form, and it takes a vector and returns a single number. If that number can never be negative regardless of which vector you plug in, the matrix is positive semidefinite.
A simple example: the 2×2 identity matrix. Pick any vector, and x^T I x just gives you the sum of squared components, which is always non-negative. A less obvious example: a matrix with a zero eigenvalue, like [[1, 0], [0, 0]]. The quadratic form here equals x₁², which is zero when x₁ = 0 but never negative. That matrix is positive semidefinite but not positive definite, and the distinction matters.
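Both examples are easy to check numerically. Here is a minimal NumPy sketch (the helper name quad_form is my own, not a library function):

```python
import numpy as np

def quad_form(A, x):
    """Evaluate the quadratic form x^T A x for a vector x."""
    return float(x @ A @ x)

I2 = np.eye(2)                      # identity: x^T I x = x1^2 + x2^2
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])          # PSD but not positive definite

x = np.array([3.0, -4.0])
print(quad_form(I2, x))             # 25.0 = 3^2 + (-4)^2
print(quad_form(B, x))              # 9.0  = x1^2 only

# With x1 = 0, the form for B hits zero on a nonzero vector,
# which is exactly why B is semidefinite rather than definite.
print(quad_form(B, np.array([0.0, 5.0])))   # 0.0
```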
Equivalent Ways to Identify a PSD Matrix
You don’t always need to check the quadratic form directly. Several equivalent conditions all guarantee the same thing:
- Eigenvalues: All eigenvalues are greater than or equal to zero. This is often the most intuitive test. If even one eigenvalue is negative, the matrix is not PSD.
- Principal minors: Every principal minor (the determinant of every square submatrix formed by deleting matching rows and columns) is non-negative. This is the semidefinite version of Sylvester’s criterion.
- Factorization: The matrix can be written as R^T R for some matrix R, which may be singular or even non-square. This factorization is one reason PSD matrices appear so naturally in practice: any time you multiply a matrix’s transpose by itself, you get something positive semidefinite.
One subtle trap with the minor test: for positive definite matrices, you only need to check the leading principal minors (the upper-left submatrices). For positive semidefinite matrices, that shortcut fails. The matrix [[0, 0], [0, -1]] has all leading principal minors equal to zero, which looks fine, but the quadratic form is -x₂², which goes negative. You need to check all principal minors, not just the leading ones.
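That trap is easy to see in code. A quick NumPy sketch of the counterexample above, comparing the leading-minor check to the eigenvalue test:

```python
import numpy as np

C = np.array([[0.0,  0.0],
              [0.0, -1.0]])

# Leading principal minors: determinants of the top-left 1x1 and 2x2 blocks.
leading = [np.linalg.det(C[:k, :k]) for k in (1, 2)]
print(leading)                      # both zero, so the shortcut sees nothing wrong

# The eigenvalue test exposes the problem immediately.
print(np.linalg.eigvalsh(C))        # includes -1, so C is not PSD

# Equivalently, the quadratic form goes negative:
x = np.array([0.0, 1.0])
print(float(x @ C @ x))             # -1.0
```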
Positive Semidefinite vs. Positive Definite
The difference comes down to whether zero is allowed. A positive definite matrix requires x^T A x to be strictly greater than zero for every nonzero vector x. A positive semidefinite matrix relaxes this: x^T A x can equal zero even when x isn’t the zero vector. In eigenvalue terms, positive definite means all eigenvalues are strictly positive, while positive semidefinite allows some eigenvalues to be zero.
This has geometric consequences. A positive definite matrix defines an ellipsoid: the set of points where x^T A x = 1 forms a bounded, egg-shaped surface whose axes have lengths 1/√λ for each eigenvalue λ. When an eigenvalue hits zero, that axis stretches to infinity and the ellipsoid degenerates. The shape flattens into a lower-dimensional object. A positive semidefinite matrix with one zero eigenvalue in 3D, for instance, describes an infinitely long elliptical cylinder rather than a closed ellipsoid.
Positive definite matrices are always invertible. Positive semidefinite matrices can be singular, which is exactly the situation when one or more eigenvalues equal zero.
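The eigenvalue distinction can be wrapped in a small checker. This is a sketch, with the function name and tolerance chosen for illustration:

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(A)
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig > -tol):
        return "positive semidefinite"
    return "not PSD"

print(classify(np.eye(2)))                             # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, 0.0]])))    # positive semidefinite
print(classify(np.array([[0.0, 0.0], [0.0, -1.0]])))   # not PSD
```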
Why Covariance Matrices Are Always PSD
Every covariance matrix is positive semidefinite, and the reason is elegant. A covariance matrix Σ can be written as E[(X – μ)(X – μ)^T], where X is a random vector and μ is its mean. When you compute a^T Σ a for any vector a, you’re essentially computing the expected value of a squared quantity: E[Y²], where Y = a^T(X – μ). A squared number can’t be negative, and the average of non-negative numbers can’t be negative. So a^T Σ a ≥ 0 for any choice of a.
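The identity a^T Σ a = Var(a^T X) can be verified directly on sample data (synthetic data here, generated just for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))       # 500 samples of a 3-dimensional vector

Sigma = np.cov(X, rowvar=False)     # sample covariance matrix (3x3)

# a^T Sigma a equals the sample variance of the projected data a^T X,
# so it can never be negative.
a = np.array([2.0, -1.0, 0.5])
print(float(a @ Sigma @ a))         # some non-negative number
print(float(np.var(X @ a, ddof=1))) # the same value, computed as a variance
```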
This property has practical consequences. If you’re building a covariance matrix from data and numerical errors push it slightly away from being PSD (which happens with rounding or incomplete data), downstream calculations can break. Algorithms that assume PSD structure, like those requiring a Cholesky decomposition, will fail or produce meaningless results. That’s why data pipelines often include a step to “fix” near-PSD matrices by clamping small negative eigenvalues to zero.
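The clamping repair mentioned above can be sketched in a few lines. The function name nearest_psd is my own; this is the simple eigenvalue-clipping approach, not a full nearest-correlation-matrix algorithm:

```python
import numpy as np

def nearest_psd(A):
    """Repair a symmetric matrix by clamping negative eigenvalues to zero."""
    eig, Q = np.linalg.eigh(A)
    eig_clamped = np.clip(eig, 0.0, None)
    return Q @ np.diag(eig_clamped) @ Q.T

# A "covariance" knocked off-PSD (its smallest eigenvalue is negative):
A = np.array([[1.0,   0.999, 0.0],
              [0.999, 1.0,   0.999],
              [0.0,   0.999, 1.0]])
print(np.linalg.eigvalsh(A).min())                  # negative

A_fixed = nearest_psd(A)
print(np.linalg.eigvalsh(A_fixed).min() >= -1e-12)  # True
```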
The Role in Optimization and Convexity
Positive semidefiniteness is the mathematical signature of convexity. A twice-differentiable function is convex if and only if its Hessian matrix (the matrix of second derivatives) is positive semidefinite everywhere. Convexity matters because convex functions have no local minima that aren’t also global minima, which makes optimization straightforward.
When you minimize a function with a PSD Hessian, gradient-based methods (with an appropriate step size) converge to a global minimum. If the Hessian is positive definite (not just semidefinite), the minimum is unique. If it’s only semidefinite, there may be a flat region of equally good solutions, but no false valleys to get trapped in. This distinction shows up in least-squares problems, regularized regression, and nearly every convex optimization routine used in engineering and data science.
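Least squares is the canonical instance: f(x) = ||Ax − b||² has Hessian 2AᵀA, which is PSD by the RᵀR argument from earlier. A quick check:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
H = 2.0 * A.T @ A                   # Hessian of f(x) = ||Ax - b||^2

# PSD by construction (a scaled R^T R), so every eigenvalue is >= 0:
print(np.linalg.eigvalsh(H))

# If A has full column rank, H is positive definite and the least-squares
# minimum is unique; a rank-deficient A would put a zero eigenvalue in H
# and create a flat valley of equally good solutions.
print(np.linalg.matrix_rank(A))
```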
PSD Matrices in Machine Learning
Kernel methods, including support vector machines (SVMs), rely fundamentally on positive semidefiniteness. A kernel function k(u, v) measures similarity between two data points, and for it to be valid, the matrix of all pairwise kernel values (called the Gram matrix or kernel matrix) must be positive semidefinite for every possible set of training points. This requirement is known as Mercer’s condition.
The reason is structural. A valid kernel function implicitly maps data into a higher-dimensional space where k(u, v) equals the dot product of the mapped points: k(u, v) = φ(u) · φ(v). The kernel matrix K then equals V^T V, where V stacks up the mapped feature vectors. Any matrix of the form V^T V is automatically PSD, because z^T V^T V z = (Vz)^T(Vz) ≥ 0 for any z. This guarantee ensures the SVM optimization problem is concave in its dual form, meaning it has a clean, solvable structure with no spurious local optima.
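Mercer’s condition can be observed directly with a known-valid kernel. Here is a sketch using a hand-rolled Gaussian (RBF) kernel on random points; the function name and gamma value are illustrative choices:

```python
import numpy as np

def rbf_kernel(U, V, gamma=0.5):
    """Gaussian (RBF) kernel: k(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dists = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 4))        # 20 "training points" in 4 dimensions

K = rbf_kernel(X, X)                # the Gram (kernel) matrix

# Mercer's condition in action: every eigenvalue is non-negative
# (up to floating-point noise), for any set of points we pick.
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True
```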
Cholesky Decomposition and Computation
The Cholesky decomposition factors a matrix as A = LL^T, where L is a lower triangular matrix. For positive definite matrices, this decomposition always exists and is unique, with all diagonal entries of L strictly positive. It’s roughly twice as fast as a general matrix factorization, which makes it a go-to tool for solving linear systems involving symmetric positive definite matrices.
Positive semidefinite matrices also have Cholesky factorizations, but with a caveat: the diagonal of L can include zeros, and the factorization may not be unique. In practice, algorithms often use a pivoted variant of Cholesky decomposition that handles the zero eigenvalues gracefully. Attempting a standard Cholesky decomposition is itself a common way to test whether a matrix is positive definite: if the algorithm completes without encountering a zero or negative pivot, the matrix passes. A semidefinite-but-singular matrix will typically fail this test, which is why code that expects zero eigenvalues falls back on the pivoted variant or an eigenvalue check with a small tolerance.
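The Cholesky-based definiteness test looks like this in NumPy, where np.linalg.cholesky raises LinAlgError when the factorization breaks down (the helper name is my own):

```python
import numpy as np

def is_positive_definite(A):
    """Test positive definiteness by attempting a Cholesky factorization."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite(np.eye(2)))                           # True
# Singular, only semidefinite -- the standard factorization fails:
print(is_positive_definite(np.array([[1.0, 0.0], [0.0, 0.0]])))  # False
print(is_positive_definite(np.array([[0.0, 0.0], [0.0, -1.0]]))) # False
```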

