How to Solve Ax=b: Gaussian Elimination and Beyond

Solving Ax = b means finding the vector x that, when multiplied by matrix A, produces vector b. The method you choose depends on the size of A, whether A is square, and whether you need to solve the system once or many times. For most small to mid-sized problems, Gaussian elimination (or its close relative, LU decomposition) is the standard approach. For very large sparse systems, iterative methods take over.

Check Whether a Solution Exists

Before diving into algorithms, it helps to know what kind of answer to expect. For a square n × n matrix A, the system Ax = b has exactly one solution for every possible b when A is invertible. A matrix is invertible when it has n pivots, its columns are linearly independent, and its rank equals n. All of these conditions are equivalent: if one is true, they all are.
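These equivalent conditions are easy to check numerically. A minimal sketch with NumPy, using a hypothetical 3×3 matrix (any square A works the same way):

```python
import numpy as np

# Hypothetical 3x3 example matrix.
A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])

# A is invertible exactly when its rank equals n
# (equivalently: n pivots, linearly independent columns).
n = A.shape[0]
rank = np.linalg.matrix_rank(A)
invertible = (rank == n)
```

In practice, rank alone doesn't tell you how close A is to being singular; for that, look at the condition number (np.linalg.cond).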

If A is not invertible (singular), the system either has no solution or infinitely many solutions, depending on the specific b. If A is not square, you’re dealing with an overdetermined or underdetermined system, which requires a different approach covered later in this article.

Gaussian Elimination: The Core Method

Gaussian elimination is the method most people learn first, and it remains the backbone of how computers solve linear systems. The idea is to transform the augmented matrix [A | b] into a simpler form where the answer can be read off directly. It works in two stages.

Forward Elimination

Start with row 1. Find the leftmost column that contains a nonzero entry in the current row or any row below it. If the current row has a zero in that position, swap it with a row below that doesn’t. This nonzero entry becomes your pivot. Then use row operations to eliminate all entries below the pivot: subtract a suitable multiple of the pivot row from each row beneath it. Move to the next row and repeat. When you’re done, the matrix is in echelon form, meaning all the nonzero entries form a staircase pattern from upper-left to lower-right, with zeros below each pivot.
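The forward-elimination stage can be sketched directly in code. This is a minimal illustration with no pivot-size safeguards (the section on pivoting below explains why production code reorders rows more aggressively); the 3×3 system is a hypothetical example:

```python
import numpy as np

def forward_eliminate(A, b):
    """Reduce the augmented matrix [A | b] to echelon form (no size-based pivoting)."""
    M = np.hstack([A.astype(float), b.astype(float).reshape(-1, 1)])
    n = M.shape[0]
    for k in range(n):
        # If the pivot position is zero, swap in a row below with a nonzero entry.
        if M[k, k] == 0:
            for r in range(k + 1, n):
                if M[r, k] != 0:
                    M[[k, r]] = M[[r, k]]
                    break
        # Subtract multiples of row k to zero out the entries below the pivot.
        for r in range(k + 1, n):
            m = M[r, k] / M[k, k]
            M[r, k:] -= m * M[k, k:]
    return M

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
b = np.array([5.0, -2.0, 9.0])
E = forward_eliminate(A, b)  # upper triangular in its first 3 columns
```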

Back Substitution

Starting from the bottom row of the echelon form, divide each row by its pivot value so that every pivot becomes 1. Then work upward: subtract multiples of each pivot row from the rows above it to eliminate entries above each pivot. The result is reduced row echelon form, and the rightmost column of the augmented matrix now contains your solution x.
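Back substitution is a short loop working from the last row upward. A sketch, taking as input the echelon form of the same hypothetical 3×3 system used above:

```python
import numpy as np

def back_substitute(E):
    """Solve from an echelon-form augmented matrix [U | c], bottom row first."""
    n = E.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the already-known components, then divide by the pivot.
        x[i] = (E[i, -1] - E[i, i + 1:n] @ x[i + 1:n]) / E[i, i]
    return x

# Echelon form of a hypothetical 3x3 system (output of forward elimination).
E = np.array([[2.0,  1.0,  1.0,   5.0],
              [0.0, -8.0, -2.0, -12.0],
              [0.0,  0.0,  1.0,   2.0]])
x = back_substitute(E)  # solution of the original system
```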

For a concrete 3×3 system, this typically involves a handful of row operations. For an n × n system, the computational cost grows as O(n³), meaning that doubling the number of unknowns roughly multiplies the work by eight.

Why Pivoting Matters for Accuracy

Computers use floating-point arithmetic, which introduces tiny rounding errors at every step. These errors can snowball when the multipliers used during elimination are large. If a pivot value is very small compared to other entries in its column, dividing by it creates enormous intermediate values. When those large values later cancel out, you can lose significant digits of accuracy.

Partial pivoting solves this by rearranging rows at each step so the largest available entry in the column becomes the pivot. This keeps all the multipliers at or below 1 in magnitude, which prevents the blowup. Gaussian elimination with partial pivoting is backward stable in practice, meaning the computed solution is nearly as accurate as the problem allows. Every serious numerical library uses partial pivoting by default.
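A classic two-equation example makes the danger concrete. With a pivot of 10⁻²⁰, naive elimination produces a multiplier of 10²⁰, the subtraction swamps the other entries, and the computed answer is badly wrong; a pivoted library solver recovers the correct result. (The numbers here are chosen purely to trigger the failure.)

```python
import numpy as np

# Tiny pivot in the (0, 0) position; the true solution is x ≈ [1, 1].
A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])
b = np.array([1.0, 2.0])

# Naive elimination without row swaps: the multiplier is 1e20, and the
# huge intermediate values destroy the information in row 2.
m = A[1, 0] / A[0, 0]          # 1e20
u11 = A[1, 1] - m * A[0, 1]    # rounds to -1e20
c1 = b[1] - m * b[0]           # rounds to -1e20
x1 = c1 / u11
x0 = (b[0] - A[0, 1] * x1) / A[0, 0]
naive = np.array([x0, x1])     # first component comes out as 0, not 1

# A solver with partial pivoting swaps the rows first and gets it right.
pivoted = np.linalg.solve(A, b)
```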

LU Decomposition: Solve Once, Reuse Many Times

LU decomposition is essentially Gaussian elimination repackaged for efficiency. Instead of working on the augmented matrix [A | b] directly, you factor A into two triangular matrices: a lower triangular matrix L (with 1s on its diagonal) and an upper triangular matrix U. The entries of L are exactly the multipliers that arise during Gaussian elimination, and U is the row-reduced upper triangular result.

Once you have L and U, solving Ax = b becomes two quick triangular solves:

  • Forward substitution: Solve Lc = b for c, working top to bottom.
  • Backward substitution: Solve Ux = c for x, working bottom to top.

Each triangular solve costs only O(n²), which is dramatically cheaper than the O(n³) factorization step. This matters when you need to solve Ax = b for many different b vectors with the same A. You pay the O(n³) cost once to get L and U, then each new right-hand side b costs only O(n²). In engineering simulations where the same system matrix appears with hundreds or thousands of different load vectors, this speedup is enormous.
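The factor-once, solve-many pattern looks like this with SciPy's lu_factor and lu_solve (which perform LU with partial pivoting); the matrix and right-hand sides are hypothetical examples:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])

# Pay the O(n^3) factorization cost once.
lu, piv = lu_factor(A)

# Each subsequent right-hand side costs only O(n^2).
b1 = np.array([5.0, -2.0, 9.0])
b2 = np.array([1.0, 0.0, 0.0])
x1 = lu_solve((lu, piv), b1)
x2 = lu_solve((lu, piv), b2)
```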

Why You Should Never Invert the Matrix

A common first instinct is to compute A⁻¹ and then multiply: x = A⁻¹b. This works in theory but is a bad idea in practice. Computing the inverse requires more arithmetic than LU decomposition and introduces additional rounding errors at every extra step. The result is both slower and less accurate. Numerical computing libraries like NumPy, MATLAB, and Julia all provide dedicated solvers (such as numpy.linalg.solve) that use LU decomposition with partial pivoting internally. These are faster and more numerically stable than computing the inverse yourself.
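Side by side, the two approaches look like this (random data for illustration; on a well-conditioned system both give usable answers, but the solver does less work and accumulates less rounding error, and the gap widens as the matrix grows or becomes ill-conditioned):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
b = rng.standard_normal(200)

# Preferred: dedicated solver (LU with partial pivoting under the hood).
x_solve = np.linalg.solve(A, b)

# Avoid: explicit inverse followed by a matrix-vector product.
x_inv = np.linalg.inv(A) @ b
```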

Large Sparse Systems: Iterative Solvers

Direct methods like Gaussian elimination and LU decomposition work well for dense matrices up to a few thousand rows. But many real-world problems produce matrices with millions of unknowns where most entries are zero. Fluid dynamics simulations, circuit analysis, and structural engineering models all generate these sparse systems. Storing and factoring such matrices directly would require prohibitive amounts of memory and time, since the O(n³) cost becomes impractical.

Iterative methods start with an initial guess for x and refine it step by step until the residual (the difference between Ax and b) is small enough. The conjugate gradient method is the standard choice for systems where A is symmetric and positive definite. Each iteration only requires multiplying A by a vector, which is very fast when A is sparse because you skip all the zero entries. For problems with millions of unknowns, iterative solvers often converge in far fewer than n iterations, making them orders of magnitude faster than direct methods.
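A small sketch with SciPy's conjugate gradient solver. The 1-D discrete Laplacian used here is a stand-in for the kind of sparse symmetric positive definite matrix a PDE discretization produces:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# Sparse SPD matrix: tridiagonal 1-D Laplacian, stored in compressed form.
n = 200
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Each CG iteration needs only one sparse matrix-vector product.
x, info = cg(A, b)  # info == 0 means the iteration converged
```

Only the three nonzero diagonals are ever stored or touched, which is why each iteration stays cheap even when n reaches the millions.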

Non-Square Systems: Least Squares

When A has more rows than columns (an overdetermined system), there are more equations than unknowns, and typically no exact solution exists. Instead, you find the x that gets as close as possible, minimizing the total squared error between Ax and b.

The standard approach transforms the rectangular problem into a square one using the normal equations. Multiply both sides by A transposed to get (AᵀA)x = Aᵀb. The matrix AᵀA is square, and it is invertible whenever the columns of A are linearly independent, so you can solve this system with any of the direct methods above. This gives the least squares solution, which is the x that minimizes the distance between Ax and b.
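A concrete example: fitting a line y ≈ c₀ + c₁t to five made-up data points, solving the normal equations directly and comparing against NumPy's built-in least-squares routine:

```python
import numpy as np

# Hypothetical data: five (t, y) measurements lying near a straight line.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
A = np.column_stack([np.ones_like(t), t])  # 5 rows, 2 columns: overdetermined

# Normal equations: (AᵀA) x = Aᵀ b, a 2x2 square system.
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Library least-squares solver for comparison.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Note that forming AᵀA squares the condition number of the problem, which is why library routines like lstsq prefer QR or SVD internally.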

When A has more columns than rows (an underdetermined system), there are infinitely many solutions. The typical goal is to find the solution with the smallest magnitude. This is done by assuming x = Aᵀt, then solving (AAᵀ)t = b for t, and computing x = Aᵀt.
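The minimum-norm recipe in code, with a hypothetical 2×4 system; np.linalg.lstsq returns the same minimum-norm solution for underdetermined systems, so it serves as a cross-check:

```python
import numpy as np

# Underdetermined: 2 equations, 4 unknowns (hypothetical example).
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 2.0]])
b = np.array([3.0, 2.0])

# Minimum-norm solution: solve (A Aᵀ) t = b, then set x = Aᵀ t.
t = np.linalg.solve(A @ A.T, b)
x = A.T @ t

# lstsq picks the same minimum-norm solution for underdetermined systems.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```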

Choosing the Right Method

  • Small dense system, one right-hand side: Gaussian elimination with partial pivoting. This is what most library functions do behind the scenes.
  • Same A, many different b vectors: LU decomposition. Pay the factorization cost once and reuse it.
  • Large sparse symmetric system: Conjugate gradient or another iterative solver. These exploit the sparsity pattern to avoid storing or operating on zero entries.
  • Overdetermined system (more equations than unknowns): Least squares via the normal equations or QR decomposition.

In practice, if you’re writing code, you rarely implement these algorithms from scratch. Call numpy.linalg.solve in Python, A\b in MATLAB, or the equivalent in your language. These functions automatically choose partial pivoting and optimized factorization routines. The value of understanding the underlying methods is knowing which tool to reach for and recognizing when a system might be ill-conditioned, singular, or better suited to an iterative approach.