What Is a Corpus? From Linguistics to the Human Body

A corpus is simply a body of something. The word comes directly from Latin, where “corpus” literally means “body,” and it entered English in the late 1300s. Today it shows up across wildly different fields, from linguistics to medicine to artificial intelligence, but the core idea is always the same: a collected whole made up of many parts. Which meaning matters to you depends on context, so here’s a breakdown of the most common uses.

The Latin Root Behind Every Meaning

In Latin, corpus referred both to a physical human body and to any organized collection of material. Both senses were already in play before the word crossed into English. By the mid-1400s, English writers used “corpus” to mean a person’s body, and by 1727, the broader sense of “a collection of facts or things” had taken hold. That double meaning is why doctors, linguists, and data scientists all use the same word today. The plural form is “corpora.”

Corpus in Linguistics

In language research, a corpus is a large, structured collection of real-world text (or speech) assembled for analysis. Linguists use corpora to study how words are actually used rather than how grammar books say they should be used. A corpus can reveal shifting word frequencies, emerging slang, regional differences in phrasing, and patterns that would be invisible in a smaller sample.
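
To make the idea of counting word frequencies concrete, here is a minimal sketch in Python. The three sentences in `documents` are invented for illustration, and the `tokenize` and `word_frequencies` helpers are hypothetical names, not part of any established corpus toolkit. Real corpora are far larger and usually need more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: three invented sentences standing in for real texts.
documents = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "A cat and a dog can be friends.",
]


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(docs):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


frequencies = word_frequencies(documents)
top_words = frequencies.most_common(3)  # the three most frequent tokens
print(top_words)
```

The same counting approach scales from a handful of sentences to millions of documents; what changes is mostly storage and the care taken in deciding what counts as a word.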

Several major corpora illustrate the range. The Corpus of Contemporary American English (COCA) contains one billion words balanced across genres and draws from texts published since 1990. The Corpus of Historical American English (COHA) holds nearly half a billion words and lets researchers trace how the language has changed over roughly two centuries. The News on the Web corpus is even larger at 20 billion words and was last updated in late 2024, making it one of the most current snapshots of English available. Comparable collections exist for Spanish (about two billion words from 21 countries) and Portuguese (one billion words from four countries).

Corpora also come in specialized types. A monitor corpus collects the same kind of language at regular intervals so researchers can track vocabulary changes over time. A parallel corpus contains the same texts in two or more languages, which is invaluable for translation studies. Contrastive corpora are built specifically to compare varieties of a language, such as British versus American English.
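
As a rough illustration of how a monitor corpus supports tracking vocabulary over time, the sketch below compares a word's relative frequency across yearly snapshots. The `snapshots` texts are invented, and `relative_frequency` and `track` are hypothetical helper names. Dividing by the snapshot's length is an assumption made here so that snapshots of different sizes can be compared fairly.

```python
from collections import Counter

# Hypothetical monitor-corpus snapshots, one invented text per year.
snapshots = {
    2022: "we will stream the show tonight and stream the match tomorrow",
    2023: "stream the show then binge the series and binge the sequel",
    2024: "binge the series binge the sequel binge everything",
}


def relative_frequency(text, word):
    """Return the share of tokens in `text` that equal `word`."""
    tokens = text.lower().split()
    return Counter(tokens)[word] / len(tokens)


def track(word):
    """Map each year to the word's relative frequency, rounded to 3 places."""
    return {
        year: round(relative_frequency(text, word), 3)
        for year, text in sorted(snapshots.items())
    }


trend = track("binge")
print(trend)
```

A rising value from one snapshot to the next suggests the word is gaining ground, though real studies would need far more text per interval before drawing conclusions.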

Corpus in Artificial Intelligence

When AI researchers say “corpus,” they usually mean a training dataset, most often text, that is fed into a machine learning model. The quality, size, and composition of a corpus directly shape how well an AI system performs. A translation model trained on legal documents, for example, will handle contracts well but struggle with casual conversation.

One approach called Corpus Aware Training tags each piece of training data with information about where it came from, letting the model learn the quality, domain, and stylistic differences between sources directly from the data. During actual use, the model can then switch its behavior depending on what kind of output is needed. The technique highlights a broader principle: a corpus isn’t just fuel for an AI model. Its structure and metadata are tools in their own right.
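
The tagging idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the angle-bracket tag format, the `Example` class, and the function names are all hypothetical, and the example sentences are invented. Published implementations may encode the source information differently.

```python
from dataclasses import dataclass


@dataclass
class Example:
    source: str  # name of the corpus the text came from (hypothetical labels)
    text: str    # the training sentence itself


def tag_example(example):
    """Prepend a source tag so a model could associate the text with its origin."""
    return f"<{example.source}> {example.text}"


def build_training_inputs(examples):
    """Tag every training example with its source corpus."""
    return [tag_example(e) for e in examples]


def build_inference_input(desired_source, text):
    """At inference time, choose the tag for the style of output wanted."""
    return f"<{desired_source}> {text}"


# Invented training data drawn from two hypothetical source corpora.
training_data = [
    Example("legal", "The parties agree to the terms below."),
    Example("chat", "sounds good, see you at 5"),
]

tagged = build_training_inputs(training_data)
request = build_inference_input("legal", "We will meet on Friday.")
print(tagged)
print(request)
```

The design choice worth noticing is that the tag travels with the text itself, so no change to the model's architecture is needed; the source label simply becomes part of the input.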

Corpus Callosum: The Brain’s Bridge

In anatomy, “corpus” appears in the names of several structures. The best known is the corpus callosum, a thick band of white matter fibers connecting the left and right halves of the brain. It sits deep in the center of the brain and is its largest white matter pathway.

From front to back, the corpus callosum has four distinct sections: the rostrum, genu, body, and splenium. Each section connects different brain regions. The front portion (genu) links the two frontal lobes, which handle higher-level thinking and motor planning. The rear portion (splenium) links the occipital lobes, which process vision. In between, fibers carry auditory and touch-related signals between the two hemispheres. This gives the corpus callosum a rough map: cognition in the front, sensory processing in the back.

Its primary job is integrating information so your two brain hemispheres can coordinate. It transfers sensory, motor, and cognitive signals and plays an important role in refining movement and complex thinking as the brain matures. It also has an inhibitory function that helps keep the two hands from working against each other. When the corpus callosum is damaged, a rare condition called alien hand syndrome can develop, in which one hand seems to move on its own.

Corpus Luteum: A Temporary Hormone Factory

After ovulation, the empty follicle that released the egg transforms into a small, temporary structure called the corpus luteum. Its main job is producing progesterone, the hormone that thickens the uterine lining and creates a hospitable environment for a potential pregnancy.

The corpus luteum forms at the start of the luteal phase, which lasts about 14 days. If no pregnancy occurs, the corpus luteum begins to break down roughly 10 days after ovulation, progesterone levels drop, and a menstrual period follows. If conception does occur, the hormone hCG (human chorionic gonadotropin) signals the corpus luteum to keep producing progesterone for about 12 weeks, at which point the placenta takes over hormone production.

Corpus Cavernosum: Erectile Tissue

The corpus cavernosum is one of two tube-shaped chambers (together, the corpora cavernosa) that run side by side along the top of the penile shaft, from the pubic bone to the head of the penis. These chambers are made of spongy erectile tissue that fills with blood during arousal, creating an erection. A third structure, the corpus spongiosum, runs underneath them, surrounds the urethra, and serves a similar blood-filling role.

Other Uses of Corpus

You’ll also encounter “corpus” in legal and financial contexts. Habeas corpus, Latin for “you shall have the body,” is a legal writ requiring authorities to bring a detained person before a court. In trust law, the corpus is the principal amount of money or assets held in a trust, as distinct from any income those assets generate. In all of these cases, the word carries the same fundamental idea from its Latin origin: the main body of something, whether that something is a collection of text, a brain structure, or a pool of money.