How to Read a KEGG Pathway Chart: Shapes and Colors

KEGG pathway charts use a consistent visual language of shapes, colors, lines, and identifiers to represent biological processes. Once you learn what each element means, even the most complex metabolic map becomes readable. Here’s how to decode every part of a KEGG chart, from the shapes on the map to the tools in the viewer.

What the Shapes Represent

Every object on a KEGG pathway map is drawn as a specific shape, and each shape tells you what type of molecule or process you’re looking at.

  • Rectangles (boxes): These represent gene products, most commonly enzymes or other proteins. Each box is labeled with an Enzyme Commission (EC) number, a gene name, or a KO identifier (more on those below). When you see a row of boxes connected by lines, you’re looking at a series of enzymatic steps in a pathway.
  • Circles (small dots): These represent chemical compounds or metabolites. In metabolic pathway maps, circles are the substrates and products that enzymes act on. They’re labeled with C numbers from the KEGG COMPOUND database. For example, C00047 is L-lysine.
  • Rounded rectangles: These represent links to other KEGG pathway maps. They act as portals. Clicking one takes you to a related pathway, showing you how different biological processes connect to each other.
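
The same three shapes appear in KGML, the XML format KEGG uses to export pathway layouts, as the type attribute of each graphics element. The sketch below counts shapes in a small hand-written fragment. The fragment, the pathway number ko00000, and the pairing of K00001 with C00047 are made up for illustration and are not taken from a real map file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written KGML-style fragment for illustration only (not a real KEGG file).
SAMPLE_KGML = """
<pathway name="path:ko00000" org="ko" number="00000">
  <entry id="1" name="ko:K00001" type="ortholog">
    <graphics name="K00001" type="rectangle"/>
  </entry>
  <entry id="2" name="cpd:C00047" type="compound">
    <graphics name="C00047" type="circle"/>
  </entry>
  <entry id="3" name="path:ko00010" type="map">
    <graphics name="Linked pathway" type="roundrectangle"/>
  </entry>
</pathway>
"""

# Shape names as written in KGML, mapped to the meanings described above.
SHAPE_MEANING = {
    "rectangle": "gene product",
    "circle": "compound",
    "roundrectangle": "link to another pathway map",
}


def count_shapes(kgml_text):
    """Count how many objects of each kind a KGML document draws."""
    root = ET.fromstring(kgml_text.strip())
    return Counter(
        SHAPE_MEANING.get(graphics.get("type"), "other")
        for graphics in root.iter("graphics")
    )


if __name__ == "__main__":
    for meaning, n in count_shapes(SAMPLE_KGML).items():
        print(f"{meaning}: {n}")
```

Anything drawn with a shape outside the three listed here is grouped under "other" rather than guessed at.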

On the large global metabolism map, the visual convention shifts slightly. Circles still represent chemical compounds, but the lines (edges) between them represent sets of reactions rather than individual enzymatic steps. The global map combines roughly 120 individual metabolic pathway maps into one massive connected network, so it’s intentionally simplified.

What the Colors Mean

Color is one of the most important signals on a KEGG chart, and its meaning depends on which type of map you’re viewing.

KEGG presents each pathway in two main forms. The “reference” pathway map (prefixed with “map,” like map00140) is a generic template showing all known genes and reactions for that pathway across all organisms. The “organism-specific” map (prefixed with an organism code, like “hsa” for human, as in hsa00140) highlights which parts of the pathway actually exist in that species.
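
The prefix rule can be checked mechanically. The sketch below splits a pathway identifier into its prefix and five-digit number and labels it as reference or organism-specific. The function name is hypothetical. The set of reference prefixes includes "ko", "ec", and "rn" alongside "map", which KEGG also uses for reference maps although they are not covered above. Any other lowercase prefix is assumed to be an organism code, which is a simplification.

```python
import re

# "map" is the reference prefix discussed in the text; "ko", "ec", and "rn"
# are other reference-map prefixes used by KEGG.
REFERENCE_PREFIXES = {"map", "ko", "ec", "rn"}


def parse_pathway_id(pathway_id):
    """Split e.g. 'hsa00140' into ('hsa', '00140', 'organism-specific')."""
    match = re.fullmatch(r"([a-z]+)(\d{5})", pathway_id)
    if match is None:
        raise ValueError(f"not a pathway identifier: {pathway_id!r}")
    prefix, number = match.groups()
    # Assumption: any prefix that is not a reference prefix is an organism code.
    kind = "reference" if prefix in REFERENCE_PREFIXES else "organism-specific"
    return prefix, number, kind


if __name__ == "__main__":
    print(parse_pathway_id("map00140"))
    print(parse_pathway_id("hsa00140"))
```

Two identifiers with the same number, such as map00140 and hsa00140, refer to the same pathway drawn for different scopes.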

On an organism-specific map, green boxes indicate gene products that have been identified in that organism’s genome. A green box on hsa00140 (the human steroid hormone biosynthesis pathway) means a human gene has been linked to that enzymatic function. White boxes represent functions that appear in the reference pathway but have no gene assigned in the organism you’re viewing. This contrast is immediately useful: a pathway full of green boxes means the organism has most of the machinery for that process, while large stretches of white suggest gaps or alternative routes.

When you use KEGG’s mapping tools to overlay your own data (such as gene expression results from an experiment), additional colors appear. These custom colors typically follow whatever scheme the tool or the user applies, often red and blue gradients to indicate up- or down-regulation. If you see colors beyond green and white, someone has added experimental data on top of the default map.
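
As an illustration of one such scheme, the sketch below converts a log2 fold change into a hex color that fades from white toward red for up-regulation and toward blue for down-regulation. The function name, the cap of 2.0, and the linear gradient are all assumptions chosen for this example. KEGG does not prescribe any of them.

```python
def fold_change_color(log2_fc, cap=2.0):
    """Map a log2 fold change to a hex color: red for up, blue for down."""
    # Clamp so extreme values saturate instead of overflowing the gradient.
    clamped = max(-cap, min(cap, log2_fc))
    strength = abs(clamped) / cap        # 0.0 (no change) to 1.0 (at the cap)
    fade = round(255 * (1 - strength))   # value of the two faded channels
    if clamped > 0:
        return f"#ff{fade:02x}{fade:02x}"  # white to red
    if clamped < 0:
        return f"#{fade:02x}{fade:02x}ff"  # white to blue
    return "#ffffff"


if __name__ == "__main__":
    for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(value, fold_change_color(value))
```

One caveat with this particular choice: unchanged genes come out white, the same color the default map uses for functions with no gene in the organism, so a neutral gray midpoint may be easier to read in practice.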

How Lines and Arrows Work

The connections between shapes carry meaning too. Solid arrows with a standard arrowhead indicate a direct molecular interaction, typically a reaction where one compound is converted into another by an enzyme. The direction of the arrow shows the direction of the reaction or signal flow.

Dashed lines indicate indirect effects, meaning the connection involves one or more intermediate steps that aren’t shown on the current map. In signaling pathway maps (as opposed to metabolic maps), you’ll also encounter specific line endings: a flat bar at the end of a line indicates inhibition (one molecule blocks or suppresses another), while a standard arrowhead indicates activation or promotion.

Some lines connect to small “+p” or “+u” labels, indicating phosphorylation or ubiquitination events. These post-translational modifications are common in signaling pathways and change how a protein behaves. The line types are most varied and detailed on regulatory and signaling maps. Metabolic maps tend to use simpler arrow conventions since they mostly show substrate-to-product conversions.
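
KGML, the XML format KEGG uses to export pathway layouts, records these line styles as named relation subtypes, each with a short text symbol. The lookup table below lists only the subtypes discussed above, and the helper function is a hypothetical convenience for turning them into a readable summary.

```python
# Relation subtype names and their symbols as written in KGML.
# Only the subtypes discussed in the text are listed here.
LINE_NOTATION = {
    "activation": "-->",
    "inhibition": "--|",
    "indirect effect": "..>",
    "phosphorylation": "+p",
    "ubiquitination": "+u",
}


def describe_relation(subtypes):
    """Return a readable summary of the subtypes attached to one line."""
    parts = []
    for name in subtypes:
        symbol = LINE_NOTATION.get(name)
        if symbol is None:
            parts.append(f"{name} (not in this table)")
        else:
            parts.append(f"{name} ({symbol})")
    return ", ".join(parts)


if __name__ == "__main__":
    print(describe_relation(["activation", "phosphorylation"]))
```

A single line can carry more than one subtype, for example an activating arrow that also bears a “+p” label, which is why the helper accepts a list.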

Understanding K Numbers and Identifiers

The labels inside boxes and beside circles are KEGG’s identification system. The most important identifier to understand is the K number, which comes from the KEGG Orthology (KO) database.

A K number represents a functional ortholog: a molecular function that’s been defined in the context of a specific pathway or network. Most K numbers originate from experimentally characterized genes in one organism, then get extended to similar genes in other organisms based on sequence similarity. When KEGG annotates a genome, it assigns K numbers to individual genes rather than writing text descriptions. Because each box carries a standardized identifier instead of free text, it can be treated as a node in a larger computational network.

The practical value of K numbers is cross-species comparison. The same K number appears on the pathway map regardless of which organism you’re viewing. If K00001 is assigned to a particular reaction step, that assignment holds whether you’re looking at the human, mouse, or bacterial version of the pathway. The organism-specific coloring then tells you whether that organism actually has a gene assigned to that function.
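
Because K numbers are shared across organisms, this comparison reduces to set arithmetic. The sketch below assumes you already have, for each organism, the set of K numbers that have a gene assigned. The function name and the sample assignments are placeholders for illustration, not real KEGG annotations.

```python
def compare_organisms(reference_kos, organism_kos):
    """Compare organisms against the K numbers on a reference map.

    reference_kos: set of K numbers drawn on the reference map.
    organism_kos:  dict mapping organism code -> set of K numbers that
                   have a gene assigned in that organism.
    Returns (conserved, missing): K numbers present in every organism,
    and a dict of the K numbers each organism lacks.
    """
    present = [reference_kos & kos for kos in organism_kos.values()]
    conserved = set.intersection(*present) if present else set()
    missing = {org: reference_kos - kos for org, kos in organism_kos.items()}
    return conserved, missing


if __name__ == "__main__":
    # Placeholder data: these assignments are invented for the example.
    reference = {"K00001", "K00002", "K00003"}
    assigned = {
        "hsa": {"K00001", "K00002"},
        "eco": {"K00001", "K00003"},
    }
    conserved, missing = compare_organisms(reference, assigned)
    print("conserved:", sorted(conserved))
    for org, kos in missing.items():
        print(org, "lacks", sorted(kos))
```

In map terms, the conserved set corresponds to boxes that are green for every organism compared, and each missing set corresponds to that organism’s white boxes.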

Other identifiers you’ll encounter include C numbers for compounds (like C00047 for L-lysine), EC numbers for enzyme classifications, and R numbers for individual reactions. You can search for any of these in the viewer’s search box.
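
Each identifier type has a recognizable shape, so a label can usually be classified from its text alone. The sketch below is a minimal classifier with a hypothetical function name. The patterns cover the common forms only: a letter followed by five digits for K, C, and R numbers, and four dot-separated fields for EC numbers, with “-” allowed for unspecified trailing fields. Less common variants, such as provisional EC serial numbers, are not handled.

```python
import re

# Ordered list of (label, pattern) pairs; the first full match wins.
ID_PATTERNS = [
    ("KO (K number)", re.compile(r"K\d{5}")),
    ("compound (C number)", re.compile(r"C\d{5}")),
    ("reaction (R number)", re.compile(r"R\d{5}")),
    # Four dot-separated fields; fields after the first may be "-".
    ("enzyme (EC number)", re.compile(r"(EC\s*)?\d+(\.(\d+|-)){3}")),
]


def classify_identifier(text):
    """Return a label for the identifier type, or 'unknown'."""
    candidate = text.strip()
    for label, pattern in ID_PATTERNS:
        if pattern.fullmatch(candidate):
            return label
    return "unknown"


if __name__ == "__main__":
    for example in ("K00001", "C00047", "R00001", "1.1.1.1", "hsa"):
        print(example, "->", classify_identifier(example))
```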

Standard Maps vs. Global Maps

KEGG offers several types of maps, and they’re read differently. Standard pathway maps show a single biological process in detail, with every enzymatic step, compound, and regulatory interaction laid out. These are the maps most researchers work with day to day.

Global metabolism maps take a zoomed-out approach. They combine about 120 individual metabolic pathway maps into one large connected network. On a global map, each circle is a compound and each connecting line is a “net-element,” a segment that may represent a single reaction or several reaction steps compressed together. About one-third of these net-elements contain hidden intermediate compounds that become visible only when you click through to the detailed pathway map underneath. Clicking a line on the global map opens the corresponding detailed pathway and highlights the specific genes and compounds involved.

This layered structure means you can start at the global level to see how, say, amino acid metabolism connects to the citric acid cycle, then drill into a specific pathway for the full enzymatic detail.

Using the Pathway Map Viewer

The KEGG Pathway Map Viewer has a few tools that make navigation easier once you know where they are.

The “Organism menu” lets you change which species the map displays. You enter an organism code (like “hsa” for human, “mmu” for mouse, or “eco” for E. coli), and the map recolors to show which genes exist in that genome. You can also enter multiple organism codes to compare species side by side, which is useful for identifying pathway components that are conserved or missing across organisms.

The search box lets you find specific objects on the map by their KEGG identifiers or aliases. You can use Ctrl+F (or Cmd+F on Mac) as a shortcut to jump to it. If you enter multiple IDs, the search uses OR logic, highlighting all matching objects. The plus sign next to the search box lets you input your own data using the KEGG Mapper Search tool format, which is how you overlay gene lists or expression data onto the map.

The “Change pathway type” button switches between different map representations without opening a new window. It works as a control panel for the main display, letting you toggle between reference maps, organism-specific maps, and other pathway types from one place.

Reading a Map Step by Step

When you open a KEGG pathway chart for the first time, start by checking the title and map number in the top-left corner. The title names the biological process, and the prefix of the map number tells you whether you’re on a reference map or an organism-specific one. Next, scan the colored boxes. Green boxes mark functions with a gene in the selected organism; white boxes mark functions with no gene assigned. Follow the arrows from left to right or top to bottom to trace the flow of the pathway, noting where compounds (circles) are consumed or produced.

Pay attention to rounded rectangles at the edges of the map. These are connections to other pathways, showing you where molecules feed into or arrive from related processes. If a section of the map looks dense and confusing, click individual boxes to open their detail pages, which list the specific genes, reactions, and cross-references associated with that node.

For signaling and disease pathway maps, the layout shifts from a linear metabolic flow to a more network-like structure. These maps emphasize regulatory relationships (activation, inhibition, gene expression changes) rather than chemical transformations. The line types and arrowheads become more varied and carry more of the essential information, so read the connections as carefully as the nodes themselves.