What Is AI in Chemistry: Key Uses and Challenges

AI in chemistry refers to the use of machine learning algorithms and neural networks to solve chemical problems that would take human chemists or conventional computational methods far longer to work through. These tools predict molecular properties, design new molecules from scratch, plan how to synthesize compounds, and even run experiments autonomously. The global market for AI in chemicals was valued at $2.29 billion in 2025 and is projected to reach $28 billion by 2034, reflecting how rapidly the field is expanding across drug discovery, materials science, and industrial chemistry.

Predicting How Molecules Behave

One of the most straightforward uses of AI in chemistry is predicting molecular properties before anyone steps into a lab. Chemists need to know things like whether a molecule will dissolve in water, how toxic it might be, or how it interacts with light. Traditionally, answering these questions required either running physical experiments or performing expensive quantum mechanical calculations. AI models can estimate these properties in seconds by learning patterns from databases of known molecules.

The models range from classical machine learning approaches like random forests and support vector machines to more sophisticated graph neural networks that treat a molecule’s atoms and bonds as a connected network. Graph neural networks are particularly well suited to chemistry because they can “read” a molecule’s structure directly, learning how the arrangement of atoms influences its behavior. Researchers have tested over a dozen different architectures for this purpose, and the best ones achieve accuracy close to traditional quantum chemistry methods at a fraction of the computational cost.
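The core operation that lets a graph neural network "read" molecular structure is message passing: each atom repeatedly aggregates information from its bonded neighbors. A minimal sketch of one such step, using a toy molecule and simplified stand-in features (a real GNN would use learned weights and richer atom descriptors):

```python
# One message-passing step, the core operation in a graph neural network
# for molecules. Atoms are nodes, bonds are edges. Features here are
# simplified stand-ins: [atomic number, number of heavy-atom neighbors].

# Ethanol (CH3-CH2-OH), heavy atoms only: C(0)-C(1)-O(2)
bonds = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [6, 1], 1: [6, 2], 2: [8, 1]}

def message_pass(features, bonds):
    """One round: each atom aggregates its neighbors' features."""
    updated = {}
    for atom, feat in features.items():
        msg = [0] * len(feat)
        for nbr in bonds[atom]:
            for i, x in enumerate(features[nbr]):
                msg[i] += x
        # Combine the atom's own features with the aggregated message
        updated[atom] = [a + b for a, b in zip(feat, msg)]
    return updated

def readout(features):
    """Sum-pool atom features into one molecule-level vector."""
    dims = len(next(iter(features.values())))
    return [sum(f[i] for f in features.values()) for i in range(dims)]

h1 = message_pass(features, bonds)
print(readout(h1))  # molecule-level vector after one step
```

After a few such rounds, the molecule-level vector feeds a small prediction network that outputs the property of interest, which is how structural arrangement ends up influencing the predicted behavior.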

Designing New Molecules From Scratch

Beyond predicting properties of existing molecules, AI can invent entirely new ones. This is called de novo molecular design, and it relies on generative models, the same broad category of AI behind image and text generators. Two architectures dominate this space. Variational autoencoders learn to compress molecules into a mathematical space and then generate new ones by sampling from that space, optimizing for properties like drug-likeness along the way. Generative adversarial networks use two competing neural networks: one generates candidate molecules while the other judges whether they look realistic, pushing the generator to produce increasingly plausible structures.

These tools have been used to design molecules with specific physical properties, to create candidates for photovoltaic materials, and to propose drug-like compounds that target particular biological pathways. One research group combined a generative adversarial network with gene expression data, allowing the AI to design molecules predicted to trigger a specific biological response in cells. The practical challenge is that only a small fraction of AI-generated molecules end up being novel, synthesizable, and effective. When sampling from a variational autoencoder’s mathematical space, validity rates for generated molecules can be as low as 0.7% to 7.2%, depending on the model architecture.
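The validity problem above can be sketched in a few lines. This toy stand-in replaces a trained variational autoencoder's decoder with a hand-written function (real decoders emit SMILES strings from learned latent spaces); the point is the workflow: sample latent points, decode, and measure what fraction come out valid.

```python
import random

random.seed(0)

# Hypothetical stand-in for a trained VAE decoder: maps a 2-D latent
# point to a candidate "molecule". Validity is simulated by a small
# region of latent space where decoding happens to succeed; real
# decoders produce strings that a chemistry toolkit must then parse.
def decode(z):
    x, y = z
    if x * x + y * y < 0.1:                  # toy "valid" pocket
        return "C" * (1 + int(10 * abs(x)))  # placeholder structure
    return None                              # invalid output

samples = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10_000)]
valid = [m for m in (decode(z) for z in samples) if m is not None]
print(f"validity rate: {100 * len(valid) / len(samples):.1f}%")
```

With these toy numbers the validity rate lands in the low single digits, echoing the 0.7% to 7.2% range reported for real models: most of the latent space simply does not decode to anything usable.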

Planning Chemical Synthesis

Knowing what molecule you want is only half the problem. Figuring out how to actually build it from available starting materials is a puzzle that can stump experienced chemists, especially for complex structures. AI-powered retrosynthesis tools work backward from the target molecule, breaking it down step by step into simpler precursors until reaching compounds you can buy off the shelf.

Several approaches exist. Some systems use thousands of human-coded reaction rules. Chematica, one of the most established platforms, accumulated over 75,000 such rules over more than a decade. Newer tools take a leaner approach: SynRoute, for example, uses just 263 general reaction templates paired with machine learning classifiers trained on large patent-derived reaction databases. For each possible reaction step, a neural network evaluates whether the proposed transformation would actually work in a real lab. A separate algorithm then stitches together the highest-scoring steps into complete synthesis plans, ranked by practicality for the chemist to review.
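The backward search itself is a recursion: apply a rule that breaks the target into precursors, then repeat on each precursor until everything is purchasable. A minimal sketch with a tiny hand-written rule set (real systems score many competing disconnections per step and rank whole routes; the rules and stock list here are illustrative placeholders):

```python
# Sketch of template-based retrosynthesis: work backward from a target,
# applying rules that map a product to simpler precursors, until every
# leaf is a purchasable building block.

PURCHASABLE = {"benzene", "acetyl chloride", "ammonia", "bromine"}

# Each rule: product -> one set of precursors (illustrative only).
RULES = {
    "aniline": ["nitrobenzene"],
    "nitrobenzene": ["benzene"],
    "acetanilide": ["aniline", "acetyl chloride"],
}

def plan(target, depth=0, max_depth=10):
    """Return a nested route tree, or None if no route is found."""
    if target in PURCHASABLE:
        return target                  # base case: buy it off the shelf
    if depth >= max_depth or target not in RULES:
        return None                    # dead end, no known disconnection
    subroutes = [plan(p, depth + 1, max_depth) for p in RULES[target]]
    if any(r is None for r in subroutes):
        return None
    return {target: subroutes}

print(plan("acetanilide"))
```

In a production tool, the step where this sketch blindly applies a rule is exactly where the neural network classifier sits, scoring whether the proposed transformation would work before the search commits to it.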

A third strategy skips predefined templates entirely. Transformer models, the same architecture behind modern language AI, are trained on millions of published reactions to predict reactants from products (or vice versa) purely from learned patterns. This template-free approach can propose reactions that no human expert explicitly programmed, potentially uncovering unconventional synthetic routes.

Speeding Up Quantum Chemistry

Quantum chemistry calculations model the behavior of electrons in molecules, providing highly accurate predictions of energy, shape, and reactivity. The problem is computational cost: these calculations scale steeply with the number of atoms, making them impractical for screening thousands of candidates. AI offers a shortcut.

Machine learning models can be trained on a set of quantum chemistry results and then predict the same properties for new molecules without repeating the full calculation. One tool called DeePEST uses deep learning to predict a molecule’s energy and the forces on its atoms, enabling rapid geometry optimization (finding the molecule’s most stable 3D shape). From that optimized shape, the system predicts quantum properties like orbital energies and carbon chemical shifts. The result approaches the accuracy of rigorous quantum methods while running at speeds comparable to much simpler molecular mechanics calculations. This makes it feasible to screen large libraries of molecules for properties that previously required days of supercomputer time per compound.
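The geometry-optimization loop driven by a surrogate model can be sketched simply. Here the "model" is a hand-written harmonic bond for a diatomic molecule in one dimension, standing in for a trained neural network like DeePEST; the loop structure (predict forces, move atoms, repeat until forces vanish) is the part that carries over.

```python
# Surrogate-driven geometry optimization: a model predicts energy and
# forces, and gradient descent moves atoms toward the most stable shape.
# The surrogate here is a toy harmonic bond, not a trained network.

R_EQ = 1.10   # hypothetical equilibrium bond length (angstroms)
K = 5.0       # hypothetical force constant

def predict(r):
    """Stand-in surrogate: energy and force for bond length r."""
    energy = 0.5 * K * (r - R_EQ) ** 2
    force = -K * (r - R_EQ)        # force is the negative energy gradient
    return energy, force

def optimize(r, step=0.05, tol=1e-6, max_iter=1000):
    """Follow the predicted force until it (nearly) vanishes."""
    for _ in range(max_iter):
        energy, force = predict(r)
        if abs(force) < tol:
            break
        r += step * force
    return r

r_opt = optimize(1.5)
print(f"optimized bond length: {r_opt:.3f}")  # converges near R_EQ
```

Because each `predict` call is cheap compared with a full quantum calculation, the same loop that takes milliseconds here remains fast even for large molecules, which is what makes high-throughput screening feasible.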

Self-Driving Laboratories

The most ambitious application of AI in chemistry is the self-driving lab, where algorithms don’t just suggest experiments but actually carry them out. These closed-loop systems follow a four-step cycle: design, make, test, and analyze. An AI plans an experiment based on what it wants to learn, robotic equipment synthesizes the compounds and runs measurements, analytical instruments collect the results, and the AI interprets the data to decide what experiment to run next.
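The four-step cycle can be sketched as a simple closed loop. In this toy version the "instrument" is a simulated reaction whose yield peaks at an unknown temperature, and the planner alternates between exploiting the best condition seen so far and exploring new ones; real systems drive robots and analytical hardware instead, and use more sophisticated planners such as Bayesian optimization.

```python
import random

random.seed(1)

def run_experiment(temp):
    """Simulated instrument: reaction yield peaking at 80 degrees."""
    return max(0.0, 100 - (temp - 80) ** 2 / 10)

def design(history):
    """Plan the next condition from what has been learned so far."""
    if random.random() < 0.2:                        # explore
        return random.uniform(20, 150)
    best_temp, _ = max(history, key=lambda h: h[1])  # exploit nearby
    return best_temp + random.uniform(-5, 5)

history = [(60.0, run_experiment(60.0))]  # one seed experiment
for _ in range(50):                       # 50 closed-loop cycles
    temp = design(history)                # design
    result = run_experiment(temp)         # make + test
    history.append((temp, result))        # analyze / record

best_temp, best_yield = max(history, key=lambda h: h[1])
print(f"best condition found: {best_temp:.1f} C, yield {best_yield:.1f}")
```

The loop never needs a human in it: every decision about what to try next comes from the accumulated history, which is exactly the property that lets self-driving labs run around the clock.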

Orchestration software like ChemOS coordinates the entire process, scheduling experiments and selecting future ones based on machine learning analysis of prior results. It is designed to be hardware-agnostic, meaning it can control different types of robotic synthesizers and analytical instruments. After a batch of reactions finishes, samples are automatically diluted and fed into instruments like liquid chromatography-mass spectrometry systems for characterization. The synthesized compounds then move through flow cells for further testing, all without a human needing to intervene between cycles. These labs can run around the clock, exploring chemical space far faster than a team of researchers working manually.

What Holds AI in Chemistry Back

For all its promise, AI in chemistry faces real obstacles, and most of them come down to data. Machine learning models, especially deep learning ones, are data-hungry. Chemistry, unlike fields such as natural language processing, doesn’t have billions of freely available, consistently formatted data points to train on. Experimental measurements are scattered across thousands of journals, recorded in different formats, under different conditions, and often locked behind proprietary walls. Pharmaceutical companies in particular hold vast datasets they can’t share due to intellectual property concerns, creating data silos that limit what any single AI model can learn from.

Even when data exists, quality is uneven. Labels may be missing, measurements may not be standardized, and negative results (experiments that didn’t work) are rarely published. This means AI models often train on a skewed picture of chemistry. Deep learning models trained on insufficient data can fail dramatically when asked to predict properties of molecules that look different from their training set. And generative models sometimes propose molecules that closely resemble existing known compounds rather than producing genuinely novel structures, raising questions about whether they’re truly creating or just remixing.
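The out-of-distribution failure mode is easy to demonstrate with a deliberately simple stand-in: a linear model fit to a nonlinear "property" (here y = x squared) over a narrow training range looks accurate in-range but extrapolates badly, the same way a deep model trained on one region of chemical space can fail outside it.

```python
# Out-of-distribution failure in miniature: least-squares linear fit
# to a nonlinear property, computed by hand from first principles.

train_x = [0.0, 0.25, 0.5, 0.75, 1.0]   # narrow training range
train_y = [x ** 2 for x in train_x]     # true underlying property

n = len(train_x)
mx = sum(train_x) / n
my = sum(train_y) / n
slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) \
        / sum((x - mx) ** 2 for x in train_x)
intercept = my - slope * mx

def model(x):
    return slope * x + intercept

print(abs(model(0.5) - 0.5 ** 2))   # error inside training range: small
print(abs(model(5.0) - 5.0 ** 2))   # error far outside it: large
```

In-range the fit looks trustworthy; a few units past the training data the prediction is off by an order of magnitude, and nothing in the model's training error warned of it.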

The lack of agreed-upon benchmarks makes it difficult to compare tools fairly. Two research groups might report impressive accuracy numbers, but if they tested on different datasets with different criteria, the comparison is meaningless. Standardizing how the field evaluates AI chemistry tools remains an ongoing challenge.