Medical robots learn to perform tasks through a combination of methods: watching and imitating human surgeons, practicing in virtual simulations, and processing massive datasets of surgical video and sensor recordings. No single approach works alone. Most systems today rely on a layered training process where each method builds on the others, gradually moving a robot from basic motions in a virtual world to precise actions on real tissue.
It’s worth noting that every robotic surgery performed on a human today is still controlled by a surgeon in real time. Fully autonomous surgical robots exist only in research labs and animal studies. But the methods used to teach robots are advancing quickly, and understanding them explains where surgical robotics is headed.
Learning by Watching Surgeons
The most intuitive way to teach a medical robot is to show it what a skilled surgeon does and have it copy those movements. This approach, called imitation learning, starts with recording a surgeon’s hand motions during a procedure. The da Vinci surgical system, for example, captures 76 motion variables from its manipulators at 30 Hz (30 samples per second): tooltip positions, orientations, linear and rotational velocities, and gripper angles. All of this is synchronized with stereo video of the surgical field.
Researchers at Johns Hopkins created a widely used public dataset called JIGSAWS specifically for this purpose. It contains synchronized video and kinematic data from multiple surgeons performing the same robotic surgical tasks at varying skill levels. The robot can then be trained to reproduce the motions of the most skilled operators rather than averaging across all performers.
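In its simplest form, this kind of training amounts to fitting a function that maps the robot’s recorded state to the expert’s next commanded motion, a technique known as behavioral cloning. The sketch below illustrates the idea on synthetic stand-in data; the dimensions, the linear least-squares policy, and the fake “expert” demonstrations are all illustrative simplifications (real systems use the full 76-variable stream and neural-network policies).

```python
import numpy as np

# Hypothetical toy setup: each demonstration sample pairs the robot's
# current state (e.g. tooltip position, velocity, gripper angle) with
# the expert's next commanded motion. Real recordings carry 76
# variables; we use 7 here for brevity.
rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 7, 3

# Fake "expert" demonstrations standing in for recorded surgeon motion.
states = rng.normal(size=(500, STATE_DIM))
true_W = rng.normal(size=(STATE_DIM, ACTION_DIM))
actions = states @ true_W + 0.01 * rng.normal(size=(500, ACTION_DIM))

# Behavioral cloning in its simplest form: fit a policy mapping
# state -> action by least squares.
W, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(state):
    """Predict the next motion command from the current state."""
    return state @ W

# The learned policy should closely reproduce the expert's commands.
err = np.abs(policy(states) - actions).mean()
print(f"mean imitation error: {err:.4f}")
```

The same data also makes skill-aware training possible: filtering the demonstrations to only the highest-rated operators before fitting the policy is a one-line change.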
A newer technique eliminates the need for direct access to a robot’s internal motion data entirely. A system called SurgiPose estimates a surgical tool’s trajectory and joint angles by analyzing standard monocular video, comparing rendered images of where the tool should be against where it actually appears in the footage. This means robots could potentially learn from any recorded surgery, not just those performed on instrumented research platforms. When researchers compared policies trained on these video-estimated movements against policies trained on precise ground-truth measurements, the video-based approach proved viable for teaching real tasks.
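The render-and-compare idea behind this kind of video-based estimation can be shown with a deliberately simple toy: guess a joint angle, “render” where the tool tip would appear, and adjust the guess until it matches the detected tip. The one-joint camera model and grid search below are illustrative stand-ins, not the actual system’s method.

```python
import numpy as np

# Toy render-and-compare: a 1-joint tool projected into 2-D image
# coordinates. Real systems render a full 3-D tool model and compare
# against the video frame; this sketch keeps only the core loop.
def render_tip(angle, arm_len=100.0):
    """Project the tool tip into image coordinates (toy camera model)."""
    return np.array([arm_len * np.cos(angle), arm_len * np.sin(angle)])

observed_tip = render_tip(0.7)  # pretend this came from video detection

# Grid search over candidate angles for the best visual match.
candidates = np.linspace(0.0, 1.5, 301)
errors = [np.linalg.norm(render_tip(a) - observed_tip) for a in candidates]
best = candidates[int(np.argmin(errors))]
print(f"estimated joint angle: {best:.3f} rad")
```

In practice the search is gradient-based and runs over many joints and frames at once, but the principle is the same: the pose that best explains the image is taken as the tool’s true configuration.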
Practicing in Virtual Simulations
Before a robot attempts anything on real tissue, it typically trains for thousands of hours in a simulated environment. These simulations model the physics of tissue, the behavior of surgical instruments, and the visual appearance of a surgical field. The robot tries actions, receives feedback on whether they succeeded, and gradually refines its approach through reinforcement learning.
The challenge is that simulations never perfectly match reality. Lighting looks different. Tissue doesn’t deform exactly the same way. Camera angles shift. Researchers bridge this gap using a technique called domain randomization: during training, the simulation deliberately varies lighting conditions, camera positions, and physics parameters so the robot can’t rely on any single visual cue. A study using the da Vinci Research Kit built a full pipeline connecting a 3D simulation to the physical robot, training it on a pushing task. The results showed that randomizing lighting, camera angles, and physics variables all played significant roles in helping the robot perform successfully in the real world, even through an endoscope view, which is particularly difficult because of its narrow field of view and unusual angles.
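Domain randomization itself is conceptually simple: before each training episode, draw new values for the parameters the real world might vary. The sketch below shows the pattern; the parameter names, ranges, and the `SurgicalSim`-style config dict are hypothetical placeholders.

```python
import random

# A minimal sketch of domain randomization. Each training episode gets
# a freshly randomized copy of the simulator configuration, so the
# learned policy cannot overfit to any single simulated appearance.
# Parameter names and ranges here are illustrative.
def randomize(sim_config):
    """Return a copy of the simulator config with randomized parameters."""
    cfg = dict(sim_config)
    cfg["light_intensity"] = random.uniform(0.4, 1.6)     # vary lighting
    cfg["camera_tilt_deg"] = random.uniform(-15.0, 15.0)  # vary viewpoint
    cfg["tissue_stiffness"] = random.uniform(0.5, 2.0)    # vary physics
    cfg["texture_noise"] = random.uniform(0.0, 0.3)       # vary appearance
    return cfg

base = {"light_intensity": 1.0, "camera_tilt_deg": 0.0,
        "tissue_stiffness": 1.0, "texture_noise": 0.0}

# One randomized configuration per training rollout.
for _ in range(3):
    print(randomize(base))
```

A policy trained across thousands of such variations tends to treat the real operating room as just one more sample from the distribution it has already seen.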
Recognizing Surgical Steps in Real Time
Performing a surgery isn’t just about executing individual motions. A robot also needs to understand where it is in the overall procedure. Automated surgical phase recognition uses AI to segment a surgical workflow into its key events, identifying whether the surgeon is dissecting, clipping, cutting, or closing. This acts as a kind of situational awareness.
These recognition systems typically work by processing individual video frames through a neural network that extracts visual features, then feeding sequences of those features into a temporal model that understands how one step follows another. One approach samples frames at one per second and analyzes buffers of consecutive frames to classify what phase of surgery is currently underway. Researchers have trained these systems on datasets of cholecystectomy (gallbladder removal) videos, a procedure performed roughly 750,000 times per year in the United States, giving the AI a large volume of examples to learn from.
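The frame-buffering and classification flow can be sketched as follows. The nearest-centroid classifier here is a stand-in for the real temporal neural network, and the 8-dimensional feature vectors stand in for CNN-extracted image features; everything else mirrors the sample-buffer-classify loop described above.

```python
from collections import deque
import numpy as np

# Sketch of the buffering logic with a stand-in classifier. A real
# system uses a CNN per frame and a temporal model (e.g. an LSTM)
# over the buffer; here a nearest-centroid rule over hypothetical
# per-phase feature centroids illustrates the data flow.
PHASES = ["dissection", "clipping", "cutting", "closing"]
rng = np.random.default_rng(1)
centroids = {p: rng.normal(size=8) for p in PHASES}  # "learned" offline

def extract_features(frame):
    """Stand-in for a CNN feature extractor (8-dim vector per frame)."""
    return frame  # frames are already feature vectors in this toy

def classify_buffer(buffer):
    """Average features over the buffer; pick the closest phase centroid."""
    mean = np.mean(buffer, axis=0)
    return min(PHASES, key=lambda p: np.linalg.norm(mean - centroids[p]))

# Simulate sampling one frame per second into a 10-frame buffer.
buffer = deque(maxlen=10)
for _ in range(10):
    frame = centroids["cutting"] + 0.1 * rng.normal(size=8)  # fake stream
    buffer.append(extract_features(frame))

print(classify_buffer(buffer))  # prints "cutting"
```

Averaging over a buffer rather than classifying single frames is what gives the system temporal stability: a single ambiguous frame cannot flip the predicted phase.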
Phase recognition serves multiple purposes. For a semi-autonomous robot, it provides context for what action to take next. For surgical education, it enables automated video review and skill assessment. And for safety, it allows a system to flag when something deviates from the expected sequence.
Learning to Feel Through Force Feedback
Surgery requires more than vision. Surgeons rely heavily on the feel of tissue under their instruments to judge whether something is healthy, cancerous, or dangerously close to a blood vessel. Teaching robots to interpret this tactile information is one of the harder problems in the field.
Tactile sensors on robotic instruments can detect local mechanical properties of tissue: how compliant it is, how viscous, what its surface texture feels like. These properties serve as indicators of tissue health. The sensor data can either be fed back to a human operator as force feedback (so the surgeon “feels” what the robot touches) or processed by the robot’s own algorithms to guide autonomous decisions.
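One concrete way compliance can be estimated from palpation data, assuming the instrument logs indentation depth and reaction force, is to fit force = k × depth and read off an effective stiffness k. The traces and numbers below are toy values for illustration, not measured tissue data.

```python
import numpy as np

# Sketch of stiffness estimation from a palpation trace. A hard lump
# shows up as a locally higher effective stiffness k than the
# surrounding tissue.
def estimate_stiffness(depth_mm, force_n):
    """Least-squares slope of force vs. indentation depth (N/mm)."""
    depth = np.asarray(depth_mm)
    force = np.asarray(force_n)
    return float(np.dot(depth, force) / np.dot(depth, depth))

# Toy palpation traces: soft tissue vs. a stiffer inclusion.
depths = [0.5, 1.0, 1.5, 2.0]          # indentation depth, mm
soft_forces = [0.15, 0.31, 0.44, 0.62]  # roughly 0.3 N/mm
lump_forces = [0.61, 1.18, 1.82, 2.39]  # roughly 1.2 N/mm

k_soft = estimate_stiffness(depths, soft_forces)
k_lump = estimate_stiffness(depths, lump_forces)
print(f"soft: {k_soft:.2f} N/mm, lump: {k_lump:.2f} N/mm")
```

A large ratio between the two estimates is easy to detect, which matches the experimental finding below: the hard cases are the ones where the ratio is close to 1.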
Current systems have real limitations here. In experiments using a modified da Vinci system for palpation tasks (pressing on tissue to find hard lumps), direct force feedback helped surgeons identify abnormalities, but only when the mechanical difference between the lump and surrounding tissue was large. Subtle differences were missed. This is an area where the gap between human touch sensitivity and robotic sensing remains wide.
How Autonomous Are Today’s Robots?
As of now, every robotic surgery on a human is teleoperated, meaning a surgeon controls every movement. There is not a single autonomous surgical system in clinical use. Autonomous robots have demonstrated success in simulations and simplified lab settings, handling tasks like peg transfer, needle pickup, and manipulating deformable objects. For more complex tasks like tying suture knots, researchers have turned to learning-based systems rather than pre-programmed instructions.
The most advanced demonstrations involve shared control, where a robot handles routine portions of a task while a human takes over for difficult moments. In experiments on porcine tissue, a shared control strategy improved cutting accuracy by 6.4% compared to pure manual control while reducing the operator’s active work time to 44% of what fully manual operation required. Pure autonomous control, by contrast, actually produced the worst tracking error in these tests, averaging 3.2 mm deviation compared to about 0.9 mm for manual or shared approaches. The takeaway: robots are best when they collaborate with surgeons rather than replace them entirely.
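At the command level, shared control is often a weighted blend of the human’s input and the autonomous planner’s output. The blending function below is a minimal sketch of that idea; the velocity values and the single scalar authority weight `alpha` are illustrative (real systems vary authority along the task and per axis).

```python
import numpy as np

# Sketch of a shared-control blend. alpha=1 is pure manual control,
# alpha=0 is pure autonomy; shared control sits in between, and alpha
# can be raised to hand full authority to the surgeon for difficult
# moments.
def blend_command(human_cmd, robot_cmd, alpha):
    """Weighted blend of human and autonomous velocity commands."""
    return alpha * np.asarray(human_cmd) + (1 - alpha) * np.asarray(robot_cmd)

human = [1.0, 0.0, 0.2]   # surgeon's commanded tool velocity (toy numbers)
robot = [0.8, 0.1, 0.0]   # planner's commanded velocity along the path

print(blend_command(human, robot, alpha=1.0))  # fully manual
print(blend_command(human, robot, alpha=0.5))  # shared
print(blend_command(human, robot, alpha=0.0))  # fully autonomous
```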
A recent milestone involved a learning-based system trained on videos of gallbladder removals that autonomously executed the procedure on pig cadavers. This was the first demonstration of a robot completing a full surgical procedure without human intervention, though it remains far from clinical application.
How Robots Share Knowledge Across Hospitals
One practical problem is that surgical data is scattered across hospitals, and patient privacy laws prevent simply pooling it all in one place. Federated learning offers a solution: robots at different institutions train on their local data, then share only the updated model parameters (not the patient data itself) with a central system. The central system combines these updates and sends an improved model back to each hospital. Private patient details never leave the institution, but every robot benefits from the collective experience.
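The aggregation step at the heart of this scheme (in the style of the well-known FedAvg algorithm) can be sketched in a few lines. The hospital names, dataset sizes, and single-step “local training” below are illustrative stand-ins for many epochs of real training on private data.

```python
import numpy as np

# Minimal federated-averaging sketch. Each "hospital" updates a local
# copy of the model on its own data and ships back only the weights;
# the server averages them, weighted by local dataset size.
def local_update(weights, local_gradient, lr=0.1):
    """One local training step; in reality, many epochs on private data."""
    return weights - lr * local_gradient

def federated_average(updates, sizes):
    """Weight each hospital's model by its number of local samples."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

global_model = np.zeros(4)
hospitals = {"A": (np.array([0.2, -0.1, 0.0, 0.3]), 120),
             "B": (np.array([0.1, 0.2, -0.2, 0.1]), 80)}

updates, sizes = [], []
for grad, n in hospitals.values():
    updates.append(local_update(global_model, grad))  # stays on-site
    sizes.append(n)

global_model = federated_average(updates, sizes)  # only weights shared
print(global_model)
```

Note what crosses the network boundary: model weights only. The gradients are computed on-site, and no raw surgical video or patient record ever leaves the hospital.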
How Regulators Keep Up
The FDA reviews medical devices, including AI-powered surgical systems, through pathways based on the level of risk a device poses to patients. But the agency has acknowledged that its traditional regulatory framework wasn’t designed for AI systems that adapt and improve over time. A robot trained through machine learning might change its behavior as it processes new data, which raises the question of whether each update requires a new regulatory review.
In January 2025, the FDA published draft guidance specifically addressing AI-enabled device software, proposing lifecycle management recommendations that cover how manufacturers should handle updates to learning-based systems after they reach the market. The goal is to support innovation without sacrificing the safety checks that keep patients protected.

