By Elijah Chan
Ten years ago, a control engineer could write down a system’s equations, derive a
mathematical proof of stability, and sleep in peace knowing that the math guaranteed safety and predictability. Think of your car's cruise control: it measures your speed, compares it to your desired target, and automatically adjusts the throttle to hold that speed. Systems like that are the work of control engineers, who design and understand the math inside and out so they can guarantee the car will not behave unpredictably. Today, things work differently. Machine learning algorithms are increasingly replacing hand-derived control laws, and we have hit a bottleneck in the validation capability needed to keep these autonomous systems safe.
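To make the cruise-control picture concrete, the sketch below implements that feedback loop in a few lines. Everything in it (the gains, the time step, the toy throttle-and-drag model) is an illustrative assumption rather than the equations of any real vehicle, but it shows how a traditional controller is explicit, inspectable math.

```python
# Minimal sketch of a cruise controller as a classical feedback loop.
# The vehicle model, gains, and time step are illustrative assumptions,
# not parameters from any production system.

def simulate_cruise_control(target_speed=30.0, steps=200, dt=0.1):
    speed = 20.0           # initial speed (m/s)
    kp, ki = 0.8, 0.05     # proportional and integral gains (assumed)
    integral = 0.0
    for _ in range(steps):
        error = target_speed - speed           # compare speed to target
        integral += error * dt
        throttle = kp * error + ki * integral  # PI control law
        throttle = max(0.0, min(1.0, throttle))
        # Toy longitudinal dynamics: engine thrust minus drag.
        accel = 4.0 * throttle - 0.05 * speed
        speed += accel * dt
    return speed

print(simulate_cruise_control())  # settles near the 30 m/s target
```

Because every line above is an explicit equation, an engineer can analyze the whole loop on paper and prove how it behaves.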
Traditional systems, such as a car's cruise control, an airplane's autopilot, or a rocket's guidance system, can be analyzed precisely using mathematical tools like Lyapunov analysis. Inside every controller lies a mathematical description of the system's physics, and as long as the model is correct, the system behaves predictably. Learning-based controllers take a very different path. Instead of relying on equations, they learn through trial and error. They behave like black boxes: powerful, versatile, yet fundamentally unpredictable. Unlike with traditional systems, we cannot look inside and explain why a controller behaves a certain way; we only know that it was rewarded for doing so in the past. It's like training a dog to detect contraband: the dog doesn't understand chemistry, but it's remarkably good at the job, until it encounters something it wasn't trained for.
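The contrast with trial and error can also be made concrete. The sketch below shows the core update behind many learning-based controllers, tabular Q-learning, on a made-up two-state environment; the environment, rewards, and constants are all hypothetical. Notice that no physics appears anywhere: the controller only ever sees states, actions, and rewards.

```python
import random

# Sketch of trial-and-error learning: tabular Q-learning on a made-up
# two-state, two-action environment. No model of the system's physics
# exists; behavior emerges purely from past rewards.

Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
alpha, gamma = 0.1, 0.9  # learning rate and discount (assumed values)

def step(state, action):
    # Hypothetical environment: action 1 in state 0 earns a reward.
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    return (state + 1) % 2, reward

state = 0
for _ in range(5000):
    if random.random() < 0.1:   # occasionally explore at random
        action = random.choice((0, 1))
    else:                       # otherwise exploit past rewards
        action = max((0, 1), key=lambda a: Q[(state, a)])
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in (0, 1))
    # The only "explanation" for behavior is this reward-driven update.
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

print(max((0, 1), key=lambda a: Q[(0, a)]))  # learns to pick action 1 in state 0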
Figure 1
Comparison Between Traditional Control and Learning-Based Control

Note. Traditional control follows a model-driven process. In contrast, learning-based control relies on training to maximize rewards, resulting in empirically strong but less predictable controllers.
Source: Original figure by author.
Because learning-based controllers are incredibly good at handling complex
environments, engineers are working to make them suitable for deployment in the real world. Current validation of learning-based systems depends on massive simulation and testing. Self-driving car companies, for example, simulate billions of virtual miles to expose their algorithms to rare edge cases. However, testing can only reveal problems, not prove their absence. In the case of self-driving cars, if the controller becomes unreliable, the system can stop and pull over to the side of the road. Other applications, however, such as aircraft autopilot systems, must remain reliable at all times. This has been a significant impediment to the use of
learning-based systems in safety-critical industries.
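A small sketch illustrates why testing alone falls short. The scenario generator and safety check below are hypothetical stand-ins for a full closed-loop simulator, but the structure mirrors industrial practice: sample scenarios at random, look for failures, and hope the dangerous cases are among those sampled.

```python
import random

# Sketch of simulation-based validation: draw random scenarios and
# look for failures. Passing a million samples still proves nothing
# about the scenarios that were never drawn.

def controller_is_safe(scenario):
    # Hypothetical stand-in for running a full closed-loop simulation.
    return scenario["pedestrian_speed"] < 3.0 or scenario["visibility"] > 0.5

failures = []
for _ in range(100_000):
    scenario = {
        "pedestrian_speed": random.uniform(0.0, 4.0),
        "visibility": random.uniform(0.0, 1.0),
    }
    if not controller_is_safe(scenario):
        failures.append(scenario)

print(f"{len(failures)} failing scenarios found")
# Zero failures here would mean "none found", not "none exist".
```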
Figure 2
An Example of Real-World Learning-Based Decision Making in Self-Driving Vehicles

Source: Timo Wielink on Unsplash
To address the reliability challenge, researchers are exploring two promising approaches to restore the level of trust we used to have in traditional controllers. The first focuses on formal verification, essentially bringing back the mathematical rigor of traditional control (Landers & Doryab, 2023). In this approach, engineers treat a learned policy as a composition of mathematical functions rather than a black box. They then use tools like reachability analysis (which checks whether the system could ever enter an unsafe state) and control barrier or Lyapunov functions to prove that the controller remains within a defined safe set under all possible conditions.
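As a sketch of what such a proof obligation looks like (using standard textbook symbols rather than notation from either cited survey): for a Lyapunov function V, one shows that V decreases along every closed-loop trajectory; for a control barrier function h, one shows that the safe set defined by h can never be left.

```latex
% Lyapunov conditions: an energy-like function V certifies stability.
V(0) = 0, \qquad V(x) > 0 \;\; \forall x \neq 0, \qquad \dot{V}(x) \le 0

% Control barrier function: if the system starts in the safe set
% \{x : h(x) \ge 0\} and h never decays faster than -\alpha(h),
% the trajectory can never leave that set.
h(x(0)) \ge 0 \;\text{ and }\; \dot{h}(x) \ge -\alpha\bigl(h(x)\bigr)
\;\;\Longrightarrow\;\; h(x(t)) \ge 0 \;\; \forall t \ge 0
```

Verifying a learned policy means checking conditions like these over every state the network could encounter, which is why scaling is hard.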
Additionally, because formal verification is difficult to scale to large systems, many
researchers are exploring layered safety structures. In this approach, the learned controller runs under the supervision of a verified model-based controller. The learned controller's actions are filtered, and the supervisor can override them if dangerous behavior is predicted. These systems, often referred to as shielded, allow learning-based controllers to handle complex tasks while maintaining a hard safety backstop (Brunke et al., 2022).
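A minimal sketch of the shielding idea, assuming a hypothetical learned policy, a one-dimensional toy system, and a hand-written supervisor standing in for the verified model-based controller:

```python
# Sketch of a layered "shield": a learned policy proposes actions, and a
# verified supervisor overrides any proposal predicted to leave the safe
# set. All names and dynamics here are hypothetical placeholders.

def learned_policy(state):
    # Stand-in for a trained neural-network controller.
    return 1.0  # always accelerate toward the goal

def predict_next(state, action):
    # One-step model used by the supervisor (assumed known and verified).
    return state + 0.1 * action

def is_safe(state):
    return abs(state) <= 1.0  # hypothetical safe set: |x| <= 1

def safe_fallback(state):
    # Verified backup controller: steer back toward the safe interior.
    return -1.0 if state > 0 else 1.0

def shielded_step(state):
    proposal = learned_policy(state)
    if is_safe(predict_next(state, proposal)):
        return proposal           # learned action passes the filter
    return safe_fallback(state)   # supervisor intervenes

state = 0.95
for _ in range(10):
    state = predict_next(state, shielded_step(state))
print(f"final state: {state:.2f}")  # never leaves the safe set
```

The key design choice is that the learned policy is never trusted directly: every action must pass the supervisor's safety check before it reaches the physical system.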
Together, these efforts reflect a broader shift in the research community's priorities from maximizing performance to ensuring reliability. This shift is evident in recent data showing that research on safe reinforcement learning has accelerated dramatically over the past decade. According to Scopus keyword analysis, publications mentioning safe reinforcement learning within control applications have grown from virtually none before 2010 to more than 1,200
papers per year by 2025 (see Figure 3). Despite this progress, the pace of safety research still lags behind performance breakthroughs, leaving a gap between what learning-based controllers can achieve and what we can confidently trust them to do.
Figure 3
Growth of Reinforcement Learning Research in Control Systems (2000–2025)

Note. Data are derived from Scopus keyword analytics (November 2025). “General RL in Control” includes publications containing the terms “reinforcement learning” and “control”; “Safe RL in Control” includes those also mentioning “safe” or “safety” in the abstract. Source: Author analysis of Scopus data.
Looking ahead, learning-based controllers will appear more and more frequently in our lives. The next breakthroughs in autonomous systems will not be shaped by how fast these new systems can learn, but by how safely they can act. Although safety-focused research on learning-based controllers is gaining traction, it still lags behind the pace of performance-oriented studies. To close this gap, we need the same level of public excitement, funding, and recognition for safety advances that we currently reserve for performance breakthroughs.
References
Landers, M., & Doryab, A. (2023). Deep reinforcement learning verification: A survey. ACM Computing Surveys, 55(14s).
https://doi.org/10.1145/3596444
Brunke, L., Greeff, M., Hall, A. W., Yuan, Z., Zhou, S., Panerati, J., & Schoellig, A. P. (2022). Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5, 411–444.
https://doi.org/10.1146/annurev-control-042920-020211
Elsevier. (2025, November). Scopus publication analytics for “reinforcement learning” AND control AND (safety OR safe), 2000–2025 [Data set]. Scopus.
https://www.scopus.com