Koopman-Certified Safe Actor–Critic Reinforcement Learning via Robust Projection in Lifted Space

Francisco Maldonado; José Acosta

Francisco Javier Maldonado

José Ángel Acosta

¹ Multi-Robot and Control Systems Group, Automation and Systems Engineering Department, University of Sevilla, Seville, Spain

Academic Editor: Paolo Mercorelli

Published: 04 June 2026 by MDPI in The 2nd International Online Conference on Mathematics and Applications session Control Theory and Mechanics

Abstract:

Reinforcement learning (RL) can achieve strong performance in nonlinear control, but exploration often violates safety constraints, which limits deployment in safety-critical systems. Koopman-based control offers a promising bridge between nonlinear dynamics and linear control tools, yet its practical success depends on explicitly accounting for approximation errors. Recent Koopman control theory highlights that bilinear surrogate models with finite-data, proportional error bounds enable rigorous closed-loop guarantees.

We propose a safe actor–critic framework in which a Koopman surrogate model is learned from data together with an error certificate that upper-bounds one-step prediction mismatch as a function of the lifted state and input magnitudes. Using this certificate, we construct a robust safety layer that computes, at every step, the nearest admissible action to the actor’s proposed action such that state and input constraints are satisfied for all model errors consistent with the bound. This safety layer is formulated as a tractable convex projection problem, guaranteeing constraint satisfaction whenever the certified safe action set is nonempty.
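The robust projection can be illustrated with a minimal sketch under simplifying assumptions that are not taken from the abstract: a scalar input, a lifted model that is linear in the input (z⁺ ≈ A z + B u, rather than a bilinear surrogate), half-space constraints h·z⁺ ≤ b on the lifted state, and a certificate of the form ‖e‖ ≤ c_z‖z‖ + c_u|u|. Under these assumptions each robust constraint reads h·(A z + B u) + ‖h‖(c_z‖z‖ + c_u|u|) ≤ b, which is convex in u, so the certified safe set is an interval and the projection reduces to clipping. All names (`A`, `B`, `H`, `b`, `c_z`, `c_u`, `u_max`) are hypothetical.

```python
import numpy as np

# Illustrative sketch only. A, B, H, b, c_z, c_u and u_max are assumed,
# hypothetical quantities; the abstract does not specify them.


def _clip_halfline(slope, rhs, lo, hi):
    """Sub-interval of [lo, hi] on which slope * u <= rhs, or None if empty."""
    if slope > 0:
        hi = min(hi, rhs / slope)
    elif slope < 0:
        lo = max(lo, rhs / slope)
    elif rhs < 0:
        return None
    return (lo, hi) if lo <= hi else None


def robust_safe_interval(z, A, B, H, b, c_z, c_u, u_max):
    """Certified safe input interval for a scalar input, or None if empty.

    Enforces h @ (A z + B u) + ||h|| * (c_z ||z|| + c_u |u|) <= b_i for every
    row h of H, together with the input bound |u| <= u_max.
    """
    lo, hi = -u_max, u_max
    z_norm = np.linalg.norm(z)
    for h, b_i in zip(H, b):
        h_norm = np.linalg.norm(h)
        g = float(h @ B)                 # nominal sensitivity to the input
        k = h_norm * c_u                 # worst-case error growth with |u|
        r = float(b_i - h @ (A @ z) - h_norm * c_z * z_norm)
        # The constraint g*u + k*|u| <= r is linear on each side of u = 0.
        pos = _clip_halfline(g + k, r, max(lo, 0.0), hi) if hi >= 0 else None
        neg = _clip_halfline(g - k, r, lo, min(hi, 0.0)) if lo <= 0 else None
        pieces = [p for p in (pos, neg) if p is not None]
        if not pieces:
            return None                  # certified safe set is empty
        # By convexity the two pieces join into a single interval.
        lo = min(p[0] for p in pieces)
        hi = max(p[1] for p in pieces)
    return lo, hi


def robust_projection(u_prop, z, A, B, H, b, c_z, c_u, u_max):
    """Nearest certified-safe input to u_prop, or None if none exists."""
    interval = robust_safe_interval(z, A, B, H, b, c_z, c_u, u_max)
    if interval is None:
        return None
    return float(np.clip(u_prop, *interval))
```

Setting `c_z = c_u = 0` in this sketch gives a nominal projection that trusts the surrogate exactly, which is one way to read the "nominally shielded" baseline. With vector inputs the same constraints would call for a general convex solver instead of clipping.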

We interpret the safety layer as an additional “safety critic”. The filter’s intervention (the difference between proposed and applied actions) provides a direct learning signal that penalizes reliance on the filter, encouraging the actor to produce intrinsically safe actions while the standard critic optimizes task reward.
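One simple way to turn the filter's intervention into a learning signal is sketched below. It assumes a squared-distance intervention cost and a weighted sum with the task critic's value; the abstract does not state the exact form, so `intervention_cost`, `actor_objective`, and `weight` are hypothetical.

```python
import numpy as np

# Illustrative sketch only: the squared-distance cost and the weighted
# combination are assumptions, not the authors' stated formulation.


def intervention_cost(u_proposed, u_applied):
    """Squared distance between the actor's proposal and the filtered action."""
    diff = np.asarray(u_proposed, dtype=float) - np.asarray(u_applied, dtype=float)
    return float(np.sum(diff ** 2))


def actor_objective(q_task, u_proposed, u_applied, weight=1.0):
    """Task-critic value minus a penalty on reliance on the safety filter.

    q_task is the standard critic's value estimate for the applied action;
    the penalty is zero whenever the filter leaves the action unchanged.
    """
    return q_task - weight * intervention_cost(u_proposed, u_applied)
```

The task value and the safety signal stay separate until the final combination, so the standard critic can still be trained on task reward alone.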

Simulation results on a nonlinear controlled system with intentional model mismatch show that robust projection eliminates the training-time safety violations observed with unshielded and nominally shielded baselines. Moreover, incorporating the safety-critic signal substantially reduces safety interventions while maintaining task performance, demonstrating safe learning and improved policy self-safety without sacrificing control quality.

Keywords: Koopman operator; reinforcement learning; nonlinear control
