Koopman-Certified Safe Actor–Critic Reinforcement Learning via Robust Projection in Lifted Space
1  Multi-Robot and Control Systems Group, Automation and Systems Engineering Department, University of Sevilla, Seville, Spain
Academic Editor: Paolo Mercorelli

Abstract:

Reinforcement learning (RL) can achieve strong performance in nonlinear control, but exploration often violates safety constraints, which limits deployment in safety-critical systems. Koopman-based control offers a promising bridge between nonlinear dynamics and linear control tools, yet practical success depends on explicitly accounting for approximation errors; recent Koopman control theory highlights that bilinear surrogate models with finite-data, proportional error bounds enable rigorous closed-loop guarantees.

We propose a safe actor–critic framework in which a Koopman surrogate model is learned from data together with an error certificate that upper-bounds one-step prediction mismatch as a function of the lifted state and input magnitudes. Using this certificate, we construct a robust safety layer that computes, at every step, the nearest admissible action to the actor’s proposed action such that state and input constraints are satisfied for all model errors consistent with the bound. This safety layer is formulated as a tractable convex projection problem, guaranteeing constraint satisfaction whenever the certified safe action set is nonempty.
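To make the projection step concrete, the following is a minimal sketch for a scalar input, not the authors' implementation. It assumes a linear lifted model `z+ = A z + B u` (a simplification of the bilinear surrogates mentioned above), a certificate of the hypothetical form `|e_j| <= c_z * ||z|| + c_u * |u|` on each component of the one-step error, lifted-state constraints `H z+ <= h`, and an input bound `|u| <= u_max`. All names (`robust_project`, `c_z`, `c_u`, `H`, `h`) are illustrative. Under these assumptions each robust constraint is convex in `u` and piecewise linear, so the nearest admissible action can be found by clipping on each sign branch.

```python
import numpy as np

def _half_interval(slopes, d, lo, hi):
    """Return [lo, hi] intersected with {u : slopes[i] * u <= d[i]}, or None if empty."""
    for s, di in zip(slopes, d):
        if s > 1e-12:
            hi = min(hi, di / s)
        elif s < -1e-12:
            lo = max(lo, di / s)
        elif di < 0:  # constraint does not depend on u and is violated
            return None
    return (lo, hi) if lo <= hi else None

def robust_project(u_actor, z, A, B, H, h, c_z, c_u, u_max):
    """Nearest scalar action to u_actor that satisfies H z+ <= h for every
    error consistent with the assumed bound; None if no such action exists."""
    H1 = np.abs(H).sum(axis=1)          # worst-case gain of each row on the error
    a = H @ B                           # nominal effect of u on each constraint
    b = H1 * c_u                        # tightening that grows with |u|
    d = h - H @ (A @ z) - H1 * c_z * np.linalg.norm(z)  # tightened margin at u = 0
    candidates = []
    # Branch u >= 0: (a + b) u <= d.  Branch u <= 0: (a - b) u <= d.
    for slopes, lo, hi in ((a + b, 0.0, u_max), (a - b, -u_max, 0.0)):
        interval = _half_interval(slopes, d, lo, hi)
        if interval is not None:
            candidates.append(float(np.clip(u_actor, *interval)))
    if not candidates:
        return None                     # certified safe set is empty
    return min(candidates, key=lambda u: abs(u - u_actor))

# Toy example: keep |z+| <= 1 for a one-dimensional lifted state.
A = np.array([[1.0]]); B = np.array([1.0])
H = np.array([[1.0], [-1.0]]); h = np.array([1.0, 1.0])
u_safe = robust_project(0.8, np.array([0.5]), A, B, H, h, c_z=0.1, c_u=0.1, u_max=1.0)
```

In the toy example the proposed action 0.8 is pulled back to roughly 0.41, the largest input for which the worst-case next state still satisfies the bound. For vector-valued inputs the same tightened constraints would typically be handed to a convex solver rather than solved in closed form.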

We interpret the safety layer as an additional “safety critic”. The filter’s intervention (the difference between proposed and applied actions) provides a direct learning signal that penalizes reliance on the filter, encouraging the actor to produce intrinsically safe actions while the standard critic optimizes task reward.
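One way to picture the intervention signal is as a penalty subtracted from the actor's objective. The sketch below is an assumption for illustration only: the abstract does not specify the functional form, so the squared Euclidean norm, the `weight` parameter, and the function names are hypothetical.

```python
import numpy as np

def intervention_penalty(u_proposed, u_applied):
    """Squared distance between the actor's proposal and the filtered action
    (assumed form; zero when the filter does not intervene)."""
    diff = np.asarray(u_applied, dtype=float) - np.asarray(u_proposed, dtype=float)
    return float(np.sum(diff ** 2))

def actor_signal(task_value, u_proposed, u_applied, weight=1.0):
    """Task value from the standard critic minus a weighted intervention
    penalty; `weight` is a hypothetical trade-off parameter."""
    return task_value - weight * intervention_penalty(u_proposed, u_applied)
```

With this shaping, an action that passes through the filter unchanged keeps its full task value, while an action that had to be corrected is scored lower in proportion to the size of the correction.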

Simulation results on a nonlinear controlled system with intentional model mismatch show that robust projection eliminates the training-time safety violations observed with unshielded and nominally shielded baselines. Moreover, incorporating the safety-critic signal substantially reduces the number of safety interventions while maintaining task performance, demonstrating safe learning and improved policy self-safety without sacrificing control quality.

Keywords: Koopman operator; reinforcement learning; nonlinear control

 
 