Autonomous lunar probe landing presents a complex control challenge due to limited sensor feedback, delayed communications, and dynamic terrain conditions. In this work, we present a lightweight and optimized reinforcement learning solution using a Deep Q-Network (DQN) agent trained on the LunarLander-v3 environment from the Gymnasium library. Our aim is to develop a model capable of precise, resource-efficient landings under constrained simulation settings.
The agent interacts with an 8-dimensional state space and a discrete action space of four actions, learning through experience replay and an ε-greedy policy. We systematically evaluated the impact of neural network architecture (Tiny, Base, Wide, Deep) and conducted extensive hyperparameter tuning via grid search over learning rates, discount factors, and soft update rates. The best-performing configuration (the 128-128 Wide architecture with a learning rate of 0.0005, a discount factor of 0.99, and a soft update rate of 0.01) achieved an average reward of 262.89 after 355.98 seconds of training.
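As a concrete illustration of this setup, the sketch below shows a 128-128 "Wide" Q-network for the 8-dimensional LunarLander-v3 state and four-action space, together with the best-found hyperparameters reported above. It assumes a PyTorch implementation; the class names, the Adam optimizer, and the soft-update helper are illustrative choices, not taken from our code.

```python
# Minimal sketch of the "Wide" (128-128) Q-network and the best-found
# hyperparameters; names and structure are illustrative assumptions.
import gymnasium as gym
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Maps the 8-dimensional LunarLander state to Q-values for the 4 actions."""

    def __init__(self, state_dim: int = 8, action_dim: int = 4, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


# Best-performing configuration from the grid search (values from this work).
LEARNING_RATE = 5e-4   # Adam step size
GAMMA = 0.99           # discount factor
TAU = 0.01             # soft-update rate for the target network

env = gym.make("LunarLander-v3")
policy_net = QNetwork(env.observation_space.shape[0], env.action_space.n)
target_net = QNetwork(env.observation_space.shape[0], env.action_space.n)
target_net.load_state_dict(policy_net.state_dict())
optimizer = torch.optim.Adam(policy_net.parameters(), lr=LEARNING_RATE)


def soft_update(policy: nn.Module, target: nn.Module, tau: float = TAU) -> None:
    """Polyak-average the policy weights into the target network."""
    with torch.no_grad():
        for p, t in zip(policy.parameters(), target.parameters()):
            t.mul_(1.0 - tau).add_(tau * p)
```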
Final testing revealed that training to a reward threshold of 250 yields a 93% landing success rate, outperforming both under-trained and over-trained agents in efficiency and generalization. This result was validated over 100 test episodes, confirming consistent, high-accuracy autonomous landing behavior.
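The evaluation protocol can be sketched as a greedy rollout over 100 test episodes, as below. The success criterion used here (an episode return of at least 200, the environment's conventional "solved" score) is an assumption for illustration and may differ from the exact landing criterion applied in our experiments.

```python
# Hedged sketch of a 100-episode evaluation; the success threshold of 200
# is an assumed stand-in for the paper's landing-success criterion.
import gymnasium as gym
import torch


def evaluate(policy_net, n_episodes: int = 100, success_return: float = 200.0) -> float:
    env = gym.make("LunarLander-v3")
    successes = 0
    for _ in range(n_episodes):
        state, _ = env.reset()
        done, episode_return = False, 0.0
        while not done:
            with torch.no_grad():
                q_values = policy_net(torch.as_tensor(state, dtype=torch.float32))
            action = int(torch.argmax(q_values))  # greedy action at test time
            state, reward, terminated, truncated, _ = env.step(action)
            episode_return += reward
            done = terminated or truncated
        successes += episode_return >= success_return
    env.close()
    return successes / n_episodes  # e.g. 0.93 for 93 successful landings out of 100
```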
Our findings highlight the viability of deploying lightweight, well-tuned DQN agents for real-time lunar landing scenarios. The proposed approach serves as a scalable blueprint for future space robotics systems, bridging the gap between simulation and real-world feasibility. Future work will incorporate terrain complexity and uncertainty modeling to extend robustness in dynamic planetary environments.
