Toward Robust RL for Autonomous Driving: Lessons from DQN and A2C

Muhammad Hamza Amin

¹ Department of Artificial Intelligence, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, 23460 Topi, Pakistan.

Academic Editor: Marjan Mernik

Published: 04 June 2026 by MDPI in The 2nd International Online Conference on Mathematics and Applications session Mathematics, Computer Science and Artificial Intelligence

Abstract:

Reinforcement learning (RL) has emerged as a promising paradigm for autonomous control tasks, where agents must learn sequential decision-making in dynamic and uncertain environments. Among RL algorithms, Deep Q-Networks (DQN) and Advantage Actor-Critic (A2C) represent two widely used approaches—value-based and policy-based, respectively—each with distinct strengths and limitations. This study presents a comparative analysis of DQN and A2C applied to the challenging CarRacing-v3 environment, where agents must handle continuous dynamics such as steering, acceleration, and braking.

Both agents were trained using preprocessed input frames, which involved grayscale conversion, cropping, resizing, normalization, and frame stacking to capture temporal dependencies. DQN was implemented with experience replay, target networks, and Double DQN extensions, while A2C employed a shared convolutional encoder for actor and critic networks with entropy regularization to encourage exploration. Training progress was measured using average return, stability, and computational efficiency.
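The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration, not the study's actual pipeline: the 96×96 RGB input size, the 12-row bottom crop, the 84×84 output size, the stack depth of 4, and the names `preprocess` and `FrameStack` are all assumptions made for the example, and nearest-neighbour sampling stands in for whatever resizing method was actually used.

```python
from collections import deque

import numpy as np


def preprocess(frame, out_size=84):
    """Convert an RGB uint8 frame (H, W, 3) to float32 (out_size, out_size) in [0, 1]."""
    # Grayscale conversion using standard luminance weights.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame[..., :3].astype(np.float32) @ weights
    # Cropping: drop the bottom 12 rows (assumed here to hold the status bar).
    gray = gray[:-12, :]
    # Resizing: nearest-neighbour sampling of row and column indices.
    rows = np.arange(out_size) * gray.shape[0] // out_size
    cols = np.arange(out_size) * gray.shape[1] // out_size
    resized = gray[rows][:, cols]
    # Normalization to the unit interval.
    return np.clip(resized / 255.0, 0.0, 1.0)


class FrameStack:
    """Keeps the k most recent preprocessed frames as one (k, H, W) observation."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # Fill the stack with copies of the first frame of the episode.
        first = preprocess(frame)
        for _ in range(self.k):
            self.frames.append(first)
        return self.state()

    def step(self, frame):
        # Appending to a full deque discards the oldest frame.
        self.frames.append(preprocess(frame))
        return self.state()

    def state(self):
        return np.stack(list(self.frames), axis=0)
```

Stacking consecutive frames gives the network access to motion cues, such as speed and turning rate, that a single image cannot convey.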

Results revealed that neither DQN nor A2C achieved consistently stable driving policies. DQN struggled with continuous control due to its discretized action formulation, resulting in fluctuating average returns and poor convergence (final average return ≈ -71.5). A2C, while better suited for continuous actions, also stagnated with limited learning progression (average return ≈ -72.4), suggesting inefficiencies in exploration and sensitivity to hyperparameters.
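To illustrate why a discretized action formulation can limit DQN here, the sketch below maps a small discrete action table onto the continuous [steering, gas, brake] control vector used by CarRacing. The five actions in `DISCRETE_ACTIONS` and the helper names are hypothetical choices for this example; the abstract does not state which discretization was used.

```python
import numpy as np

# Hypothetical discrete action set; each row is [steering, gas, brake].
# Steering lies in [-1, 1]; gas and brake lie in [0, 1].
DISCRETE_ACTIONS = np.array(
    [
        [0.0, 0.0, 0.0],   # coast
        [-1.0, 0.0, 0.0],  # hard left
        [1.0, 0.0, 0.0],   # hard right
        [0.0, 1.0, 0.0],   # full gas
        [0.0, 0.0, 0.8],   # brake
    ],
    dtype=np.float32,
)


def greedy_action(q_values):
    """Pick the index of the highest Q-value, as a greedy DQN policy would."""
    return int(np.argmax(q_values))


def to_continuous(action_index):
    """Translate a discrete action index into the continuous control vector."""
    return DISCRETE_ACTIONS[action_index].copy()


def nearest_discrete(target):
    """Return the closest discrete action to a desired continuous control,
    together with the Euclidean error that the discretization introduces."""
    target = np.asarray(target, dtype=np.float32)
    distances = np.linalg.norm(DISCRETE_ACTIONS - target, axis=1)
    index = int(np.argmin(distances))
    return index, float(distances[index])
```

Under this table, a gentle left turn with partial throttle, such as `[-0.3, 0.5, 0.0]`, has no close match: the agent can only approximate it by alternating between coarse actions across time steps, which is one plausible source of the fluctuating returns reported above.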

In conclusion, this study highlights the challenges of applying classical RL algorithms to high-dimensional autonomous driving tasks. The findings provide empirical insights into their trade-offs and point toward the need for hybrid or advanced methods—such as DDPG, TD3, or SAC—that combine stability, efficiency, and adaptability for real-world autonomous driving applications.

Keywords: Reinforcement learning; Deep Q-Network (DQN); Advantage Actor-Critic (A2C); Autonomous driving; CarRacing-v3
