Residual Value Iteration Algorithm based on Function Approximation

Wen Hu

doi:10.3390/mol2net-02-03871

¹ Institute of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu
² Jiangsu Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu
³ Suzhou Key Laboratory of Mobile Networking and Applied Technologies, Suzhou University of Science and Technology, Suzhou, Jiangsu

Published: 18 January 2017 by MDPI in MOL2NET'16, Conference on Molecular, Biomed., Comput. & Network Science and Engineering, 2nd ed. congress USEDAT-02: USA-Europe Data Analysis Training Program Workshop, Cambridge, UK-Bilbao, Spain-Miami, USA, 2016

https://doi.org/10.3390/mol2net-02-03871

Abstract:

To address the unstable and slow convergence of the traditional Value Iteration algorithm, we propose an improved Residual Value Iteration algorithm based on Function Approximation (FARVI). The algorithm combines the traditional Value Iteration algorithm with the Value Iteration algorithm based on the Bellman residual, introducing weight factors and constructing a new rule for updating the value function parameter vector. In theory, the new update rule guarantees convergence and thereby resolves the unstable convergence of the traditional Value Iteration algorithm. In addition, the algorithm introduces a new factor, called the forgotten factor, to speed up convergence. We apply the proposed algorithm, the Value Iteration algorithm, and the LSPI algorithm to the traditional Grid World problem; the experimental results show that the FARVI algorithm performs well and is robust across problems of different scales.

Keywords: Reinforcement Learning; Value Iteration; Function Approximation; Gradient Descent; Bellman Residual
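
The abstract does not state the update rule itself, so the following is only a minimal sketch of the general idea it describes: a weight `beta` mixes the direct Value Iteration update with the Bellman-residual gradient update of a linear value function. The function names, the one-dimensional grid, the one-hot features, and all parameter values are assumptions made for illustration. The forgotten factor is left out because the abstract does not define it.

```python
import numpy as np

def chain_world(n=5):
    """Hypothetical 1-D grid: action 0 = left, 1 = right; last state is terminal."""
    states = np.arange(n)
    next_state = np.stack(
        [np.maximum(states - 1, 0), np.minimum(states + 1, n - 1)], axis=1
    )
    next_state[n - 1] = n - 1            # terminal state loops onto itself
    reward = np.zeros((n, 2))
    reward[n - 2, 1] = 1.0               # reward for stepping into the terminal state
    features = np.eye(n)[:, : n - 1]     # one-hot rows; terminal row is all zeros
    return next_state, reward, features

def residual_value_iteration(next_state, reward, features, gamma=0.9,
                             alpha=0.5, beta=0.5, sweeps=2000):
    """Fit V(s) = features[s] @ theta on a deterministic, known model.

    beta = 0 is the direct value-iteration update,
    beta = 1 is the pure Bellman-residual gradient update
    (assumed mixing scheme, not taken from the paper).
    """
    n_states, n_features = features.shape
    rows = np.arange(n_states)
    theta = np.zeros(n_features)
    for _ in range(sweeps):
        values = features @ theta
        # One-step backup for every (state, action) pair.
        q = reward + gamma * values[next_state]
        greedy = np.argmax(q, axis=1)
        successor = next_state[rows, greedy]
        delta = q[rows, greedy] - values             # Bellman residual per state
        # Weighted mix of the direct and residual-gradient directions.
        direction = features - beta * gamma * features[successor]
        theta += alpha * (delta @ direction)         # summed over all states
    return theta

next_state, reward, features = chain_world(5)
theta = residual_value_iteration(next_state, reward, features)
print(np.round(theta, 4))
```

With one-hot features the optimal values are exactly representable, so on this toy chain the parameters should approach the discounted distances to the goal (`gamma` raised to the number of remaining steps minus one). With coarser features the two update directions generally lead to different solutions, which is where the choice of weight matters.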
