To address the unstable and slow convergence of the traditional Value Iteration algorithm, we propose an improved Residual Value Iteration algorithm based on Function Approximation (FARVI). The algorithm combines traditional Value Iteration with Value Iteration based on the Bellman residual, introducing weight factors and a new rule for updating the parameter vector of the value function. In theory, the new update rule guarantees convergence and thus resolves the unstable convergence of traditional Value Iteration. In addition, the algorithm introduces a new factor, called the forgetting factor, to speed up convergence. We applied FARVI, Value Iteration, and LSPI to the classic Grid World problem; the experimental results show that FARVI performs well and is robust across problems of different scales.
Residual Value Iteration Algorithm based on Function Approximation
Published: 18 January 2017 by MDPI
in MOL2NET'16, Conference on Molecular, Biomed., Comput. & Network Science and Engineering, 2nd ed.
congress USEDAT-02: USA-Europe Data Analysis Training Program Workshop, Cambridge, UK-Bilbao, Spain-Miami, USA, 2016
Keywords: Reinforcement Learning; Value Iteration; Function Approximation; Gradient Descent; Bellman Residual
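The abstract describes the update rule only in words, so the following is a minimal sketch of the general idea rather than the authors' FARVI rule. It assumes a Baird-style residual update, in which a hypothetical `residual_weight` blends the direct Value Iteration gradient (weight 0) with the Bellman residual gradient (weight 1). The one-dimensional corridor, the one-hot features, the step size, and the discount factor are all illustrative assumptions, and the forgetting factor is not modeled because the abstract does not define it.

```python
import numpy as np

GAMMA = 0.9          # discount factor (assumed for illustration)
N_STATES = 5         # corridor cells 0..4; cell 4 is the terminal goal
ACTIONS = (+1, -1)   # move right, move left


def step(state, action):
    """Deterministic corridor dynamics with a reward of -1 per move."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, -1.0


def features(state):
    """One-hot features over non-terminal cells; the goal maps to zeros."""
    phi = np.zeros(N_STATES - 1)
    if state < N_STATES - 1:
        phi[state] = 1.0
    return phi


def residual_value_iteration(residual_weight=0.5, alpha=0.1, sweeps=3000):
    """Sweep all states, blending direct and residual-gradient updates.

    residual_weight is a hypothetical name for the weight factor:
    0.0 gives the direct update, 1.0 the pure residual-gradient update.
    """
    theta = np.zeros(N_STATES - 1)  # value function parameter vector
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            # Greedy one-step backup under the current approximation.
            backups = []
            for a in ACTIONS:
                nxt, r = step(s, a)
                backups.append((r + GAMMA * features(nxt) @ theta, nxt))
            target, nxt = max(backups, key=lambda b: b[0])
            # Bellman residual for state s.
            delta = target - features(s) @ theta
            # Weighted mix of the direct and residual-gradient directions.
            direction = features(s) - residual_weight * GAMMA * features(nxt)
            theta += alpha * delta * direction
    return theta


if __name__ == "__main__":
    print(np.round(residual_value_iteration(), 3))
```

For this corridor the optimal values of cells 0 to 3 are -3.439, -2.71, -1.9, and -1, so the learned parameters can be compared against them directly. With function approximators less benign than one-hot features, the choice of weight matters much more than it does in this toy setting.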