Efficient RL Algorithm by Combining AC with Dual Piecewise Model Learning

Shan Zhong; Quan Liu; Qiming Fu

doi:10.3390/mol2net-02-03895

Shan Zhong ¹,²,³,*; Quan Liu ¹,⁴,⁵,*; Qiming Fu ⁵,⁶

¹ School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu, 215006
² School of Computer Science and Engineering, Changshu Institute of Technology, Changshu, 215500
³ Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu, 215006
⁴ Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 210000
⁵ Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012
⁶ College of Electronic & Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu, 215006

Published: 24 January 2017 by MDPI in MOL2NET'16, Conference on Molecular, Biomed., Comput. & Network Science and Engineering, 2nd ed. congress USEDAT-02: USA-Europe Data Analysis Training Program Workshop, Cambridge, UK-Bilbao, Spain-Miami, USA, 2016

Abstract:

As classic methods for handling continuous action spaces in reinforcement learning (RL), the actor-critic (AC) algorithm and its variants still suffer from poor sample efficiency. We therefore propose a method that learns two linear models for planning: a state-based piecewise model and an action-based piecewise model, defined by partitions of the state space and the action space, respectively. Partitioning allows each model to be learned more accurately. To accelerate convergence and balance the distribution of samples, samples near the goal are stored and reused to learn the models, the value function, and the policy. On two classic RL benchmarks with continuous MDPs, the proposed method is able to learn an optimal policy by combining both models, and it outperforms representative methods in terms of convergence rate and sample efficiency.
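The abstract does not specify the model equations, the partition scheme, or how the two models are combined, so the following Python sketch only illustrates the general idea under loudly stated assumptions: a 1-D state and action, uniform interval partitions, one ridge-regularized least-squares linear model s' ≈ w0·s + w1·a + w2 per interval, a plain average as the combination rule, and simple duplication of near-goal transitions to rebalance the training data. The names `PiecewiseLinearModel`, `near_goal`, `combined_prediction`, and `toy_step` are hypothetical; this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# (state, action, next_state); 1-D state and action are an assumption of this sketch.
Transition = Tuple[float, float, float]


def solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


class PiecewiseLinearModel:
    """Hypothetical: one linear model s' = w0*s + w1*a + w2 per interval.

    key_index=0 partitions on the state (state-based model);
    key_index=1 partitions on the action (action-based model).
    """

    def __init__(self, low: float, high: float, n_regions: int,
                 key_index: int, ridge: float = 1e-6) -> None:
        self.low, self.high, self.n = low, high, n_regions
        self.key_index = key_index
        self.ridge = ridge  # small ridge term keeps every normal-equation system solvable
        self.weights = [[0.0, 0.0, 0.0] for _ in range(n_regions)]

    def region(self, s: float, a: float) -> int:
        """Index of the uniform interval containing the partitioned variable (clipped)."""
        value = (s, a)[self.key_index]
        frac = (value - self.low) / (self.high - self.low)
        return min(self.n - 1, max(0, int(frac * self.n)))

    def fit(self, data: Sequence[Transition]) -> None:
        """Least-squares fit of each region's weights on the samples that fall in it."""
        xtx = [[[0.0] * 3 for _ in range(3)] for _ in range(self.n)]
        xty = [[0.0] * 3 for _ in range(self.n)]
        counts = [0] * self.n
        for s, a, s_next in data:
            k = self.region(s, a)
            x = (s, a, 1.0)
            counts[k] += 1
            for i in range(3):
                xty[k][i] += x[i] * s_next
                for j in range(3):
                    xtx[k][i][j] += x[i] * x[j]
        for k in range(self.n):
            if counts[k] == 0:
                continue  # no samples: keep the previous weights for this region
            for i in range(3):
                xtx[k][i][i] += self.ridge
            self.weights[k] = solve3(xtx[k], xty[k])

    def predict(self, s: float, a: float) -> float:
        w = self.weights[self.region(s, a)]
        return w[0] * s + w[1] * a + w[2]


def near_goal(data: Sequence[Transition], goal: float, radius: float) -> List[Transition]:
    """Hypothetical helper: transitions whose next state lands within `radius` of the goal."""
    return [t for t in data if abs(t[2] - goal) <= radius]


def combined_prediction(state_model, action_model, s: float, a: float) -> float:
    """Hypothetical combination rule: plain average of the two models' predictions."""
    return 0.5 * (state_model.predict(s, a) + action_model.predict(s, a))


def toy_step(s: float, a: float) -> float:
    """Made-up dynamics that are linear on each half of the state space."""
    return s + a if s < 0 else 0.5 * s + a


data = [(-1 + 0.1 * i, a, toy_step(-1 + 0.1 * i, a))
        for i in range(21) for a in (-0.5, -0.25, 0.0, 0.25, 0.5)]
# Re-add near-goal samples so they carry more weight in training (goal = 0.0 assumed).
training = data + near_goal(data, goal=0.0, radius=0.1)

state_model = PiecewiseLinearModel(-1.0, 1.0, 2, key_index=0)
action_model = PiecewiseLinearModel(-0.5, 0.5, 2, key_index=1)
state_model.fit(training)
action_model.fit(training)
print(state_model.predict(-0.5, 0.2), state_model.predict(0.5, 0.2))
print(combined_prediction(state_model, action_model, 0.5, 0.2))
```

Because the toy dynamics switch at s = 0, the state-based model with two state intervals recovers them almost exactly, while the action-based model, whose intervals split the action axis instead, can only approximate them. This is one illustration of how two differently partitioned models can capture different structure; the paper's actual combination scheme may differ.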

Keywords: reinforcement learning; model learning; planning; linear approximation
