PDP: Parallel Dynamic Programming
Fei-Yue Wang, Fellow, IEEE, Jie Zhang, Member, IEEE, Qinglai Wei, Member, IEEE, Xinhu Zheng, Student Member, IEEE, and Li Li, Fellow, IEEE
Institute of Automation, Chinese Academy of Sciences; National University of Defense Technology; Qingdao Academy of Intelligent Industries; Tsinghua University, China; University of Minnesota, USA
Abstract: Deep reinforcement learning is a focal research area in artificial intelligence. The principle of optimality in dynamic programming is a key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP) is first presented instead of direct dynamic programming (DP), and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence, as the necessary requirement for real reinforcement learning, is discussed. Finally, the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future of computational intelligence.
Index Terms: Parallel dynamic programming, Dynamic programming, Adaptive dynamic programming, Reinforcement learning, Deep learning, Neural networks, Artificial intelligence.
Citation: F.-Y. Wang, J. Zhang, Q. L. Wei, X. H. Zheng, and L. Li, “PDP: parallel dynamic programming,” IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 1, pp. 1-5, Jan. 2017.
Fig. 1. ADP structure.
Fig. 2. The HDP structure diagram.
Fig. 3. Deep neural network structure of HDP.
Fig. 4. The ACP approach of descriptive, predictive, and prescriptive analytics.
Fig. 5. Parallel dynamic programming with three parallel systems.