Open Access

MATEC Web Conf., Volume 370, 2022

2022 RAPDASA-RobMech-PRASA-CoSAAMI Conference - Digital Technology in Product Development - The 23rd Annual International RAPDASA Conference joined by RobMech, PRASA and CoSAAMI

- Article Number: 07008
- Number of pages: 13
- Section: Pattern Recognition
- DOI: https://doi.org/10.1051/matecconf/202237007008
- Published online: 01 December 2022