ANALYZING THE PERFORMANCES OF DEEP REINFORCEMENT LEARNING ALGORITHMS IN THE STARCRAFT 2 ENVIRONMENT
DOI: https://doi.org/10.24867/02BE34Lalic
Keywords: Starcraft 2, reinforcement learning, deep learning, A3C, Deep-Q learning

Abstract
This paper studies the performance of deep reinforcement learning algorithms in solving a subset of problems in the Starcraft 2 environment. The algorithms studied were A3C and Deep-Q learning. Each algorithm was tested with a different set of training parameters, such as the number of skipped agent steps and the neural network learning rate. Both algorithms respond similarly to parameter changes when dealing with problems that do not require a large number of actions to reach an optimal solution. Consequently, parameter values that skip a greater number of actions lead to better results given the same training time. Reducing the learning rate decreases the performance of both algorithms on all of the problems. Both algorithms achieved satisfactory results on problems that mostly involve the management of units; however, results were considerably lower on tasks that included the construction of a base.
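As a hedged illustration of where these training parameters enter the experimental setup (not the exact configuration used in this paper), the sketch below sets up a pysc2 mini-game environment in which the number of skipped agent steps corresponds to the step_mul argument, while the learning rate would be supplied to the optimizer that trains the network. The map name, hyperparameter values, and the no-op placeholder policy are assumptions made purely for illustration.

# Minimal sketch (assumed values, not the paper's exact configuration).
# `STEP_MUL` is the number of game steps skipped between agent decisions;
# `LEARNING_RATE` would be passed to the optimizer used to train the network,
# e.g. tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE) in TensorFlow 1.x.
from pysc2.env import sc2_env
from pysc2.lib import actions, features

STEP_MUL = 8          # assumed: skipped agent steps per decision
LEARNING_RATE = 1e-4  # assumed: neural network learning rate

def run_episode():
    # Create a single-agent mini-game environment ("MoveToBeacon" is an
    # assumed example map) with the chosen step-skip setting.
    with sc2_env.SC2Env(
            map_name="MoveToBeacon",
            players=[sc2_env.Agent(sc2_env.Race.terran)],
            agent_interface_format=features.AgentInterfaceFormat(
                feature_dimensions=features.Dimensions(screen=84, minimap=64)),
            step_mul=STEP_MUL) as env:
        timesteps = env.reset()
        total_reward = 0.0
        while not timesteps[0].last():
            # Placeholder policy: a real agent (A3C or Deep-Q) would select an
            # action from its network output; here we only issue no-ops.
            action = actions.FUNCTIONS.no_op()
            timesteps = env.step([action])
            total_reward += timesteps[0].reward
        return total_reward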