USING REINFORCEMENT LEARNING TO TRAIN AN AGENT FOR THE OBSTACLE TOWER ENVIRONMENT
DOI: https://doi.org/10.24867/06BE06Njegovanovic

Keywords: reinforcement learning, three-dimensional environment, environment control, navigation, sparse rewards, neural network

Abstract
With artificial intelligence on the rise, reinforcement learning is a fertile field for new research. One problem that has attracted attention in the last year or two is environment control, or navigation. This paper presents one approach to the problem of navigation and generalization in a three-dimensional environment with sparse rewards, by training an autonomous agent with deep learning techniques. The agent's performance was evaluated by comparing it against human performance and against results reported in the related scientific literature.
Published: 2019-12-21

Section: Electrotechnical and Computer Engineering