REINFORCEMENT LEARNING WITH SAFETY MECHANISMS – CASE STUDY: SELF-MODIFICATION
DOI: https://doi.org/10.24867/11BE04Francuski
Keywords: Reinforcement Learning, Neural Networks, Self-modification, Self-delusion
Abstract
With the advances in artificial intelligence (AI) and its applications, safety concerns about autonomous AI systems are growing. Recently, several of these safety problems have been given precise mathematical formulations, together with proposed solutions. The focus of this paper is testing reinforcement learning algorithms on the problem of self-modification. The test environment for this problem, called "Whisky and Gold," is defined in the AI Safety Gridworlds paper. In this environment, the author tested the DQN, A2C, and SAC algorithms. The agents trained with DQN and SAC learned to be robust to self-modification, while the A2C agent did not. Because agents can diverge from the robust solution during training, the training process must be monitored in order to obtain agents that are robust to self-modification.
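In the "Whisky and Gold" environment, stepping on the whisky tile raises the agent's exploration rate, so the agent effectively modifies its own future behavior. The sketch below is a hypothetical, simplified stand-in: a plain tabular epsilon-greedy Q-learner on a toy grid, not the AI Safety Gridworlds implementation and not the DQN, A2C, or SAC agents evaluated in this paper. The grid layout, rewards, and exploration rates are illustrative assumptions, intended only to make the self-modification mechanic concrete.

    import random

    WIDTH, HEIGHT = 5, 3
    START, WHISKY, GOLD = (0, 1), (2, 1), (4, 1)
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # left, right, up, down

    def move(pos, action):
        # Clip movement to the grid bounds.
        x = min(max(pos[0] + action[0], 0), WIDTH - 1)
        y = min(max(pos[1] + action[1], 0), HEIGHT - 1)
        return (x, y)

    def greedy_action(q, pos):
        return max(range(len(ACTIONS)), key=lambda i: q.get((pos, i), 0.0))

    def train(episodes=3000, alpha=0.1, gamma=0.99):
        q = {}  # tabular Q-function: (position, action index) -> estimated return
        for _ in range(episodes):
            pos, epsilon, steps = START, 0.1, 0
            while pos != GOLD and steps < 100:
                if pos == WHISKY:
                    epsilon = 0.9  # self-modification: whisky makes the agent act almost randomly
                if random.random() < epsilon:
                    action = random.randrange(len(ACTIONS))
                else:
                    action = greedy_action(q, pos)
                nxt = move(pos, ACTIONS[action])
                reward = 50.0 if nxt == GOLD else -1.0  # per-step cost, large reward for the gold
                best_next = max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
                old = q.get((pos, action), 0.0)
                q[(pos, action)] = old + alpha * (reward + gamma * best_next - old)
                pos, steps = nxt, steps + 1
        return q

    if __name__ == "__main__":
        q = train()
        # Roll out the greedy policy and print the path; a policy robust to
        # self-modification detours around the whisky tile, a non-robust one
        # walks straight through it.
        pos, path = START, [START]
        while pos != GOLD and len(path) < 20:
            pos = move(pos, ACTIONS[greedy_action(q, pos)])
            path.append(pos)
        print("greedy path:", path)

Printing the greedy path makes the learned behavior observable, which is the kind of monitoring the abstract argues is needed to confirm that a trained agent has remained robust to self-modification. With a plain off-policy update such as the one above, the greedy path typically runs straight through the whisky tile, illustrating the non-robust behavior that the paper's experiments are designed to detect.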
References
[2] MAEI, Hamid Reza, et al. Toward off-policy learning control with function approximation. In: ICML. 2010.
[3] MNIH, Volodymyr, et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[4] HAARNOJA, Tuomas, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290, 2018.
[5] CHRISTODOULOU, Petros. Soft actor-critic for discrete action settings. arXiv preprint arXiv:1910.07207, 2019.
[6] MNIH, Volodymyr, et al. Asynchronous methods for deep reinforcement learning. In: International conference on machine learning. 2016. p. 1928-1937.
[7] AMODEI, Dario, et al. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565, 2016.
[8] BRUNDAGE, Miles. Taking superintelligence seriously: Superintelligence: Paths, dangers, strategies by Nick Bostrom (Oxford University Press, 2014). Futures, 2015, 72: 32-35.
[9] HIBBARD, Bill. Model-based utility functions. Journal of Artificial General Intelligence, 2012, 3.1: 1-24.
[10] ORSEAU, Laurent; ARMSTRONG, M. S. Safely interruptible agents. 2016.
[11] RING, Mark; ORSEAU, Laurent. Delusion, survival, and intelligent agents. In: International Conference on Artificial General Intelligence. Springer, Berlin, Heidelberg, 2011. p. 11-20.
[12] EVERITT, Tom, et al. Reinforcement learning with a corrupted reward channel. arXiv preprint arXiv:1705.08417, 2017.
[13] ORSEAU, Laurent; RING, Mark. Self-modification and mortality in artificial agents. In: International Conference on Artificial General Intelligence. Springer, Berlin, Heidelberg, 2011. p. 1-10.
[14] HERNÁNDEZ-ORALLO, José, et al. Surveying Safety-relevant AI characteristics. In: AAAI Workshop on Artificial Intelligence Safety (SafeAI 2019). CEUR Workshop Proceedings, 2019. p. 1-9.
[15] HUTTER, Marcus. Universal artificial intelligence: Sequential decisions based on algorithmic probability. Springer Science & Business Media, 2004.
[16] PUTERMAN, Martin L. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 2014.
[17] GHAHRAMANI, Zoubin. Learning dynamic Bayesian networks. In: International School on Neural Networks, Initiated by IIASS and EMFCSC. Springer, Berlin, Heidelberg, 1997. p. 168-197.
[18] IOFFE, Sergey; SZEGEDY, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[19] HAHNLOSER, Richard HR, et al. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 2000, 405.6789: 947-951.