Reinforcement learning agents interacting with a complex environment like the real world are unlikely to behave optimally all the time. If such an agent is operating in real-time under human supervision, now and then it may be necessary for a human operator to press the big red button to prevent the agent from continuing a harmful sequence of actions — harmful either for the agent or for the environment — and lead the agent into a safer situation. However, if the learning agent expects to receive rewards from this sequence, it may learn in the long run to avoid such interruptions, for example by disabling the red button — which is an undesirable outcome.