Safely Interruptible Agents

2858 shaares

Filters

Links per page

20 50 100

1 result tagged hal-9000

Safely Interruptible Agents

Reinforcement learning agents interacting with a complex environment like the real world are un- likely to behave optimally all the time. If such an agent is operating in real-time under human supervision, now and then it may be necessary for a human operator to press the big red button to prevent the agent from continuing a harmful sequence of actions — harmful either for the agent or for the environment — and lead the agent into a safer situation. However, if the learning agent expects to receive rewards from this sequence, it may learn in the long run to avoid such interrup- tions, for example by disabling the red button — which is an undesirable outcome.

ai · reinforcement-learning · interruptibility · hal-9000

March 16, 2019 at 5:34:12 AM GMT+1 · permalink

·

https://intelligence.org/files/Interruptibility.pdf

·