Avoiding Tampering Incentives in Deep RL via Decoupled Approval https://arxiv.org/abs/2011.08827