So, uh, Anthropic, the company Google put $400M into to improve its chatbot (and which was founded by ex-OpenAI people), just dropped a fresh new banger
The Capacity for Moral Self-Correction in Large Language Models
Go read it now, we'll discuss tomorrow 👀
"... language models trained with RLHF have the capability to "morally self-correct" [...] We find strong evidence in support of this hypothesis across three different experiments..."