Сиолошная

So, uh, Anthropic, the company Google put $400M into to improve its chatbot (and which was founded by OpenAI alumni), just dropped a fresh banger:

The Capacity for Moral Self-Correction in Large Language Models

Go read it right now, we'll discuss it tomorrow 👀

"... language models trained with RLHF have the capability to "morally self-correct" [...] We find strong evidence in support of this hypothesis across three different experiments..."