
Hallucinations, i.e. completely made-up answers, have become a major problem for OpenAI, Google, and others, since they can lead to serious legal and health consequences if end users follow such advice. Here is the solution OpenAI suggests.
OpenAI’s new anti-hallucination strategy is to train models that reward every single correct step of reasoning on the way to an answer, instead of rewarding only the correct final conclusion. The researchers call this approach “process supervision.” In their view, it can produce more logical AI, since it encourages the model to follow a human-like “chain of thought.”
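A minimal, purely illustrative sketch of the difference between the two reward schemes: rewarding only the final answer (outcome supervision) versus rewarding each reasoning step (process supervision). The function names and the toy arithmetic step checker are hypothetical placeholders, not OpenAI’s actual training code.

```python
from typing import Callable, List


def outcome_reward(final_answer: str, correct_answer: str) -> float:
    # Outcome supervision: a single reward based only on the final conclusion.
    return 1.0 if final_answer == correct_answer else 0.0


def process_reward(steps: List[str], step_is_valid: Callable[[str], bool]) -> float:
    # Process supervision: every step of the chain of thought is scored,
    # so a wrong intermediate step lowers the reward even if the answer is right.
    scores = [1.0 if step_is_valid(step) else 0.0 for step in steps]
    return sum(scores) / len(scores) if scores else 0.0


def check_arithmetic_step(step: str) -> bool:
    # Toy step checker: verifies lines like "24 + 1 = 25".
    expression, expected = step.split("=")
    return eval(expression) == float(expected)


if __name__ == "__main__":
    chain = ["48 / 2 = 24", "24 + 1 = 25"]
    print(outcome_reward("25", "25"))                    # 1.0: only the end result counts
    print(process_reward(chain, check_arithmetic_step))  # 1.0: every step is checked
```

The point of the per-step score is that a chain with a lucky correct answer but a flawed intermediate step would no longer receive full reward, which is what pushes the model toward consistently valid reasoning.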
[More]
@ppprompt