OpenAI:

In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still produce logical mistakes, often called hallucinations.

[…]

We can train reward models to detect hallucinations using either outcome supervision, which provides feedback based on a final result, or process supervision, which provides feedback for each individual step in a chain-of-thought… We find that process supervision leads to significantly better performance, even when judged by outcomes.

This technique was evaluated using questions from a large mathematics dataset. This is an important caveat, since math is a domain where “showing your work” is standard practice. Presumably GPT-4’s training corpus includes many instances of people walking through math problems step-by-step. The relative performance of process supervision on questions from other domains is still unknown.
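To make the distinction concrete, here is a minimal, hypothetical Python sketch. The scoring functions are stand-ins (a simple string comparison and a keyword check), not OpenAI's trained reward models: an outcome-supervised reward looks only at the final answer, while a process-supervised reward assigns a score to each step of the chain-of-thought.

```python
from typing import List


def outcome_reward(final_answer: str, correct_answer: str) -> float:
    """Outcome supervision: a single score based only on the final result."""
    return 1.0 if final_answer.strip() == correct_answer.strip() else 0.0


def process_rewards(steps: List[str]) -> List[float]:
    """Process supervision: one score per reasoning step.

    A real process reward model would be a trained classifier over each step;
    this stub just flags steps containing an obviously bad marker token.
    """
    return [0.0 if "ERROR" in step else 1.0 for step in steps]


if __name__ == "__main__":
    chain_of_thought = [
        "Let x be the unknown number.",
        "2x + 3 = 11, so 2x = 8.",
        "ERROR: dividing incorrectly gives x = 5.",  # a flawed intermediate step
    ]
    final_answer = "x = 5"

    # Outcome supervision only sees that the final answer is wrong.
    print("outcome score:", outcome_reward(final_answer, "x = 4"))
    # Process supervision localizes the mistake to the specific step.
    print("per-step scores:", process_rewards(chain_of_thought))
```

The point of the contrast is that per-step feedback tells the model *where* the reasoning went wrong, whereas an outcome signal only says *that* it went wrong.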