News
On some of those tests GPT-3 scored better than a group of undergrads. “Analogy is central to human reasoning,” says Webb.
LLMs’ “simulated reasoning” abilities are a “brittle mirage,” researchers find
Chain-of-thought AI “degrades significantly” when asked to generalize beyond training.