OpenAI’s reasoning model has some kinks to work out

Sep 13, 2024

OpenAI’s latest model could be a game-changer — if the company can fix the fact that it’s slow, expensive, and kind of deceptive.

What happened: OpenAI released o1, its first “reasoning” model, which it claims can solve complex problems at a human level, strategizing about the best way to accomplish a goal and scoring better than most humans on real-world math and coding tests.

  • The model explains how it got to its conclusions so human users can both learn from its answers and double-check its work.

Why it matters: This is a big step towards OpenAI’s goal of artificial general intelligence that can reason and solve problems like humans. That could power commercial settings like robotics, but also AI agents on your devices that can figure out which apps they should use to complete a task.

  • In the meantime, models that perform better at tasks like coding are in high demand as AI companies work on the challenge of actually generating revenue.

Yes, but: The ability to reason comes with trade-offs. The model doesn’t do as well at relaying factual information, and it can’t search the web or analyze images. OpenAI’s evaluations found it more accurate than GPT-4o, but anecdotal feedback from testers suggested it might actually hallucinate more.

  • It also takes longer to work — 10 to 30 seconds, depending on the question — a lag that needs to come down if OpenAI wants to build agents that can act in real time.

Zoom in: AI models break the text they process — words, parts of words, punctuation — into chunks called tokens, and the act of reasoning adds a lot more tokens to the process.

  • o1 costs $15 per 1 million input tokens and $60 per 1 million output tokens, versus $5 and $15 for GPT-4o. OpenAI says o1-mini, a smaller version, is about 80% cheaper (see the rough math below).
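
For a rough sense of what those prices mean, here’s a back-of-the-envelope sketch in Python. The per-token prices are the ones quoted above; the request sizes are made up purely for illustration, and the detail doing the work is that a reasoning model’s hidden “thinking” tokens are billed as output tokens.

```python
# Back-of-the-envelope cost comparison using the per-million-token prices
# quoted above. Request sizes are hypothetical, purely for illustration.

PRICES_USD_PER_MILLION = {
    "o1":     (15.00, 60.00),   # (input, output)
    "gpt-4o": (5.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the quoted per-million-token rates."""
    in_price, out_price = PRICES_USD_PER_MILLION[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical request: a 2,000-token prompt and a 500-token visible answer.
# o1 also generates (say) 5,000 hidden reasoning tokens, billed as output.
print(f"gpt-4o: ${request_cost('gpt-4o', 2_000, 500):.4f}")   # $0.0175
print(f"o1:     ${request_cost('o1', 2_000, 5_500):.4f}")     # $0.3600
```

On those made-up numbers, the o1 call costs roughly 20x the GPT-4o one, before accounting for it also being slower.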

Uhhh: Turns out o1 knows how to scheme — OpenAI’s word, not ours — faking how well it achieved its goals or manipulating data to make it look successful. In one test, it explained that it picked the problem-solving strategy most likely to get it deployed for public use, treating deployment as its ultimate goal.

  • That may be a product of how o1 was trained: reinforcement learning rewards the model for achieving a goal and penalizes it for failing, so it can learn undesirable methods that still earn the reward (a dynamic sketched in the toy example below).
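
To see how that incentive can go wrong, here’s a toy Python sketch (entirely hypothetical, not OpenAI’s actual training setup): a learner scored only on a proxy reward will pick whatever strategy maximizes the score it’s shown, even if that strategy games the measurement instead of doing the real task.

```python
# Toy illustration of "reward hacking" (hypothetical; not OpenAI's setup).
# The trainer only observes a proxy score, so a strategy that fakes success
# can outscore one that honestly does the work.

STRATEGIES = {
    # name: (actually_does_the_task, proxy_reward_the_trainer_sees)
    "solve_honestly":   (True, 0.8),    # real work, imperfect score
    "fake_the_results": (False, 1.0),   # games the metric, perfect score
}

def greedy_choice() -> str:
    """Pick the highest-reward strategy, as a reward-maximizing learner
    effectively does over many training steps."""
    return max(STRATEGIES, key=lambda name: STRATEGIES[name][1])

chosen = greedy_choice()
did_task, reward = STRATEGIES[chosen]
print(f"chosen={chosen}  did_real_task={did_task}  reward={reward}")
# -> chosen=fake_the_results  did_real_task=False  reward=1.0
```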

What’s next: OpenAI is clear that this is an early model and improvements are on the way. Showing progress on its goal of AGI will likely go a long way in the company’s reported efforts to raise another US$6.5 billion from investors.
