OpenAI’s Latest AI System Sparks Debate on the Future of Artificial General Intelligence
In September, OpenAI unveiled its latest artificial intelligence (AI) system, o1, with a bold claim: a “new level of AI capability.” The San Francisco-based company, known for its groundbreaking chatbot ChatGPT, introduced o1 as part of its suite of large language models (LLMs). OpenAI asserts that o1 operates in a way that is closer to human thinking than any of its predecessors. This announcement has reignited a long-standing debate: how close are we to achieving artificial general intelligence (AGI)?
AGI refers to an AI system capable of performing the full range of cognitive tasks that humans can handle, such as abstract reasoning, planning, and learning from the world around it. The potential of AGI is immense—it could help solve global challenges like climate change, pandemics, and diseases such as cancer and Alzheimer’s. However, this power comes with significant risks. “Bad things could happen because of either the misuse of AI or because we lose control of it,” warns Yoshua Bengio, a deep-learning researcher at the University of Montreal, Canada.
While the rapid advancements in LLMs have fueled speculation that AGI might be within reach, many researchers argue that current AI systems, including o1, are not yet sufficient to achieve AGI. “There are still some pieces missing,” Bengio notes. Nevertheless, the conversation around AGI has become more urgent and mainstream. “Most of my life, I thought people talking about AGI are crackpots,” says Subbarao Kambhampati, a computer scientist at Arizona State University. “Now, of course, everybody is talking about it. You can’t say everybody’s a crackpot.”
Why the AGI Debate Has Shifted
The term artificial general intelligence gained traction around 2007, following the publication of a book of that name edited by AI researchers Ben Goertzel and Cassio Pennachin. While its exact definition remains elusive, AGI broadly refers to an AI system with human-like reasoning and generalization abilities. For most of the field's history, it was clear that AI systems fell far short of AGI. Google DeepMind's AlphaGo, for instance, plays the board game Go with superhuman skill, but only within the narrow scope of that game.
However, the emergence of LLMs has changed the landscape. These models exhibit a breadth of capabilities that some researchers believe could signal the arrival of AGI, or at least bring it closer. This is particularly striking given that researchers only partially understand how LLMs achieve their capabilities. LLMs such as OpenAI's o1, Anthropic's Claude, and Google's Gemini are neural networks loosely inspired by the human brain. They are trained using next-token prediction: the model repeatedly predicts the next token (a word or fragment of a word) in a sequence, given the tokens that precede it. Training draws on billions of fragments of text, scientific data, and programming code.
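To make that training objective concrete, here is a minimal sketch of next-token prediction in PyTorch. The tiny vocabulary, toy model, and random token IDs are illustrative assumptions; production LLMs use the same objective, but with transformer networks and vastly more data.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, context_len = 100, 32, 8

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):
        # One score per vocabulary entry, for every position in the sequence.
        return self.head(self.embed(tokens))

model = TinyLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# A batch of token IDs; in a real LLM these would come from tokenized text.
batch = torch.randint(0, vocab_size, (4, context_len + 1))
inputs, targets = batch[:, :-1], batch[:, 1:]  # targets are inputs shifted by one

optimizer.zero_grad()
logits = model(inputs)  # shape: (batch, context, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```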
The introduction of transformer architecture has significantly enhanced LLMs’ abilities. Transformers allow models to identify relationships between words or tokens, even when they are far apart in a sentence. This enables LLMs to parse language in ways that mimic human understanding, such as distinguishing between the two meanings of the word “bank” in the sentence: “When the river’s bank flooded, the water damaged the bank’s ATM, making it impossible to withdraw money.”
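The mechanism behind this is attention. The sketch below shows the core scaled dot-product computation, under the simplifying assumption of a single head with no learned projections or positional encodings: because every token's query is scored against every other token's key, the two "bank" tokens can be related no matter how far apart they sit in the sentence.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Every token's query is scored against every other token's key, so
    # relationships are captured regardless of distance in the sequence.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)  # each row of weights sums to 1
    return weights @ v

seq_len, dim = 24, 16  # e.g. the two "bank" tokens could sit many positions apart
q = k = v = torch.randn(seq_len, dim)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([24, 16])
```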
These advancements have enabled LLMs to perform tasks like generating computer programs, summarizing academic articles, and solving math problems. As LLMs grow larger, new capabilities appear, raising the possibility that AGI could simply emerge if the models become big enough. One example involves chain-of-thought (CoT) prompting, in which the model is prompted to break a problem into smaller steps before answering. CoT prompting has allowed LLMs to solve problems that previously stumped them, but the technique works best with larger models.
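In practice, CoT prompting can be as simple as changing how a question is phrased. The sketch below is a hypothetical illustration, not any lab's actual prompt; it contrasts a direct question with one that asks the model to reason step by step.

```python
def direct_prompt(question: str) -> str:
    return question

def chain_of_thought_prompt(question: str) -> str:
    # The extra instruction nudges the model to produce intermediate steps.
    return (
        f"{question}\n"
        "Let's think step by step: break the problem into smaller parts, "
        "solve each part, then state the final answer."
    )

question = "A train travels 60 km in 45 minutes. What is its speed in km/h?"
print(direct_prompt(question))
print(chain_of_thought_prompt(question))
```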
The Limits of LLMs
Despite their impressive capabilities, LLMs like o1 have limitations. OpenAI has integrated CoT prompting into o1, which has significantly improved its performance. For example, o1-preview, a preview version of o1, correctly solved 83% of problems in a qualifying exam for the International Mathematical Olympiad, compared with just 13% for GPT-4o, OpenAI's previous most powerful LLM. However, experts such as Kambhampati and the AI researcher Francois Chollet argue that o1 is still far from achieving AGI.
One limitation is planning. Kambhampati’s team found that while o1 performs well on tasks requiring up to 16 planning steps, its performance declines rapidly when the number of steps increases to 20–40. Chollet observed similar limitations when testing o1-preview with abstract reasoning and generalization tasks. These tests involved visual puzzles that require deducing abstract rules and applying them to new scenarios—something humans excel at but LLMs struggle with.
“LLMs cannot truly adapt to novelty because they have no ability to basically take their knowledge and then do a fairly sophisticated recombination of that knowledge on the fly to adapt to new context,” Chollet explains. This inability to generalize and adapt is a significant barrier to achieving AGI.
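To give a flavor of such tasks, here is a toy puzzle in the spirit of those tests (an illustrative example, not an actual test item): from a single example pair, a solver must induce an abstract rule, here "mirror the grid left to right", and apply it to a new grid.

```python
# From one example pair, induce the rule and apply it to a fresh grid.
example_input  = [[1, 0, 0],
                  [1, 1, 0]]
example_output = [[0, 0, 1],
                  [0, 1, 1]]

def induced_rule(grid):
    # The abstract rule a human might deduce: mirror each row left to right.
    return [list(reversed(row)) for row in grid]

assert induced_rule(example_input) == example_output  # the rule fits the example

new_input = [[2, 2, 0],
             [0, 2, 0]]
print(induced_rule(new_input))  # [[0, 2, 2], [0, 2, 0]]
```

Humans induce such rules from a single example with ease; LLMs, trained to reproduce patterns seen before, tend to falter when the rule itself is new.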
Can LLMs Deliver AGI?
There are arguments both for and against the idea that LLMs could eventually lead to AGI. On the positive side, the transformer architecture underlying LLMs can process various types of data, including text, images, and audio. Andrew Wilson, a machine learning researcher at New York University, suggests that transformers are well suited to learning patterns in data with low Kolmogorov complexity, a measure of the length of the shortest program that can reproduce the data; low complexity means the data are highly structured and compressible. This capability, combined with the increasing size of LLMs, could be a step toward universal learning, a key component of AGI.
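Kolmogorov complexity itself is uncomputable, but compressed size offers a rough, practical proxy for the intuition. The sketch below, an illustration using Python's standard zlib library, shows that structured data admits a far shorter description than random data.

```python
import os
import zlib

# Kolmogorov complexity is uncomputable, but compressed size is a rough
# practical proxy: highly structured data has a short description.
structured = b"abab" * 250     # 1,000 bytes built from a simple repeating pattern
random_ish = os.urandom(1000)  # 1,000 bytes with no exploitable structure

print(len(zlib.compress(structured)))  # tiny: the pattern is easy to describe
print(len(zlib.compress(random_ish)))  # near 1,000: no shorter description exists
```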
However, there are also signs that LLMs have inherent limitations. For one, the data used to train these models is finite. Researchers at Epoch AI estimate that the supply of publicly available textual data could run out between 2026 and 2032. Additionally, the performance gains achieved by increasing the size of LLMs are diminishing, raising questions about whether further scaling will yield significant improvements.
Raia Hadsell, vice-president of research at Google DeepMind, argues that the current focus on predicting the next token is too narrow to achieve AGI. She suggests that building models capable of generating solutions in larger chunks could bring us closer to AGI. While some existing systems, like OpenAI’s DALL-E, demonstrate this approach, they lack the broad capabilities of LLMs.
Building a World Model
Neuroscientists believe that the key to AGI lies in creating AI systems capable of building a “world model.” This involves representing the environment, imagining different scenarios, and predicting outcomes. Such a model would enable AI to plan, reason, and generalize skills across different domains. While current LLMs have made significant strides, they still fall short of this level of sophistication.
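The sketch below illustrates the core idea in miniature: an agent with an internal transition model can "imagine" the outcomes of candidate action sequences before committing to one. The one-dimensional dynamics and hand-written reward here are illustrative assumptions; real proposals learn the world model from experience.

```python
def transition(state, action):
    # Imagined dynamics: move along a number line.
    return state + action

def reward(state, goal=10):
    return -abs(goal - state)  # closer to the goal is better

def plan(state, actions=(-1, 0, 1), horizon=3):
    # Imagine every action sequence `horizon` steps deep, score the final
    # imagined state, and return the first action of the best sequence.
    def rollout(s, depth):
        if depth == 0:
            return reward(s)
        return max(rollout(transition(s, a), depth - 1) for a in actions)

    return max(actions, key=lambda a: rollout(transition(state, a), horizon - 1))

print(plan(state=0))  # 1: stepping toward the goal pays off in imagination first
```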
As the debate around AGI continues, one thing is clear: the journey toward human-level intelligence in machines is far from over. While systems like o1 represent remarkable progress, achieving true AGI will require breakthroughs that go beyond the capabilities of today’s LLMs.
Originally written by: Nature News Team