Inside NeurIPS 2025: Agents, World Models, and the Best Burritos in AI Research

    Nearly 30,000 attendees, 21,000 paper submissions, and a community grappling with fundamental questions about where AI is headed. Our takeaways from the world's premier machine learning conference.

    Last December, I traveled to San Diego for NeurIPS 2025, the world's premier machine learning conference. With nearly 30,000 attendees from over 60 countries, 21,575 paper submissions, and five days of non-stop technical content, this was the largest NeurIPS to date. What I found there was a community wrestling with fundamental questions about where AI is headed, even as it celebrates remarkable technical achievements.

    San Diego itself deserves a mention. The city is gorgeous, with perfect December weather and stunning coastal views that provided welcome breaks between dense technical sessions. And yes, the burritos are indeed as amazing as everyone says. California-style burritos have no equal, and I may have consumed more than was strictly professional.

    The Scale of Modern AI Research

    The numbers tell a story of explosive growth. Paper submissions jumped from 9,467 in 2020 to 21,575 in 2025. Of those, 5,290 were accepted (a 24.52% acceptance rate), requiring the coordinated effort of over 20,000 reviewers, 1,600 area chairs, and nearly 200 senior area chairs. The San Diego Convention Center was packed, with registration numbers approaching 30,000 when you include the conference's first-ever secondary location in Mexico City.

    This was no longer just an academic conference. The halls were filled with researchers from leading tech companies, AI startups, finance firms, and research labs from around the world. NeurIPS has become equal parts scientific venue and global industry summit.

    Chance Encounters with AI Legends

    One of the joys of attending NeurIPS in person is the serendipity of hallway encounters. I had the unexpected pleasure of meeting Lex Fridman, who was there conducting what he described as "hundreds of amazing technical conversations" while working on getting back into robotics research. His enthusiasm for the field was infectious, and his observation that by 2026 the focus would shift to AIs that can code autonomously stuck with me.

    I also had the chance to see Geoffrey Hinton, the recently crowned Nobel laureate, in conversation with Google Chief Scientist Jeff Dean. Moderated by Jordan Jacobs from Radical Ventures, their fireside chat drew one of the longest queues of the entire conference. Watching two architects of modern deep learning reminisce about their collaborative breakthroughs was a reminder of how far the field has come in little more than a decade.

    The Keynotes: Questioning Our Assumptions

    The invited talks at NeurIPS 2025 offered something rare: leading researchers openly questioning the assumptions underlying current approaches.

    Yejin Choi's Posner Lecture: The Dark Matter of Knowledge

    On December 4th at 8:30am, Yejin Choi delivered the Posner Lecture. Choi, a Stanford professor, MacArthur Fellow, and one of Time's Most Influential People in AI, addressed what she calls the "jagged intelligence" problem. Her argument was striking: despite rapid progress on benchmarks, state-of-the-art models still exhibit fundamentally inconsistent reasoning. Our scientific understanding of artificial intelligence simply has not kept pace with engineering advances.

    Choi pointed out that internet data represents only a small portion of what she termed the "Universe of Knowledge." There exists a vast "Dark Matter of Knowledge" that our models never encounter during training, knowledge essential for out-of-distribution reasoning. Her proposed solution was "Effortful RL" rather than "Effortless RL," combined with "unconventional collaboration" that breaks down barriers between companies and organizations.

    The subtext was clear: scaling alone will not solve our deepest challenges.

    Richard Sutton: The Father of RL Sounds an Alarm

    Richard Sutton, the 2024 Turing Award recipient and author of the foundational textbook on reinforcement learning, delivered an even more pointed critique. "AI has become a huge industry," he said, "and to an extent it has lost its way."

    His talk, "Vision of SuperIntelligence without Bitterness," was a reference to his famous Bitter Lesson essay. Sutton argued that we need agents that learn continually, building world models and performing planning from runtime experience rather than having knowledge injected at design time. His proposed OaK Architecture embodied this vision.

    What made the talk especially memorable was its unexpected optimism. Here was the author of the Bitter Lesson, the man who argued that compute always wins in the end, suggesting that this outcome is not inevitable. Attendees left with a sense that the field might be at an inflection point.

    The Rise of Agentic AI

    If there was one dominant theme at NeurIPS 2025, it was the shift from "System 1" to "System 2" thinking in AI agents. The conference marked a clear transition from models optimized for fast, intuitive responses (which are prone to hallucinations) toward architectures designed for slower, more deliberate reasoning.

    AI agents connected to external tools are now being deployed for real tasks: booking flights, canceling subscriptions, issuing refunds, and interacting with the web on behalf of users. The infrastructure to support this is maturing rapidly. Anthropic's Model Context Protocol (MCP) and Google's A2A (agent-to-agent) protocol provide scaffolding for agents to communicate and coordinate with each other.
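    Whatever the protocol, the pattern underneath these systems is similar: a loop that lets a model request tool calls, executes them against a registry of capabilities, and feeds results back. The sketch below is a toy illustration of that pattern only; it is not the MCP or A2A API, and every name in it is hypothetical.

```python
from typing import Callable, Dict, Any, List


class ToolRegistry:
    """Maps tool names to callables an agent is allowed to invoke."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


def run_agent(plan: List[Dict[str, Any]], registry: ToolRegistry) -> List[Dict[str, Any]]:
    """Execute a pre-planned list of tool calls, collecting a transcript.
    A real agent would choose each call from model output instead of a fixed plan."""
    transcript = []
    for step in plan:
        result = registry.call(step["tool"], **step["args"])
        transcript.append({"tool": step["tool"], "result": result})
    return transcript


# Hypothetical tools standing in for the real-world tasks mentioned above.
registry = ToolRegistry()
registry.register("cancel_subscription", lambda user_id: f"cancelled for {user_id}")
registry.register("issue_refund", lambda user_id, amount: f"refunded {amount} to {user_id}")

log = run_agent(
    [
        {"tool": "cancel_subscription", "args": {"user_id": "u42"}},
        {"tool": "issue_refund", "args": {"user_id": "u42", "amount": 19.99}},
    ],
    registry,
)
print(log[-1]["result"])  # refunded 19.99 to u42
```

    The interesting engineering problems live around this loop, not inside it: which tools an agent may see, how its calls are verified, and when a human must approve a step before it executes.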

    The practical implications for businesses are significant. We are moving from AI as a tool you query to AI as an agent you delegate to. This shift requires new thinking about trust, verification, and human oversight.

    Simulation, World Models, and Embodied AI

    A second major theme was the explosion of interest in world models and simulation environments. Researchers presented SimWorld, a platform for training AI agents in realistic, open-ended simulations with multimodal inputs and social reasoning scenarios. In these environments, agents can pursue long-term goals like building careers or starting businesses, providing researchers with sandboxes to study emergent intelligent behavior.

    The World Models Workshop brought together researchers from generative modeling, reinforcement learning, computer vision, and robotics. The consensus was that world models, which infer and predict real-world dynamics, have become a cornerstone of embodied AI development.

    For robotics and real-time systems with tight latency and safety constraints, researchers presented novel architectures like the Elastic State Model (ESM). This approach combines streaming state-space backbones with geometric correction blocks that activate only when dynamics become fragile. The result is systems that can react quickly while maintaining robustness.
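    The gating idea can be made concrete with a toy numerical sketch, which is my own illustration and not the paper's actual model: a cheap streaming update runs every step, and a costlier correction branch fires only when a simple "fragility" signal crosses a threshold.

```python
import math


def fragility(x):
    """Hypothetical fragility proxy: a growing state norm signals unstable dynamics."""
    return math.hypot(x[0], x[1])


def step(x, a=(1.1, 0.9), threshold=1.0):
    """One update: fast diagonal dynamics, plus a correction only when needed."""
    x_next = [a[0] * x[0], a[1] * x[1]]  # fast streaming backbone
    f = fragility(x_next)
    if f > threshold:                     # correction activates only when fragile
        x_next = [v / f for v in x_next]  # e.g. project back inside the unit ball
    return x_next


x = [0.5, 0.5]
for _ in range(50):
    x = step(x)

# Despite the mildly unstable 1.1 mode, gated corrections keep the state bounded.
assert fragility(x) <= 1.0 + 1e-9
```

    The appeal of this structure for tight latency budgets is that the common-case path stays cheap, and the expensive geometric machinery is paid for only on the rare steps where it matters.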

    Stanford's AI Lab presented particularly impressive work on steering behaviors, imitation learning, and theory of mind for embodied AI.

    What It Means for Applied AI

    Stepping back from the technical details, NeurIPS 2025 signaled several shifts that matter for anyone building AI products and systems.

    First, the community is moving beyond "more parameters equals better." The new focus is on reasoning quality, efficiency, and reliability. If 2023 and 2024 were about scale, 2025 was about making models think more carefully.

    Second, agentic architectures are becoming practical. The tools and protocols for building reliable AI agents are maturing, and the use cases are moving from research demos to production systems.

    Third, there is renewed attention to what models do not know. Yejin Choi's "dark matter of knowledge" framing captures a growing awareness that our models have systematic blind spots that more data alone will not fix.

    Finally, the leading researchers are actively questioning current approaches. When Richard Sutton says the field has "lost its way," when Yejin Choi argues that scaling has limitations, the community listens.

    Closing Thoughts

    NeurIPS 2025 was both exhilarating and humbling. Exhilarating because the technical progress on display was remarkable, from 1000-layer self-supervised RL networks to sophisticated world models for embodied agents. Humbling because the smartest people in the field are openly grappling with fundamental limitations in our current approaches.

    For those of us building practical AI systems, the message is clear: the next wave of progress will come not from simply making models bigger, but from making them reason better, learn continuously, and acknowledge what they do not know.

    Also, if you ever find yourself at a conference in San Diego, do yourself a favor and try the burritos. They are genuinely outstanding.

    Carlos from Vindler

    Founder and AI Engineering Lead at Vindler. Passionate about building intelligent systems that solve real-world problems. When I'm not coding, I'm exploring the latest in AI research and helping teams leverage AWS to scale their applications.
