The Next Breakthrough In AI Agents Is Here
By Y Combinator
Summary
## Key takeaways

* **Focus on Orchestration for Cost Efficiency and Transparency:** AI agents leveraging multi-agent orchestration, like Manus, can significantly reduce per-task costs and offer greater transparency and user control by exposing internal processes, which centralized platforms often lack (06:20, 06:40).
* **"Wrappers" are a Valid and Valuable Approach:** Dismissing AI applications as mere "wrappers" that integrate existing models overlooks the reality that many successful AI products, including Manus, derive their value from intuitive UI, proprietary evaluations, careful fine-tuning, and sophisticated multi-agent architectures, rather than foundational model development (04:45, 05:40).
* **Sustainable Differentiation is Paramount for Founders:** For founders building AI products, the critical challenge is not whether "wrappers" are viable, but how to establish sustainable differentiation through proprietary evaluations, deep workflow integration to increase switching costs, or exclusive access to platforms and datasets (07:50, 08:10).
* **Human-like Performance in Complex Tasks is Achievable:** Advanced AI agents are demonstrating near-human performance on challenging benchmarks like GAIA, indicating their capability to handle complex reasoning, multimodal interactions, web browsing, and tool proficiency, suggesting a rapidly closing gap with human-level task execution (04:08, 04:30).

## Smart Chapters

* **Manus's Emergence as a General AI Agent (00:00)**: The video introduces Manus, a new multi-agent AI platform gaining significant attention for its promise as a general-purpose AI agent.
* **Understanding Manus's Multi-Agent Architecture (01:32)**: This section explains how Manus functions as an executive overseeing a team of specialized sub-agents that dynamically break down and execute complex tasks.
* **Manus's Unique Technical Innovations (02:05)**: The segment details Manus's advanced features, including its dynamic task decomposition, chain-of-thought injection for stability, and integration with open-source tools for cross-platform execution.
* **Real-World Capabilities and Benchmark Performance (03:40)**: This part showcases Manus's ability to perform a wide range of practical tasks and highlights its 86.5% score on the GAIA benchmark, close to the human average of about 92%.
* **Addressing the "Wrapper" Critique (04:45)**: The video discusses the common criticism of Manus as a mere "wrapper" and argues that most successful AI applications are effectively wrappers, emphasizing the value of their unique orchestration and integration.
* **Advantages of Manus's Design (06:20)**: This section outlines the benefits of Manus, such as lower per-task costs, enhanced transparency, and greater user control compared to more centralized AI platforms.
* **Limitations and Vulnerabilities of Manus (07:00)**: The segment addresses the drawbacks, including challenges with scaling task complexity and the vulnerability of its current advantages to replication by competitors.
* **Strategic Insights for AI Founders (07:50)**: The video concludes by advising founders on achieving sustainable differentiation in the AI space, focusing on proprietary evaluations, deep user integration, and exclusive data access.

## Key quotes

* "Behind all the excitement around Manus is something genuinely innovative: a multi-agent AI system that can seemingly complete all sorts of tasks from travel planning and financial analysis to searching over dozens of files or doing industry research." (01:32)
* "Manus smashed the state of the art on GAIA, scoring 86.5%, just a few points shy of the average human." (04:30)
* "Most successful AI products today could also qualify as wrappers by this logic." (05:00)
* "One of the coolest things Manus figured out was actually exposing the file system so you could see exactly what the agents were doing. ChatGPT requires you to reprompt and it's opaque what's happening when it's thinking. Manus is a glimpse into the future of ChatGPT desktop operating directly on your computer." (06:40)
* "Ultimately, the critical challenge isn't deciding whether wrappers are viable, but identifying genuinely sustainable differentiation for your product." (07:50)

## Stories and anecdotes

* The video highlights the immediate global attention and hype Manus received upon its launch, with some calling it "China's next DeepSeek moment" and users describing it as the "most impressive AI tool they've ever tried," emphasizing the strong initial reaction to its general-purpose AI agent capabilities. (00:30)
* It contrasts Manus's transparency with other AI models by noting that Manus exposes its file system, allowing users to see exactly what the agents are doing, whereas platforms like ChatGPT are opaque during their thinking process, offering a "glimpse into the future of ChatGPT desktop." (06:40)

## Mentioned Resources

* OpenAI: Developer of AI models and research platforms (00:00, 04:10, 06:20)
* Google: Developer of AI tools and research platforms (00:00)
* xAI: AI company mentioned as a competitor (00:00)
* DeepSeek: AI platform mentioned as a competitor and a benchmark for Chinese AI innovation (00:00, 00:40)
* Anthropic's Claude 3.7 Sonnet: The foundational large language model powering Manus (03:15)
* YC company Browser Use: An open-source tool integrated with Manus for advanced website interaction (03:25)
* Startup E2B: Provides a secure cloud sandbox environment integrated with Manus (03:30)
* GAIA: A benchmark designed to challenge AI agents on reasoning, multimodal handling, web browsing, and tool proficiency (04:08)
* Cursor: An example of an AI product that integrates existing LLMs and APIs (05:05)
* Windsurf: An example of an AI product that integrates existing LLMs and APIs (05:05)
* Harvey: A domain-specific AI agent that combines foundational models with legal-specific tool integrations (05:25)
* Yichao "Peak" Ji: Co-founder of Manus (05:50)
Topics Covered
- How does Manus achieve general AI agent capabilities?
- Why are "wrapper" AI products strategically superior?
- Manus reveals AI's future: transparent desktop agents.
- What builds sustainable differentiation for AI products?
- Success in AI: Stitching models into products users love.
Full Transcript
Usable AI agents are finally here. From
deep research platforms out of OpenAI
and Google to similar tools from xAI and
DeepSeek. Joining the competition now is
Manus, a brand-new agentic AI platform
that has taken the world by storm. And
today we're launching an early preview
of Manus, the first general AI agent.
When Manus officially launched, the hype
around it immediately took off. A
Chinese startup unveiling a new AI agent
that some are calling China's next
DeepSeek moment. With people calling it
the most impressive AI tool they've ever
tried and the most sophisticated
computer-using AI. Unlike some of its
predecessors, Manus wasn't just another
specialized chatbot. It promised to be a
true general-purpose AI agent. With
invitations rare and access limited, the
question remains, has Manus truly
revolutionized the AI agent landscape?
Let's find out.
[Music]
Behind all the excitement around
Manus is something genuinely innovative:
a multi-agent AI system that can
seemingly complete all sorts of tasks
from travel planning and financial
analysis to searching over dozens of
files or doing industry research. So
how does it work? Rather than relying on
one big neural network, Manus works more
like an executive overseeing a team of
sub-agents, coordinating and guiding
their every move across a shared action
space. It takes in your prompt as input
and gets to work figuring out what it
needs to do. Instead of tackling your
task in one go, a planner agent first
comes up with a master plan to follow,
breaking things down into manageable
subtasks. This way, Manus knows
precisely what needs to be done before
executing and can hand off these tasks
to other sub-agents. These are like
Manus's own in-house experts. They share
the same context, but each has its own
delineated domain, from knowledge or
memory to execution. Manus can call upon
an extensive suite of 29 different
integrated tools. Whether they're
automating web navigation, securely
running code, or pulling important
information from files, Manus's
sub-agents intelligently decide which tools
to use. Finally, when each subtask is
complete, the executor agent combines
the outputs together into a final
synthesized output for the user. Under
the hood, Manus is powered by a pretty
sophisticated dynamic task decomposition
algorithm.
This is what enables it to autonomously
break down complex instructions into
clear execution paths. To ensure
stability even after dozens of rounds of
reasoning and tool use, the Manus team
developed an original technique called
chain-of-thought injection, enabling
agents to actively reflect and update
plans. At its core, Manus makes use of
Anthropic's Claude 3.7 Sonnet. Manus
also features robust cross-platform
execution capabilities thanks to its
seamless integration with open-source
tools like YC company Browser Use for
advanced website interaction and startup
E2B's secure cloud sandbox environment.
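
The planner, sub-agent, and executor flow described above can be pictured with a minimal Python sketch. This is not Manus's code: every name here (`TOOLS`, `Subtask`, `plan`, `pick_tool`, `run_subtask`, `synthesize`, `run_agent`) is hypothetical, the three stub tools stand in for the 29 real integrations, and hard-coded rules stand in for the model calls that would do the planning and tool selection. It only illustrates the shape of the loop: plan first, route each subtask to a tool, share context, then combine the outputs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical tool registry; real tools would browse, run code, or read files.
TOOLS: dict[str, Callable[[str], str]] = {
    "browser": lambda task: f"[web results for: {task}]",
    "sandbox": lambda task: f"[code output for: {task}]",
    "files": lambda task: f"[file excerpts for: {task}]",
}


@dataclass
class Subtask:
    description: str
    domain: str  # which sub-agent domain handles it: knowledge, memory, or execution


def plan(prompt: str) -> list[Subtask]:
    """Planner agent: break the prompt into subtasks (stub for a model call)."""
    return [
        Subtask(f"research: {prompt}", "knowledge"),
        Subtask(f"analyze: {prompt}", "execution"),
    ]


def pick_tool(subtask: Subtask) -> str:
    """Sub-agent tool choice: a fixed mapping stands in for model-driven selection."""
    mapping = {"knowledge": "browser", "memory": "files", "execution": "sandbox"}
    return mapping[subtask.domain]


def run_subtask(subtask: Subtask, context: list[str]) -> str:
    """Run one subtask with its chosen tool and record the result in shared context."""
    result = TOOLS[pick_tool(subtask)](subtask.description)
    context.append(result)  # later sub-agents can read earlier results
    return result


def synthesize(outputs: list[str]) -> str:
    """Executor agent: combine subtask outputs into one final answer."""
    return "\n".join(outputs)


def run_agent(prompt: str) -> str:
    context: list[str] = []  # context shared by all sub-agents
    outputs = [run_subtask(subtask, context) for subtask in plan(prompt)]
    return synthesize(outputs)


if __name__ == "__main__":
    print(run_agent("compare insurance policies"))
```

Keeping the plan, the tool choice, and the synthesis as separate functions mirrors the division of labor the video describes, and it is also what would let a user inspect or swap out an individual piece.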
So what can Manus actually accomplish?
Impressively, it can take on a wide
range of real world tasks. It excels in
scenarios like creating travel
itineraries, detailed financial
analyses, and educational content. It
can also assist with valuable tasks
like structured database compilation,
insurance policy comparisons, supplier
sourcing, and even
high-quality presentations. To truly
measure Manus's capabilities, we can look
at GAIA, a benchmark designed to
challenge AI agents on reasoning,
multimodal handling, web browsing, and
tool proficiency. Humans typically score
about 92%, whereas OpenAI's Deep
Research, in comparison, scored about
74% at its best. Manus smashed the
state of the art on GAIA, scoring
86.5%, just a few points shy of the
average human. Still, despite impressive
benchmark performance, Manus has
reignited a broader conversation about
the nature of AI startups at the
application layer: wrappers. Some have
dismissed Manus as merely a wrapper since
it stitches together existing
foundational models and various tool
calls. But this dismissal overlooks an
important reality. Most successful AI
products today could also qualify as
wrappers by this logic. Cursor and
Windsurf, for example, integrate
existing LLMs alongside external APIs
and developer-focused tooling such as
real-time code analysis and debugging
utilities. Domain-specific agents like
Harvey combine foundational models with
legal-specific tool integrations: case
law retrieval, compliance checks, and
document analysis. Clearly, many useful
applications do fit the wrapper mold. And
for many developers, it makes sense to
go this route. As Manus co-founder
Yichao "Peak" Ji told us himself, from
day one, they decided to work
orthogonally to model development,
wanting to be excited rather than
threatened by each new model release.
What distinguishes successful wrappers
from their less effective counterparts
is typically a bunch of things.
Intuitive UI, proprietary evals, much
more careful fine-tuning of foundational
models, and thoughtfully designed multi-agent
architectures. And this is a good
example of that. Manus itself
illustrates these trade-offs really
well. On the positive side, its
multi-agent orchestration helps deliver
significantly lower per-task costs,
around $2 a task, compared to integrated
competitors like OpenAI's Deep Research.
Manus also offers greater transparency
and user control, letting users directly
inspect, customize, or replace
individual sub-agents and tool
integrations, a degree of flexibility
centralized platforms rarely match. One
of the coolest things Manus figured out
was actually exposing the file system so
you could see exactly what the agents
were doing. ChatGPT requires you to
reprompt and it's opaque what's
happening when it's thinking. Manus is a
glimpse into the future of ChatGPT
desktop operating directly on your
computer and it will be cool to see how
much more control you'll get when it's
happening there instead of a browser.
But there are a few clear limitations.
Coordination across specialized agents
becomes increasingly difficult as tasks
scale or complexity grows. More
critically, its current advantages (UX
refinements, targeted fine-tuning, and
thoughtful integrations) are vulnerable
to competitors just coming along and
doing that as well. These strengths and
weaknesses are generally shared by
wrappers. They allow you to have really
rapid deployment, iteration, and
specialized UX at lower upfront cost,
but they're also vulnerable to
disruption such as API pricing changes
or provider policy shifts, which can
quickly erase any of the cost benefits.
Ultimately, the critical challenge isn't
deciding whether wrappers are viable, but
identifying genuinely sustainable
differentiation for your product. For
founders, this might mean investing
early in proprietary evals that are
expensive or time-consuming to replicate,
embedding your workflows deeply into
specific user routines to increase
switching costs, or identifying
integrations with platforms or data sets
competitors can't easily access. In the
end, success in AI doesn't hinge on
reinventing the wheel, but rather on who
can stitch together the existing models
into a product users genuinely love.