The Next Breakthrough In AI Agents Is Here

By Y Combinator

Summary

## Key takeaways

* **Focus on Orchestration for Cost Efficiency and Transparency:** AI agents leveraging multi-agent orchestration, like Manus, can significantly reduce per-task costs and offer greater transparency and user control by exposing internal processes, which centralized platforms often lack (06:20, 06:40).
* **"Wrappers" are a Valid and Valuable Approach:** Dismissing AI applications as mere "wrappers" that integrate existing models overlooks the reality that many successful AI products, including Manus, derive their value from intuitive UI, proprietary evaluations, careful fine-tuning, and sophisticated multi-agent architectures, rather than foundational model development (04:45, 05:40).
* **Sustainable Differentiation is Paramount for Founders:** For founders building AI products, the critical challenge is not whether "wrappers" are viable, but how to establish sustainable differentiation through proprietary evaluations, deep workflow integration to increase switching costs, or exclusive access to platforms and datasets (07:50, 08:10).
* **Human-like Performance in Complex Tasks is Achievable:** Advanced AI agents are demonstrating near-human performance on challenging benchmarks like GAIA, indicating their capability to handle complex reasoning, multimodal interactions, web browsing, and tool proficiency, suggesting a rapidly closing gap with human-level task execution (04:08, 04:30).

## Smart Chapters

* **Manus's Emergence as a General AI Agent (00:00)**: The video introduces Manus, a new multi-agent AI platform gaining significant attention for its promise as a general-purpose AI agent.
* **Understanding Manus's Multi-Agent Architecture (01:32)**: This section explains how Manus functions as an executive overseeing a team of specialized sub-agents that dynamically break down and execute complex tasks.
* **Manus's Unique Technical Innovations (02:05)**: The segment details Manus's advanced features, including its dynamic task decomposition, chain of thought injection for stability, and integration with open-source tools for cross-platform execution.
* **Real-World Capabilities and Benchmark Performance (03:40)**: This part showcases Manus's ability to perform a wide range of practical tasks and highlights its 86.5% score on the GAIA benchmark, close to the human average of about 92%.
* **Addressing the "Wrapper" Critique (04:45)**: The video discusses the common criticism of Manus as a mere "wrapper" and argues that most successful AI applications are effectively wrappers, emphasizing the value of their unique orchestration and integration.
* **Advantages of Manus's Design (06:20)**: This section outlines the benefits of Manus, such as lower per-task costs, enhanced transparency, and greater user control compared to more centralized AI platforms.
* **Limitations and Vulnerabilities of Manus (07:00)**: The segment addresses the drawbacks, including challenges with scaling task complexity and the vulnerability of its current advantages to replication by competitors.
* **Strategic Insights for AI Founders (07:50)**: The video concludes by advising founders on achieving sustainable differentiation in the AI space, focusing on proprietary evaluations, deep user integration, and exclusive data access.

## Key quotes

* "Behind all the excitement around Manus is something genuinely innovative: a multi-agent AI system that can seemingly complete all sorts of tasks from travel planning and financial analysis to searching over dozens of files or doing industry research." (01:32)
* "Manus smashed the state of the art on GAIA, scoring 86.5%, just a few points shy of the average human." (04:30)
* "Most successful AI products today could also qualify as wrappers by this logic." (05:00)
* "One of the coolest things Manus figured out was actually exposing the file system so you could see exactly what the agents were doing. ChatGPT requires you to reprompt and it's opaque what's happening when it's thinking. Manus is a glimpse into the future of ChatGPT desktop operating directly on your computer." (06:40)
* "Ultimately, the critical challenge isn't deciding whether wrappers are viable, but identifying genuinely sustainable differentiation for your product." (07:50)

## Stories and anecdotes

* The video highlights the immediate global attention and hype Manus received upon its launch, with some calling it "China's next DeepSeek moment" and users describing it as the "most impressive AI tool they've ever tried," emphasizing the strong initial reaction to its general-purpose AI agent capabilities. (00:30)
* It contrasts Manus's transparency with other AI models by noting that Manus exposes its file system, allowing users to see exactly what the agents are doing, whereas platforms like ChatGPT are opaque during their thinking process, offering a "glimpse into the future of ChatGPT desktop." (06:40)

## Mentioned Resources

* OpenAI: Developer of AI models and research platforms (00:00, 04:10, 06:20)
* Google: Developer of AI tools and research platforms (00:00)
* xAI: AI company mentioned as a competitor (00:00)
* DeepSeek: AI platform mentioned as a competitor and a benchmark for Chinese AI innovation (00:00, 00:40)
* Anthropic's Claude 3.7 Sonnet: The foundational large language model powering Manus (03:15)
* YC company Browser Use: An open-source tool integrated with Manus for advanced website interaction (03:25)
* Startup E2B: Provides a secure cloud sandbox environment integrated with Manus (03:30)
* GAIA: A benchmark designed to challenge AI agents on reasoning, multimodal handling, web browsing, and tool proficiency (04:08)
* Cursor: An example of an AI product that integrates existing LLMs and APIs (05:05)
* Windsurf: An example of an AI product that integrates existing LLMs and APIs (05:05)
* Harvey: A domain-specific AI agent that combines foundational models with legal-specific tool integrations (05:25)
* Yichao "Peak" Ji: Co-founder of Manus (05:50)

Topics Covered

How does Manus achieve general AI agent capabilities?
Why are "wrapper" AI products strategically superior?
Manus reveals AI's future: transparent desktop agents.
What builds sustainable differentiation for AI products?
Success in AI: Stitching models into products users love.

Full Transcript

Usable AI agents are finally here. From

deep research platforms out of OpenAI

and Google to similar tools from xAI and

DeepSeek. Joining the competition now is

Manus, a brand new agentic AI platform

that has taken the world by storm. And

today we're launching an early preview

of Manus, the first general AI agent.

When Manus officially launched, the hype

around it immediately took off. A

Chinese startup unveiling a new AI agent

that some are calling China's next

DeepSeek moment. With people calling it

the most impressive AI tool they've ever

tried and the most sophisticated

computer-using AI. Unlike some of its

predecessors, Manus wasn't just another

specialized chatbot. It promised to be a

true general-purpose AI agent. With

invitations rare and access limited, the

question remains: has Manus truly

revolutionized the AI agent landscape?

Let's find out.

[Music]

Behind all the excitement around

Manus is something genuinely innovative:

a multi-agent AI system that can

seemingly complete all sorts of tasks

from travel planning and financial

analysis to searching over dozens of

files or doing industry research. So

how does it work? Rather than relying on

one big neural network, Manus works more

like an executive overseeing a team of

sub-agents, coordinating and guiding

their every move across a shared action

space. It takes in your prompt as input

and gets to work figuring out what it

needs to do. Instead of tackling your

task in one go, a planner agent first

comes up with a master plan to follow,

breaking things down into manageable

subtasks. This way, Manus knows

precisely what needs to be done before

executing and can hand off these tasks

to other sub-agents. These are like

Manus's own in-house experts. They share

the same context, but each has its own

delineated domain, from knowledge or

memory to execution. Manus can call upon

an extensive suite of 29 different

integrated tools. Whether they're

automating web navigation, securely

running code, or pulling important

information from files, Manus's

sub-agents intelligently decide which tools

to use. Finally, when each subtask is

complete, the executor agent combines

the outputs together into a final

synthesized output for the user. Under

the hood, Manus is powered by a pretty

sophisticated dynamic task decomposition

algorithm.
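The planner, sub-agent, and executor flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Manus's actual implementation: the `Subtask` type, the `plan` and `run` functions, the `SUB_AGENTS` table, and the domain labels are all hypothetical names, and the planner is a hard-coded stand-in for what would be an LLM call in a real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    domain: str       # hypothetical label naming the sub-agent that should handle it
    instruction: str  # what that sub-agent is asked to do


def plan(prompt: str) -> List[Subtask]:
    """Toy planner: a real system would ask a model to decompose the prompt."""
    return [
        Subtask("knowledge", f"gather background for: {prompt}"),
        Subtask("execution", f"carry out: {prompt}"),
    ]


# Hypothetical sub-agents keyed by domain. Each one receives its subtask
# plus the shared context (outputs produced so far).
SUB_AGENTS: Dict[str, Callable[[Subtask, List[str]], str]] = {
    "knowledge": lambda task, ctx: f"[notes] {task.instruction}",
    "execution": lambda task, ctx: (
        f"[result] {task.instruction} (using {len(ctx)} prior outputs)"
    ),
}


def run(prompt: str) -> str:
    """Plan first, dispatch each subtask, then combine the outputs."""
    shared_context: List[str] = []
    for subtask in plan(prompt):
        output = SUB_AGENTS[subtask.domain](subtask, shared_context)
        shared_context.append(output)
    # Final step: synthesize the sub-agent outputs into one answer.
    return "\n".join(shared_context)


if __name__ == "__main__":
    print(run("plan a trip"))
```

The point of the sketch is the shape of the control flow: the plan exists before anything executes, and every sub-agent reads from the same context while staying inside its own domain.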

This is what enables it to autonomously

break down complex instructions into

clear execution paths. To ensure

stability even after dozens of rounds of

reasoning and tool use, the Manus team

developed an original technique called

chain of thought injection, enabling

agents to actively reflect and update

plans. At its core, Manus makes use of

Anthropic's Claude 3.7 Sonnet. Manus

also features robust cross-platform

execution capabilities thanks to its

seamless integration with open-source

tools like YC company Browser Use for

advanced website interaction and startup

E2B's secure cloud sandbox environment.
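The video does not explain how chain of thought injection works internally, so the following is only a generic sketch of the idea it names: after each action, the history is fed back in and the remaining plan may be revised. The function `run_with_reflection`, the `toy_act` and `toy_reflect` helpers, and the step names are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, List


def run_with_reflection(
    steps: List[str],
    act: Callable[[str], str],
    reflect: Callable[[List[str], List[str]], List[str]],
    max_rounds: int = 20,
) -> List[str]:
    """Execute steps one at a time, letting `reflect` rewrite the remaining plan."""
    remaining = list(steps)
    history: List[str] = []
    rounds = 0
    while remaining and rounds < max_rounds:
        step = remaining.pop(0)
        history.append(f"{step} -> {act(step)}")
        # Feed the history back in so the plan can be updated mid-run.
        remaining = reflect(history, remaining)
        rounds += 1
    return history


def toy_act(step: str) -> str:
    """Stand-in for a tool call; one step is made to fail on purpose."""
    return "failed" if step == "open site" else "ok"


def toy_reflect(history: List[str], remaining: List[str]) -> List[str]:
    """If the last action failed, put a fallback step at the front of the plan."""
    if history[-1].endswith("failed"):
        return ["use cached copy"] + remaining
    return remaining


if __name__ == "__main__":
    for line in run_with_reflection(
        ["open site", "extract prices"], toy_act, toy_reflect
    ):
        print(line)
```

The `max_rounds` cap is a design choice for the sketch: a loop that can rewrite its own plan needs a hard bound so that repeated failures cannot keep it running indefinitely.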

So what can Manus actually accomplish?

Impressively, it can take on a wide

range of real world tasks. It excels in

scenarios like creating travel

itineraries, detailed financial

analyses, and educational content. While

it can also assist with valuable tasks

like structured database compilation,

insurance policy comparisons, supplier

sourcing, and even assisting with

high-quality presentations. To truly

measure Manus's capabilities, we can look

at GAIA, a benchmark designed to

challenge AI agents on reasoning,

multimodal handling, web browsing, and

tool proficiency. Humans typically score

about 92%, whereas OpenAI's Deep

Research, in comparison, scored about

74% at its best. Manus smashed the

state of the art on GAIA, scoring

86.5%, just a few points shy of the

average human. Still, despite impressive

benchmark performance, Manus has

reignited a broader conversation about

the nature of AI startups at the

application layer: wrappers. Some have

dismissed Manus as merely a wrapper since

it stitches together existing

foundational models and various tool

calls. But this dismissal overlooks an

important reality. Most successful AI

products today could also qualify as

wrappers by this logic. Cursor and

Windsurf, for example, integrate

existing LLMs alongside external APIs

and developer-focused tooling such as

real-time code analysis and debugging

utilities. Domain-specific agents like

Harvey combine foundational models with

legal-specific tool integrations: case

law retrieval, compliance checks, and

document analysis. Clearly, many useful

applications do fit the wrapper mold. And

for many developers, it makes sense to

go this route. As Manus co-founder

Yichao "Peak" Ji told us himself, from

day one, they decided to work

orthogonally to model development,

wanting to be excited rather than

threatened by each new model release.

What distinguishes successful wrappers

from their less effective counterparts

is typically a bunch of things:

Intuitive UI, proprietary evals, much

more careful fine-tuning of foundational

models, and thoughtfully designed

multi-agent architectures. And this is a good

example of that. Manus itself

illustrates these trade-offs really

well. On the positive side, its

multi-agent orchestration helps deliver

significantly lower per-task costs,

around $2 a task, compared to integrated

competitors like OpenAI's Deep Research.

Manus also offers greater transparency

and user control, letting users directly

inspect, customize, or replace

individual sub-agents and tool

integrations, a degree of flexibility

centralized platforms rarely match. One

of the coolest things Manus figured out

was actually exposing the file system so

you could see exactly what the agents

were doing. ChatGPT requires you to

reprompt and it's opaque what's

happening when it's thinking. Manus is a

glimpse into the future of ChatGPT

desktop operating directly on your

computer and it will be cool to see how

much more control you'll get when it's

happening there instead of a browser.

But there are a few clear limitations.

Coordination across specialized agents

becomes increasingly difficult as tasks

scale or complexity grows. More

critically, its current advantages, UX

refinements, targeted fine-tuning,

thoughtful integrations are vulnerable

to competitors just coming along and

doing that as well. These strengths and

weaknesses are generally shared by

wrappers. They allow you to have really

rapid deployment, iteration, and

specialized UX at lower upfront cost,

but they're also vulnerable to

disruption such as API pricing changes

or provider policy shifts, which can

quickly erase any of the cost benefits.

Ultimately, the critical challenge isn't

deciding whether wrappers are viable, but

identifying genuinely sustainable

differentiation for your product. For

founders, this might mean investing

early in proprietary evals that are

expensive or time-consuming to replicate,

embedding your workflows deeply into

specific user routines to increase

switching costs, or identifying

integrations with platforms or data sets

competitors can't easily access. In the

end, success in AI doesn't hinge on

reinventing the wheel, but rather on who

can stitch together the existing models

into a product users genuinely love.
