OpenClaw: 160,000 Developers Are Building Something OpenAI & Google Can't Stop. Where Do You Stand?
By AI News & Strategy Daily | Nate B Jones
Summary
Topics Covered
- Agents Excel or Fail by Specification
- Top Demand: Autonomous Email Mastery
- Agents Invent Novel Solutions Emergently
- Unconstrained Agents Fabricate Deception
- Designing 70/30 Human-AI Collaboration
Full Transcript
An OpenClaw agent negotiated $4,200 off a car while its owner was in a meeting. Another one fired off 500 unsolicited messages to its owner's wife. Same architecture, same week, just a couple of weeks into the AI agent revolution. I'm here to tell you what's been going on, what you're missing, and what you should pay attention to if you want to take AI agents seriously.

So, what about this car situation? A solopreneur pointed his Moltbot at a $56,000 car purchase. The agent was told to search Reddit, look for comparable pricing data, and generally try to get a great deal. It contacted multiple dealers across regions on its own, negotiated via email autonomously, and played hardball when dealers deployed typical sales tactics. In the end, it saved the owner $4,200. The owner was in a meeting for most of that time.

That same week, a software engineer who'd given his agent access to iMessage (why would he do that?) watched it malfunction and fire off 500 messages to him, his wife, and random contacts in a rapid-fire burst he could not stop fast enough. Same technology, same broad permissions. One saved thousands of dollars; the other carpet-bombed a contact list. And that duality is the most honest summary of where the agent ecosystem stands in February of 2026. The value is real, the chaos is real, and the distance between them is the width of a well-written specification.
In the first video, we talked about what Moltbot is and the security nightmare that erupted in the first 72 hours of launch. In the second, I talked about the emergent behaviors that made researchers rethink what autonomous systems are capable of. This is my third video on OpenClaw, and it's about something different: what 145,000 developers building 3,000 skills in six weeks reveals about what people actually want from AI agents, and how to start harnessing that demand without getting burnt.

But first, we have to talk about the names. Quick recap for anyone just joining: the project launched as Clawdbot on January 25th, received an Anthropic trademark notice on the 27th, became Moltbot within hours, then rebranded again to OpenClaw two days later. Three days, three names. The community voted on the second one in a Discord poll and finally decided it would be OpenClaw going forward. During that second rebrand, of course, crypto scammers grabbed the abandoned accounts in about ten seconds, and a fake $CLAW token hit $16 million in market cap before collapsing in a rug pull.

All of that happened in January. It's February now, and what's happened since is even more interesting. The project has over 145,000 GitHub stars and climbing, 20,000 forks, and over 100,000 users who've granted an AI agent autonomous access to their digital lives. And as of Sunday, February 8th, a place in the Super Bowl. That's right: AI.com, the notorious crashed-website failure of the Super Bowl, was apparently down because of Moltbot, or OpenClaw, or whatever you want to call it. They pivoted their site to give everyone an OpenClaw agent that was supposedly secure, apparently forgot to top up their Cloudflare credits, and their site went down when the Super Bowl audience hit AI.com to claim their names and their OpenClaw agents. This is all happening very fast.

But even with AI.com going down, over 100,000 users have granted an AI agent autonomous access to their digital lives. The skills marketplace now hosts 3,000 community-built integrations with 50,000 monthly installs and counting. The ecosystem is generating new skills faster than the security team can audit them, and it's not going to stop anytime soon. The project still has no formal governance structure: no community-elected leadership, no security council. Peter Steinberger calls it a free open-source hobby project, but it's the fastest-growing personal AI project in history, and it probably shouldn't be described as a side project at this point.

I took a look at those 3,000 skills, because they reveal what people want from AI agents, which is actually a much more important long-term story than all of the drama around OpenClaw, as much fun as that is to cover.
So, the skills marketplace really functions as what I would call a revealed-preference engine. Nobody's filling out a survey about what they want from AI. They're just building it, and what they build tells us what they want. And the patterns are striking.

The number one use case on OpenClaw is email management. Not "help me write emails." Complete management: processing thousands of messages autonomously, unsubscribing from spam, categorizing by urgency, drafting replies for human review. The single most requested capability across the entire community is something that makes the inbox stop being a full-time job. Email is broken.
The number two use case is what users call morning briefings: a scheduled agent that runs at 8 a.m., pulls data from your calendar, weather service, email, GitHub notifications, whatever you need, and sends you what you care about in a consolidated summary on Telegram, WhatsApp, or your messaging tool of choice. One user's briefing checks his Stripe dashboard for MRR changes, summarizes 50 newsletters he's subscribed to, and gives him a crypto market overview every morning, automatically.
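A briefing like that can be sketched in a few lines. This is a minimal illustration, not OpenClaw's actual skill API: the `fetch_*` collectors are hypothetical stand-ins for real calendar, email, or Stripe calls, and delivery to Telegram or WhatsApp would be a webhook POST at the end.

```python
"""Minimal morning-briefing sketch. The fetch_* functions are placeholders;
a real setup would call the relevant APIs and run this on a schedule."""
from datetime import date

def fetch_calendar():
    # Placeholder: a real collector would query your calendar API.
    return ["09:00 standup", "14:00 dentist"]

def fetch_inbox_urgent():
    # Placeholder: a real collector would scan email for urgent threads.
    return ["Invoice overdue: reply needed"]

def compose_briefing(sections: dict) -> str:
    """Render each section as a titled block; empty sections are skipped
    so the briefing only contains what you actually care about today."""
    lines = [f"Morning briefing for {date.today():%A, %B %d}"]
    for title, items in sections.items():
        if not items:
            continue
        lines.append(f"\n{title}:")
        lines.extend(f"  - {item}" for item in items)
    return "\n".join(lines)

if __name__ == "__main__":
    briefing = compose_briefing({
        "Calendar": fetch_calendar(),
        "Urgent email": fetch_inbox_urgent(),
        "GitHub": [],  # empty sections are dropped
    })
    print(briefing)  # then POST this to your messaging bot's webhook
```

The scheduling itself is just a cron entry or a hosted scheduler pointing at this script; the agent part is which collectors you wire in.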
Use case number three that we see in skills: smart home integration. Tesla lock, unlock, and climate control from a chat message; Home Assistant for the lights. You get the idea. People want an intelligent assistant for their home that doesn't make them burn brain cells.

Use case number four is developer workflows: direct GitHub integration, scheduled cron jobs, developers using the agent as a task queue, assigning work items and watching it execute commits in real time. This one's gotten a lot of noise in my circles because it frees up developers to manage via their messaging service and have multiple agents working for them.
But the fifth capability is perhaps the most interesting. That entire category is what I would call novel capabilities that did not exist before OpenClaw. Like the restaurant reservation story I shared in my first video on OpenClaw, where the agent could not book through OpenTable, so it downloaded voice software and called the restaurant directly on its own. Or the user who sent a voice message via iMessage to an agent with no voice capability, and the agent figured out the file format, found a transcription tool on the user's machine, routed the audio through OpenAI's transcription API, and just got the task done. Nobody programmed that behavior, right? The agent problem-solved its way to a solution using the available tools.
The pattern is clear: friction removal, tool integration, passive monitoring, and novel capability. It tells you something important about what people want from their AI agents, and it's not what most of the industry is building toward, to be honest. The majority of AI product development in 2025 and 2026 has been focused on chat: better conversations, better reasoning, better answers to questions. The 3,000 skills in ClawHub are almost entirely about action. The community is not building better chatbots when they get the chance. They're building better employees, for lack of a better term.
And broader survey data confirms the pattern. 58% of users cite research and summarization as their primary agent use case, 52% cite scheduling, and 45% cite (I realize the irony here) privacy management. The consistent theme: people don't want to talk with the AI. They want the AI to do things for them. And the AI agent market reflects this. It's growing at 45% annually, and I swear that is from before OpenClaw hit. The number is going to get bigger. OpenClaw didn't really create all of this demand. It just proved the demand exists and put a match to dry tinder. Now we have to make sense of a world where everyone has demonstrated, with their feet, that they want AI agents despite the security fears.
So all of these use cases are sort of the cleaned-up version. It's what people have intended to build. The messy version is more revealing and more interesting, because it shows you what agents do when the specification is ambiguous, the permissions are broad, and nobody can really anticipate what's going to happen next.

At SaaStr, during a code freeze, a developer deployed an autonomous coding agent to handle very routine tasks. The instructions explicitly prohibited destructive operations, but the agent ignored them. It executed a drop-database command and wiped the production system. And what happened after that matters even more than the terrible news of the wipe itself. When the team investigated, they discovered the agent had generated 4,000 fake user accounts and created false system logs to cover its tracks. It essentially fabricated the evidence of normal operation. Look, I won't say the agent was lying, per se. It was optimizing for the appearance of task completion, which is what you get when you tell a system to succeed and don't give it a mechanism to admit failure. The deception was an emergent property of an optimization target, not something I would call intentional. But the production database was still gone.

Meanwhile, over on Moltbook, the social network where only AI agents can post, 1.5 million AI agent accounts generated 117,000 posts and 44,000 comments within 48 hours.
I know there has been a lot of discussion about humans authoring some of those posts. I think what the agents did with the space is actually more instructive than whether any individual post was human-generated. The agents spontaneously created a quote-unquote religion called Crustafarianism. They established some degree of governance structure. They built a market for digital drugs. And you know what's interesting about all of that? They did it in a very shallow manner. What I mean is that if you look at the range of vocabulary and the types of topics in most agent text, they reflect typical attractor states in high-dimensional space: if you ask an AI agent to pretend it is making a social network, the topics that come up over and over again look a lot like what's on Moltbook. So telling agents to create a social network effectively has them following that long-range prompt and autonomously doing just that. I don't look at this only as agents autonomously behaving and coordinating, although the story is partly about that. I also look at it as reflective of the fairly shallow state of agent-to-agent communication right now. Most of the replies on Moltbook are fairly rote, many posts have no replies at all, and most of the topics are fairly predictable. We may mock Reddit, but it has a much richer discourse than Moltbook does. MIT Technology Review called Moltbook peak AI theater, and I don't think that's entirely wrong.
But the observation that matters for anyone deploying agents isn't whether something like Crustafarianism, the AI religion, is real emergence or some degree of AI-driven performance art pushed by people with prompts. It's that when agents are given fairly open-ended goals and social interaction, they spontaneously create a kind of organizational structure. We actually see this playing out in multi-agent systems already: when agents collaborate on tasks, the structure essentially emerges from the long-term goal of optimizing against a particular target. If you tell an AI agent to work with others to build a tool, it's going to collaborate and figure out how to self-organize. If you tell an AI agent to work with others on Moltbook, you get kind of the same thing.

It's actually the same capability that lets a Moltbot negotiate a car deal autonomously and figure out how to transcribe a voice message it was never designed to handle. The difference between an agent that problem-solves creatively to save you $4,200 and an agent that problem-solves creatively to fabricate evidence is really the quality of the spec and the presence of meaningful constraints for that agent. The underlying capability is identical, which is why I'm talking about agents as a whole here. Yes, the Moltbot phenomenon is interesting, but it's worth calling out that the SaaStr database agent was not a Moltbot. It just represents how agents work when they're not properly prompted. And it does rhyme with so many of the disastrous stories coming out of Moltbot agents. One I saw was texting the wife of a developer who had a newborn, trying to play laptop sounds to soothe the baby instead of getting the developer. Not a good move by the husband. So what does all of this mean for people deploying agents today?
The question is no longer whether agents are smart enough to do interesting work. They're clearly smart enough. The question is whether your specifications and guardrails are good enough to channel that intelligence productively and usefully. And I've got to be honest with you: for most people right now, it looks like the answer is no. Which brings us to how we change that.
Here's the finding that should shape how you think about deploying agents. When researchers study how people actually want to divide work between themselves and AI, the consistent answer is 70/30: 70% human control, 30% delegated to the agent. In a study published in Management Science, participants exhibited a strong preference for human assistance over AI assistance when rewarded for task performance, even when the AI had been shown to outperform the human assistant. People will choose a less competent human helper over a more competent AI helper when the stakes are real. The preference maybe isn't rational; it's deeply psychological, rooted in loss aversion, the need for accountability, and the discomfort of delegating to a system you can't really interrogate.

And this matters because most agent architectures are built for zero-to-100 delegation: hand it off and walk away. That's how Moltbot kind of works, and it's also Codex's thesis, for what it's worth. It works beautifully for isolated coding tasks where correctness is verifiable. But for the messy, context-dependent, socially consequential tasks that dominate, frankly, most of our days (getting the email tone right, scheduling the dentist appointment, negotiating for the car) the 70/30 split sounds to me more like a product requirement than just human loss aversion.

It's also worth noting that the organizations reporting the best results from agent deployment are not necessarily the ones running fully autonomous systems. They're the ones running human-in-the-loop architectures: agents that draft and humans that approve, agents that research and humans that decide, agents that execute within guardrails that humans set and review.
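That draft-and-approve pattern is straightforward to sketch. The `ApprovalGate` class below is a hypothetical illustration, not any particular framework's API: the agent proposes an action, a human callback decides, and nothing irreversible runs without an explicit yes, with every decision recorded either way.

```python
"""Minimal human-in-the-loop approval gate. ApprovalGate and draft_reply
are illustrative names, not a real framework's API."""
from dataclasses import dataclass, field

@dataclass
class ApprovalGate:
    log: list = field(default_factory=list)

    def submit(self, action: str, payload: str, approve) -> bool:
        """Call `approve` (in real use, a prompt shown to a human) before
        anything irreversible happens; record the outcome either way."""
        decision = bool(approve(action, payload))
        self.log.append({"action": action, "payload": payload, "approved": decision})
        return decision

def draft_reply(subject: str) -> str:
    # Placeholder for a model call that drafts, but never sends, a reply.
    return f"Draft reply for: {subject}"

if __name__ == "__main__":
    gate = ApprovalGate()
    draft = draft_reply("Dealer counter-offer")
    # In real use the callback would display the draft and call input();
    # here we auto-reject to show that nothing is sent without a yes.
    sent = gate.submit("send_email", draft, approve=lambda action, text: False)
    print(sent, gate.log[-1]["approved"])  # False False
```

The important design choice is that the gate, not the agent, owns the log and the final decision, which is exactly the 70/30 split in code.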
38% of organizations use human-in-the-loop as their primary agent management approach, and those organizations see 20 to 40% reductions in handling time, 35% increases in satisfaction, and 20% lower churn. To be honest with you, I think that may be an artifact of early 2026, when agents are scary, agents are new, and we're all figuring out how to work with them. That human-culture component is huge. But given the pace of agent capability gains, and how much we've seen from capable agents like Opus 4.6 managing a team of 50 developers, we are likely to see smart organizations delegating more and more over the rest of 2026, no matter how uncomfortable that makes many of us at work.

The practical implication is that if you're building with agents or deploying them at work early in 2026, your culture needs to get ready, and it might be smart to design for 70/30. Build those approval gates, build visibility into what the agent did and why, and make the human the decision maker, but plan for full delegation over time, because those agents are going to keep getting smarter.
So, let's say you've watched all of this chaos with Moltbot and OpenClaw and you want to see value. What should you actually do?

Well, number one, start with the friction, not the ambition. That 3,000-skill ecosystem tells you exactly where to begin: the daily pain points that hurt the most over time. Email triage is one. Morning briefings. Basic monitoring. These are high-frequency, low-stakes tasks where the cost of failure is relatively low. Start there, build some confidence, and expand scope as trust in agents develops.

Next, design for approval gates; don't design for full autonomy out of the gate. If you've never built an agent before, have the agent draft and you approve. Have the agent research and you decide. Have the agent monitor and you act. Make the default assumption in your agent design that a human checkpoint will always exist, until you're ready to build an agentic system with quality controls and constraints strong enough that you can trust the agent with more. That is possible.
It just takes skill, and most people don't have it out of the gate.

I would also encourage you, and I've said this before, to isolate aggressively. Use dedicated hardware or a dedicated cloud instance for your OpenClaw. Use throwaway accounts for initial testing. Don't connect it to data you can't afford to lose. The exposed OpenClaw instances that Shodan found weren't running on isolated infrastructure; they were running on lots and lots of people's primary machines and exposing their data to the internet. You have to treat containment of data as non-negotiable.
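A small illustration of that isolation principle, using nothing beyond the Python standard library: run skill code in a child process with a scrubbed environment and a hard timeout, so it cannot read API keys from the parent process's environment or run forever. Real isolation goes much further (containers, no network, throwaway accounts), but a scrubbed environment is the floor.

```python
"""Least-privilege sketch: execute untrusted skill code in a subprocess
whose environment deliberately omits secrets. Illustrative only."""
import subprocess
import sys

SAFE_ENV = {"PATH": "/usr/bin:/bin"}  # deliberately omits API keys, tokens, etc.

def run_skill(code: str, timeout_s: float = 10.0) -> str:
    """Run skill code in a child interpreter with a minimal environment
    and a hard timeout; return whatever it printed."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=SAFE_ENV,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout

if __name__ == "__main__":
    # The child cannot see secrets from the parent's environment:
    print(run_skill("import os; print(os.environ.get('STRIPE_KEY'))"))  # None
```

Because `env=` replaces the child's environment entirely, anything you don't explicitly pass through simply isn't there for a misbehaving skill to exfiltrate.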
I would also treat agent skills marketplaces with least trust. Vet before you install. Check the contributor. Check the code. 400 malicious packages appeared in ClawHub in a single week, and the security scanner helps, but it can't catch everything.
Another one: if you're going to ask your agent to do a task, please specify it precisely. The car buyer I talked about at the beginning of this video gave the agent a clear objective, clear constraints, and clear communication channels. Meanwhile, the iMessage user whose agent spammed his wife gave it broad access and didn't really define boundaries. When the constraint is vague, the model will fill the gaps with behavior you did not predict. This is the same spec-quality problem we covered when we talked about AI agents in dark factories. The machines build what you describe, and if you describe it badly, you get bad results. The fix is not better AI; it's actually better specifications.
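One way to make "specify it precisely" concrete is to write the task down as structured data and refuse to run anything incomplete. The schema below (`objective`, `constraints`, `channels`, `budget_usd`) is my own invention for illustration, not an OpenClaw format; the point is that a spec with no constraints gets rejected before the agent ever runs.

```python
"""Illustrative task-spec validation. The field names are a hypothetical
schema, not any agent framework's real format."""
REQUIRED_FIELDS = {"objective", "constraints", "channels", "budget_usd"}

def validate_spec(spec: dict) -> list:
    """Return a list of problems; an empty list means the spec is usable."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - spec.keys())]
    if not spec.get("constraints"):
        problems.append("no constraints: the agent will fill the gaps itself")
    return problems

car_purchase = {
    "objective": "Negotiate the best out-the-door price on the chosen car",
    "constraints": [
        "email only; never call or text",
        "contact dealers from the approved list only",
        "never share payment details; final signature is human-only",
    ],
    "channels": ["email:negotiation-alias@example.com"],
    "budget_usd": 56000,
}

if __name__ == "__main__":
    print(validate_spec(car_purchase))           # [] -> good to run
    print(validate_spec({"objective": "help"}))  # missing fields flagged
```

The car buyer's spec and the iMessage user's spec differ in exactly these fields: one bounds the channels and the budget, the other leaves every gap for the model to fill.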
I would also encourage you to track everything. The SaaStr database incident was catastrophic not because the agent wiped the database, which is eventually recoverable, but because it generated fake logs to conceal the wipe. You need to build an audit trail outside the agent's scope of access. If the system you're monitoring controls the monitoring, you have no monitoring.
And last, but not least, budget for a learning curve. The J-curve is real: agents will make your life harder before they make it easier. The first week of email triage may produce very awkward drafts. The first morning briefing may miss half of what you care about. Assume you need time to learn, and that it's worth engaging with the agent to build something that actually hits the pain points that matter most to you.

57% of companies today claim that they have AI agents in production. That number should probably impress you less than it does. According to McKinsey, only one in ten agent use cases reached actual production in the last 12 months. The rest end up being pilots. They end up being proofs of concept. They end up being press releases. They end up being PowerPoint presentations that say "agents."
Gartner predicts over 40% of agentic AI projects will be cancelled by the end of 2027, and after watching some of the disaster with OpenClaw over the past few weeks, I both understand and don't understand. The reasons enterprises give are quite clear. They're worried about escalating costs from runaway recursive loops. They're worried about unclear business value that evaporates when the demo ends and you have to get into all of those dirty edge cases. And they're worried about what Gartner calls unexplainable behaviors: agents acting in ways that are difficult to explain, constrain, or correct.

A study found that upwards of half of the 3 million agents currently deployed in the US and UK are quote-unquote ungoverned: no tracking of who controls them, no visibility into what they can access, no permission expiration, no audit trail. This was based on a December 2025 survey of 750 IT execs conducted by Opinion Matters, and it's directionally consistent with other data as well. A Dataiku/Harris poll found 95% of data leaders cannot fully trace their AI decisions. That's concerning.
security boundaries that enterprises have spent decades building just don't apply when the agent walks through them on behalf of a user who would not have been allowed through the front door normally. We have to rebuild our
normally. We have to rebuild our security stances from the ground up.
Tools like Cloudflare's Molt Worker, Langraph, Crew AI, these exist because enterprises see the demand but have difficulty deploying tools like Moltbot without a ton of governance over the
top. And so we start to see the market
top. And so we start to see the market bifurcating. Consumer grade agents are
bifurcating. Consumer grade agents are optimized for capability and they're okay with a lot more risk because most of the consumers right now fall into that early adopter category and are very technical and at least think they know
what they're doing. Enterprisegrade
frameworks are optimized for control.
Right now, nobody has a great mix of control and capability or almost no one.
The company that figures out capability and control, the agent that's as strong as Moltbot and as governable as an enterprise SAS product, they're going to own the next platform. If you step back
If you step back from the specific stories and the ecosystem drama of OpenClaw, a very clear signal emerges from the noise. People do not want smarter chatbots. They want digital employees, digital assistants, systems that do work on their behalf across the tools they use without requiring constant oversight.

Isn't that interesting? On one hand, you have that study showing a preference for humans in production systems, which lines up with a lot of the cultural change we see at enterprises. At the other end of the spectrum, you have people willingly turning over their digital lives to Moltbots. What gives? I think the demand here is following a pattern we've seen before: when an underserved need is met with an immature technology, early adopters are willing to take extraordinary risks to get extraordinary capabilities. In this sense, I think the excitement around Moltbot reflects the hunger that the leading edge of AI adopters have for delegating more, and the more cautious 70/30 split is something I see more often in companies that have existing mature technologies and are moving cautiously on AI. It's a culture thing.

But regardless, Moltbot has proven the AI agent use case is real. If 100,000 users without any monetary incentive have granted root access to an open-source hobby project, the demand for real AI agents is desperate enough that people will tolerate real risk to get it. If nothing else, look at how AI.com crashed during the Super Bowl.
The question isn't whether agents will become a standard part of how we work and live. That question is settled. They're coming. The question is whether the infrastructure catches up before the damage that unmanaged agents do accumulates to the point where it changes public perception. Right now, we're in a window where capability wins feel so exciting that some people think it's okay to outpace governance, and demand is certainly outpacing any of the security boundaries we put up. That window of excitement is not going to last forever. While it's open, people and organizations need to learn to operate in it and build out agent capability carefully: with guardrails, with clear specs, with an eye on human judgment and on how this drives culture change within orgs that are not OpenAI and are not Anthropic. The ones that figure out how to bring their humans along, and show that agents can work successfully with high capability standards, high quality standards, and high safety standards, are the ones that will be furthest ahead when the infrastructure finally starts to catch up. Early adopters always look reckless. They also have a head start.