AI agents in 2025: Why agentic commerce isn't ready for Black Friday yet
By IBM Technology
Summary
## Key takeaways

- **Claude Opus 4.5 Leads Coding**: Claude Opus 4.5 is 50% more token-efficient than Claude Opus 4.1, making it the best model for coding and outperforming recent releases from Google and OpenAI in initial tests. [01:46], [02:27]
- **No Agentic Boom This Black Friday**: Agentic commerce won't break out on Black Friday: partnerships like OpenAI-Shopify are early-stage and US-only, the protocols are nascent, and product research capabilities are still underwhelming. [10:05], [10:37]
- **Agents Thrive in the Enterprise Backend**: With 15-20% of Black Friday purchases returned, big retailers like Amazon already handle returns through backend agentic workloads, optimizing processes invisibly to consumers. [14:42], [15:17]
- **POC-to-Production Gap Is Huge**: Prototyping agents is easy with no-code tools like Langflow, but scaling to hosted, shareable deployments requires patching together complex infrastructure and lacks Shopify-like simplicity. [23:02], [24:37]
- **Agents Need a Shopify Moment**: The developer ecosystem needs a Shopify-like platform to democratize agent deployment, letting non-programmers describe workflows in English without coding. [26:46], [27:11]
- **No Single Agent Winner**: The agent race is an infinite game in which players play until they exhaust their resources; it should spawn a creator ecosystem that breaks Web2 monopolies through model composition and marketplaces. [38:30], [39:10]
Topics Covered
- Claude 4.5 Tops Coding Efficiency
- Agentic Commerce Lags One Year
- Agents Thrive in Backend Returns
- English-to-Agent Democratizes Development
- Infinite Agent Game Favors Creators
Full Transcript
I don't think this is a finite game. I don't think there is a winner. I think this is the classic Simon Sinek infinite game. I think the players are going to play until they run out of resources and they can no longer play the game. And I think we're going to win, right? I think it opens up a creator ecosystem. What I hope it breaks up is all these massive Web 2.0 companies controlling everything, so we can get a more open marketplace. All that and more on today's Mixture of Experts.
I'm Tim Hwang and welcome to Mixture of Experts. Each week, MoE brings together a panel of the smartest and most charming thinkers in technology to distill down what's important in artificial intelligence. Joining us today are three incredible panelists. We've got Chris Hay, distinguished engineer; Lauren McHugh Oende, program director, AI open innovation; and Volkmar Uhlig, VP, core AI and watsonx AI. This is our Thanksgiving episode, and we're going to change up the format a little bit. Rather than ticking through the news of the moment, we're going to take a step back and have a focused discussion about the bigger picture. But, as always, we've got the headlines.
>> Hi everyone, I'm Aili McConnon, a tech news writer for IBM Think. As always, I'm here to cover your top AI news of the week. But instead of running through a bunch of headlines today, we're actually going to focus on one, the big news of the week: Anthropic's new Claude Opus 4.5 model, which just dropped. And to do this, I'm joined by our expert, Mihai Criveti, distinguished engineer for agentic AI.
>> When I heard it was about the latest model from Claude, I couldn't resist. So happy to be here.
>> What would you say is the most important thing that users need to know about Anthropic's new Claude Opus 4.5?
>> I think it's just how efficient it is with its tokens. It's 50% more efficient than Claude Opus 4.1. So even though it's a reasoning model, and even though it can be a fairly expensive model, it's cheaper, and it also consumes 50% fewer tokens when reasoning. It's one of the most efficient models out there, so don't be afraid to use it. Give it a try. I think the best way to use it is through Claude Code, leveraging those capabilities, and it really performs very well for how token-efficient it is.
>> Mihai, what are some of your initial reactions to the model after playing with it?
>> Yeah, it's all kind of fresh, because I think it all happened about 21 hours ago, so this is all fairly new. But I've already been using it, with both the desktop application and with Claude Code, and I can say this is by far the best model for coding, and the bar had already been set quite high. As you know, Google released Gemini 3 Pro quite recently, I think two or three days ago. OpenAI released GPT-5.1 Pro, which has great reasoning capabilities, and they've also released GPT-5.1 Codex Max, which makes their Codex agentic platform really, really good. I suspect, at least from my initial testing, that it was on par with or even better than Claude Code with the previous models, Opus 4.1 or Sonnet 4.5. But now, with Opus 4.5, I believe Anthropic has regained the lead in terms of the best model for coding. I'm still doing some initial testing on it, but it's performing really well.
>> That's super interesting. And you mentioned this comes shortly after the release of Gemini 3, and we've had some other big models. What do you make of the timing of Anthropic's release? Obviously, it's been a busy fall for big releases of coding agents.
>> This can't be a coincidence. I'm wondering if these vendors just have these models ready to go. They might not have the best, I would say, price performance, or they're waiting for another announcement from their competitors before they put them out to market, because the timing was really, really good. I mean, within the span of three days we got three world-leading models for code, all outperforming each other on various benchmarks, which is quite interesting.
>> And you mentioned three strong-performing new agents being released in quick succession. Is there any aspect of Claude Opus 4.5 that is different? Obviously it sounds like it has slightly superior performance, but what makes this release different, if it is, from your perspective?
>> I think it's the pricing as well. They're able to reach a much better price per performance, or number of tokens used, than the previous Opus 4.1. At some point I was just using Opus 4.1 for the planning or the more complex tasks, and using Sonnet 4.5 or the previous models for the actual work, because they have a very large context window; they were cheaper, they were faster. I think they've been making some optimizations in terms of pricing and performance as well for Opus 4.5, and it feels like you get a bit more bang for your buck than with the previous versions, like the 4.1 Opus.
>> And can you talk a little bit more about how they've achieved that lower pricing? Are there innovations in how they're approaching things that have helped them do that?
>> I'm not sure it's necessarily to do with innovation; maybe it just has to do with more availability of the hardware. They've recently announced some very, very strong partnerships with both Microsoft Azure and Google to use more GPUs, more TPUs. I think part of it just has to do with having more availability of the infrastructure and being able to reach a bit further and say, hey, we're putting some of our best models out there as the default. I was actually quite surprised, and maybe even shocked, that when I opened up Claude Code, it recommended Opus 4.5 as the default, which is somewhat unusual. It might normally say something like, "Hey, we're going to use 4.1," or "We're going to use 4.5 for thinking and then transition into 4.1," but outright they've launched a new version of Claude Code and it used Opus 4.5 as the default.
>> And I guess it's hard to predict, but how long does Anthropic have the lead? We've had it for, what, 48 hours? Will we have something new by next week, or does this set itself apart such that you'll be using it yourself as your top choice for at least the coming weeks?
>> The way I see it, I'm using all three models. I'm using Gemini for a lot of my deep research, my advisory work. I'm using Claude Code with Opus for writing code and test cases, and I'm using Codex with GPT for reviews or anything else afterwards. So I'm actually using all three, but by number of tokens it's still the Anthropic models I'm using the most; they're writing the vast majority of the code for me, because they perform the best for this particular use case. I don't necessarily see it as a situation where, if I had a different use case, I would still use the models from Anthropic. If it were summarization, content generation, creative thinking, or creative writing, maybe I would lean more towards GPT-5.1 Pro. But at least in the area of code, and it could also be a combination with what they put in the Claude Code tooling, it still seems to outperform Codex, at least in my use cases and from my personal experience.
>> For that specific area of coding, do you think this has enterprise application or usefulness? Do you think that's part of the play in trying to bring the cost down, to make it more enterprise-friendly? How do you see that?
>> Yeah, I think the strategy of making this available through the various hyperscalers at, I would say, a reasonable cost is going to help with enterprise deployments, because many enterprises are never going to consume the models directly from the provider. If you're using models from OpenAI, you're likely going to consume them on Azure from Microsoft, not necessarily directly from ChatGPT. The same thing goes with Claude: you're going to consume it through, maybe, AWS Bedrock, but that has been somewhat of a limiting choice for enterprises. With its availability through Microsoft Azure, for example, this has really opened it up to enterprise customers and consumers.
>> Is there anything else about the model worth talking about that we haven't covered yet, that strikes you as interesting?
>> Yeah. Clearly they've optimized it for agents, they've optimized it for computer use, they've optimized it for coding, but I like the fact that they've also optimized it for things like building PowerPoint slides. That was maybe something that even Microsoft was looking at these models for, for use in Office 365, for PowerPoint generation, slide generation. So they're not looking just at software development use cases. They're now starting to tackle a lot of other enterprise use cases: being the best model for generating a PowerPoint slide, or generating a Word document, or working with the XML and the schema required to build those documents. So I'm pleased and happy to see that models are being optimized for these enterprise use cases.
>> I'd be happy to have a model to do my PowerPoint slides as well.
>> Yeah, 100%.
>> Thank you, Mihai, so much for joining our conversation. And now we're going to return to our special Thanksgiving episode. Happy holidays, everybody.
Thanksgiving, I think, is a really good time to be talking about agents, because agents have been very much hyped in 2025, and agentic commerce has been one of the things in agents that people have been excited about. This week is going to feature Thanksgiving, but also, importantly, Black Friday, which is one of the biggest shopping moments of the year. So I guess, Chris, maybe I'll kick it to you first: do you think this week is going to be a breakout moment for agents and agentic commerce? Why or why not?
>> No, I don't think it will be. I think we're probably another year away from that. Why not? Because I think all of the ingredients are still getting into place. If you think about what OpenAI's done, they're now bringing on board the ability to shop in their channel for commerce products, and they've partnered with Shopify and so on. But there are so many commerce retailers that they've not onboarded yet. So I think that's really early, and it is US-only at the moment. And then Google has released their agent commerce protocol, and again, that's really early at the moment. And agentic browsers haven't quite taken off yet. So I just think we're about a year away from that. Now, where I do think it's going to become relevant is utilizing web search and deep researchers from within ChatGPT to find the products that you want. That is going to be big, and that is disrupting retailers. But I don't see a massive effect on Black Friday this year.
>> What's interesting, and I'd love to parse this out a little bit more: Chris, you've listed a couple of key components, right? It's almost like the agentic browser is not quite there, and the partnerships are not quite there from a business standpoint. Lauren, do you agree with this assessment? Do you think this week is going to be big for agentic commerce? Chris is almost saying there is some, insofar as people are using it to find products, but it's not obviously what we were promised in the exciting early days of 2025.
>> Yeah, my feeling is it's also not going to be so different from last year. The protocols that Chris talked about will help in automating the actual checkout: once you're using ChatGPT and you find the thing you want, you can automate the checkout. But I'm not sure that was really ever the biggest problem. I didn't have a big problem putting in my credit card manually once I got to the link. They spent a lot of time making it easy to spend money on the internet.
>> Yeah.
>> I mean, we've had automated checkout for a while. It was very hacky. That's why it's hard to get concert tickets: because it is possible to build browser automation to buy things automatically. So I don't see a big revolution coming from the simple act of being able to check out once you're in the AI application. And I do think, too, that even the product research capabilities are a bit underwhelming. When you're looking for something specific, like a size, dimensions, or a style, it's not always easy to find that just through the kind of interface we have now. And I think a lot more work could be done on training AI models on e-commerce-relevant information: what are the input-output pairs, where the input is the customer intention and the output is what they ultimately selected, so the AI model can build that pattern recognition of, okay, that intention really meant this size of thing or this configuration. So there's definitely work that could be done on that front, making the models themselves work better for e-commerce. And then there's also what happens when you fit those models into an agentic pattern. How do you prompt it? How do you build in the steps of that process? Like: first, start by identifying the retailers you want to look in, then get the information you need on their products, then compare them. That's a whole flow that a general-purpose chatbot, ChatGPT or whatever your system of choice is, is not made specifically for. So I think if we had more workflows built specifically for that, the performance could go through the roof: when you have an intention, you can get a specific link to that product right away. That's really where I'd like to see more improvements.
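The retailer-to-comparison flow Lauren sketches can be pictured as a fixed, purpose-built pipeline rather than a free-form chat loop. Below is a minimal illustration in Python, assuming a hypothetical `llm()` chat-completion helper and a hypothetical `search_retailer()` catalog call; it only shows the shape of the workflow she describes, not any particular product or vendor API.

```python
# Sketch of the "identify retailers -> gather product info -> compare" flow
# described in the discussion. `llm` and `search_retailer` are hypothetical
# stand-ins for your chat-completion call and each retailer's search API.
import json

def llm(prompt: str) -> str:
    """Placeholder: call your chat-completion endpoint and return the text."""
    raise NotImplementedError

def search_retailer(retailer: str, query: str) -> list[dict]:
    """Placeholder: query one retailer's catalog and return product dicts."""
    raise NotImplementedError

def shop(intent: str) -> dict:
    # Step 1: let the model pick which retailers are worth searching.
    retailers = json.loads(
        llm("List up to 3 retailers (as a JSON array of strings) "
            f"likely to stock: {intent}")
    )
    # Step 2: gather structured product info from each retailer.
    candidates = []
    for retailer in retailers:
        candidates.extend(search_retailer(retailer, intent))
    # Step 3: compare candidates against the stated intent (size, dimensions,
    # style) and return one specific product link.
    choice = llm(
        "Customer intent: " + intent + "\n"
        "Candidates: " + json.dumps(candidates) + "\n"
        "Return the single best match as JSON with fields 'name' and 'url'."
    )
    return json.loads(choice)
```

The point of the sketch is Lauren's argument: the steps are explicit and purpose-built for shopping, rather than left to a general-purpose assistant to improvise.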
>> Yeah, definitely. Volkmar, I'd love to bring you in, because I think your angle on this is really interesting. When we talk about agents, we tend to talk higher up in the stack, right? The application's not quite there. Even the business partnerships are not quite there to get this to work. Is there a hardware limiter to the world of agents really taking off, particularly in commerce but otherwise too, or is this almost not even in the picture?
>> So I would take a completely different stand, right? I think it is actually the year of agents, and the reason is very simple.
If you look at Black Friday, 15 to 20% of the stuff gets returned, right? And so I think the agents are not the consumer-facing agents; the agents are actually on the back end, and that's where the true adoption happens. People are returning stuff; it's the same after Christmas. The statistics are unclear, somewhere between 15 and 25%; it depends on the product category. If you look at Amazon today, when you want to return something, it used to be that they'd click five buttons and then say, okay, good, ship it back, or no, we reject it, and you had to make a phone call. Now that's all done through agentic workloads. So I think the big retailers, and probably Shopify at some point will offer it as well, are already in that motion of optimizing that backend flow. I do not know what they are doing when the product hits their shipping center; if they have agentic workloads there, I'm sure they do. But that first customer touchpoint, effectively handling the return, is where the majority of the labor is on their side, because the front end is already very optimized. So we just don't see it as consumers, but we see it indirectly, because returns are actually easier.
>> Yeah, definitely. And this is a pattern I did want to talk about, just zooming out from e-commerce and buying stuff online.
I think it's almost easy, as a consumer, to say agents are the dog that didn't bark in 2025, because most of the people I know are not using agents every single day for all sorts of things. But it does seem like on the back end, in the enterprise, there is a lot of agent activity. So it's sort of interesting that the public face and public experience of agents is very underwhelming, whereas, Volkmar, there's stuff running under the hood, like returns, which are very much identified. And so we have this split screen happening in agents that might actually fool us about just how far this thing is going.
>> I mean, if you look at programming agents, it's pretty complicated: you need to know what you want to do, and you need to coerce it into doing the right thing. So I think the consumer will always consume agents indirectly, through products. We have these apps on the phone where I can automate stuff; I have one automation on my iPhone, which is: when I'm driving close to our community, open the gate. Okay, that's the only automation I have, out of all the automations I could build. Humans typically want a packaged product that just solves the problem. And I think the beauty of what ChatGPT did is giving you that one line everybody knows how to use from Google over 20 years. It's really easy to consume, and now, similar to Google, they're building all these capabilities in. So I think that's how we, as humans, will consume agents. But then in the enterprise, you take your business process, and every place you have a human you can actually try to put an agent. So I think we will see adoption indirectly, but not directly. Now, is it the year of the agents? I think we are still in the PoC phase at some companies. In that sense, I just want to offer a contrarian opinion: to a certain extent, it is probably the year of the agent PoCs, let's call it that.
>> Okay, yeah, the pilot agents.
>> Yes.
>> Yeah. Chris, go.
>> Yeah. No, it's the year of the agents. I disagree with Volkmar. It's the year of the agents.
>> You don't need to caveat that.
>> It just is, right? I mean, we need to think about this for a second. If we look at what has happened with ChatGPT, and we'll start from there and then move outwards: integrated into ChatGPT, Claude, and Gemini you've got web search capabilities, which is tool calling. You've now got the model catalog, so you can hook up things like your Jira, and pretty much anybody who's got a service on the internet you can hook up as a connector now, and that is basically tool calling. Everybody's offering a deep researcher, which is agent behavior. And then probably the biggest star of them all is the coding agents; that's just gone crazy, especially things like Claude Code, and if you think about things like Lovable and so on. Everybody's using Codex, using coding agents, to get their work done, and the biggest thing that's made the difference there is giving them access to tools. So I think agents are here, and as I said at the beginning of the year, the year of the super agent, the fact is that with planning and reasoning these agents have become really, really capable. So it can still feel PoC-like, because everything's maybe not agent-agent-agent in the way that you think, but we are all pretty much using agents every day. We're just not thinking about them that way.
>> Yeah. And I guess they're not woven together into the kind of cohesive experience we've been promised, right? Chris, you began by saying, well, it's not like agentic commerce is really going to be happening everywhere this Black Friday, because all these pieces are still missing. They're there, but they just haven't been orchestrated, in a certain sense. Lauren, I always joke that the agentic consumer demo is always: you need to book a trip, so push the button and the trip is booked. And what Volkmar is saying is that that isn't happening, and might take a really long time to happen. Do you think it eventually will? Will we get to that much more consumer-y agentic experience, which I think is the source of all these splashy videos and startups people are working on? Or, Volkmar, what I heard you saying is almost that the future may not actually look a whole lot like that, just because of all the things you need to package, and so the agents might always be a little bit in the background. I'm curious how far toward the consumer you think the agent experience will go.
>> Yeah, I think the trajectory of LLMs on their own is a really interesting one to compare this to. With LLMs we had the Transformer paper in 2017. 2018 was the year we got BERT and the year we got GPT-1, and then 2022 was when those things became available to the end consumer in a very, very easy way, in the form of a web app or a mobile app. So I feel like where we are with agents is maybe that 2018: not purely research-paper level, but still not 2022, in the hands of every single person. We have our GPT-1 and BERT kinds of demos and things to look at, and then I think the big question is, will it take four years to get into the hands of everyone, like it did with LLMs? There's definitely reason to think that, across the board, these timelines are accelerating, so maybe it could be less than four years. We have way more attention and investment in this technology than we did then; there was just low awareness of LLMs among the community of investors and the people who were going to nudge this along back in 2018. So could it be faster because of that? Or could it be longer, because this is going to turn out to be a lot more complicated than getting LLMs into production and into everyone's hands?
>> Yeah, I do like the idea that this background hype on AI makes all downstream AI things happen faster, because everybody's paying attention to it now. And Lauren, one thing I did want to ask you about: a big part of this acceleration is whether or not it's easy for people to develop agentic platforms, tools, and applications. Do you want to give us a flavor of the state of the developer ecosystem right now? I feel like that's a critical thing. I think Volkmar said a moment ago that getting these to work still takes a lot of work, and that in some ways limits our progress, just because the number of organizations and people who can actually do this is small. One way you increase progress is you make it easier for people to develop for it. So I'm interested in how you see the developer ecosystem around this evolving.
>> I think it's a really, really fun time to be a developer if you want to try things and experiment, and you can do that no-code. There are things like Langflow, where building an agent is visual, drag and drop. That's super cool; it helps you not waste a lot of time coding something where ultimately the data is just not there or the LLM just doesn't understand. All the way to pro-code, there's LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel; there's your choice of things, and some are a bit easier and more abstracted to use, while some give you full control if you want it. So if you want to try, you have all the tools to do that; that should never be the problem. But if you want to actually deploy it and take it out of a very tightly controlled environment with a very precisely specified use case, which is probably "book a trip," like you said: if you want to expand beyond that, actually have it hosted somewhere, somewhere where you could invite your friends to try it, that's where it immediately gets very complicated, and there are far fewer obvious options for what you're going to use. Right now, if you want to deploy an agent and have it hosted somewhere, you have to figure out where to host the agent logic itself, which is not really an LLM-type workload, and then a separate environment to host the actual inference, and then patch those two things together. So it's really not ideal. I think actually scaling up, sharing, and hosting what you build is the hard part.
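That split, agent logic in one deployment and inference in another, stitched together over the network, often ends up looking something like the sketch below. It's a minimal illustration only, assuming a FastAPI service for the agent loop and an OpenAI-compatible chat-completions endpoint at a hypothetical internal URL; it isn't any particular vendor's pattern.

```python
# Sketch of the deployment split described above: the agent loop lives in one
# service, while inference runs on a separately hosted model endpoint.
# INFERENCE_URL and the OpenAI-compatible payload/response shape are assumptions.
import os
import requests
from fastapi import FastAPI
from pydantic import BaseModel

INFERENCE_URL = os.environ.get(
    "INFERENCE_URL", "http://models.internal/v1/chat/completions"
)

app = FastAPI()

class Task(BaseModel):
    goal: str

def call_model(messages: list[dict]) -> str:
    # The GPU-backed inference environment is a different deployment entirely;
    # the agent service only ever talks to it over HTTP.
    resp = requests.post(
        INFERENCE_URL,
        json={"model": "any-chat-model", "messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

@app.post("/run")
def run_agent(task: Task) -> dict:
    # Agent logic (planning, tool calls, state) is ordinary CPU-bound code,
    # hosted like any other web app, and patched to the inference tier above.
    plan = call_model([{"role": "user", "content": f"Plan steps for: {task.goal}"}])
    return {"goal": task.goal, "plan": plan}
```

The friction Lauren describes is that these are two separate deployment problems, ordinary web hosting on one side and GPU-backed inference on the other, with nothing that packages them together yet.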
>> So I think it's also one of the inhibitors, right? If you look at it right now, we don't have this packaged, all-in-one happy solution, and that's the entry barrier. We are not at the Shopify level, where a mom-and-pop shop can say, hey, I want to have an agent and it should deal with something. There are some projects we have at IBM where we take the flows and the business description and convert that straight from English into, say, a Langflow flow. And so that transformation: when we're at the point where you can actually use English to describe what problem you want to automate, without knowing anything about programming, then you get it to the masses, and then you can do it on a cell phone. It's like, hey, when I come home, I want the lights to be on; not building the automation yourself. Today the interface is a baby-programmer interface, for people who can program, and that's why nobody uses it. But there's a logic there, and I can describe that logic in English. Right now you need to be very explicit, but I think the models can already fill in the gaps; they're smart enough for that. We're already doing English to code; if we can get to English to agent, then we're at a point where it's mass-consumable. Right now the interfaces are still built for programmers, not for consumers.
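One way to picture the "English to agent" step Volkmar describes is structured generation: the model turns a plain-English request into a machine-readable workflow spec that an existing runner (a Langflow-style graph, for instance) could execute. Below is a rough sketch, assuming a hypothetical `llm()` helper and a deliberately tiny schema; the IBM projects he mentions are not shown here.

```python
# Rough sketch of "English to agent": ask the model for a structured workflow
# spec instead of code. The schema and the llm() helper are illustrative only.
import json

WORKFLOW_SCHEMA = {
    "trigger": "plain-English condition, e.g. 'when I arrive home'",
    "steps": [{"tool": "name of a registered tool", "args": {}}],
}

def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call."""
    raise NotImplementedError

def english_to_agent(description: str) -> dict:
    prompt = (
        "Convert this request into a workflow matching the schema "
        f"{json.dumps(WORKFLOW_SCHEMA)}. Request: {description}. "
        "Reply with JSON only."
    )
    # The resulting spec is handed off to whatever graph runner executes it.
    return json.loads(llm(prompt))

# Example: english_to_agent("When I come home, turn the lights on")
```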
>> Yeah, that's right. And I feel like that vision almost short-circuits it: do you even need a developer ecosystem for a whole set of applications? Which I think is pretty interesting.
>> I think it's pretty obvious. I always use Shopify as the example, but Shopify was this thing where, in the 2000s, it was like, oh my god, you can run a web server on the internet, that's amazing, so I can build a billion-dollar business; and then Shopify came along and democratized it. We are not yet at that point; it's still high-tech. It's not democratized. But it's just a question of time until someone wraps it and says, okay, I make it really easy. Once you have that easiness, and the complexity goes down by a factor of 10 or 100, then everybody will use it, because otherwise you die. And I think it will be integrated into these kinds of commerce applications. So the moment someone figures this out, it will just spread like wildfire. But I think that pivotal moment hasn't happened. The Shopify moment for agents hasn't happened.
>> Yeah. There's almost a tension between these two pathways. One of them is what you're talking about, Volkmar, which is language to agent: if we got that really good and really powerful, then you almost don't need to build a lot of the deployment infrastructure Lauren is talking about, where we've got this prototype we're building and now it's got to be on some kind of rails to make it more available. In your vision, Volkmar, the consumer just types in what they want and then it basically happens. Chris, maybe to bring you into the conversation: going back to what Lauren is saying, right now there are lots of ways of prototyping an agent, but the minute you want to do anything more complicated, or scale it, there's this gap in the space. Do you have a sense of what's necessary to mature that right now? Are we still waiting on the companies and platforms that are going to make that happen?
>> Yeah, I think so. I mean, taking things, to your point, from POC and MVP to scale is a hard problem, because consumers do crazy things, right? So you start to have to ask, am I putting the LLM right in front of the consumer? And if you are, at that point you need to guardrail it, and that could be things like guard models, or it could be running deterministic flows in conjunction with the AI to keep it on track, to Volkmar's point about text to plans. If you look at something like Claude Code, or something like Cursor, Windsurf, and so on, almost all of these things have a built-in planner. So when you ask a question, the first thing that happens is it goes to the planning module for anything complex, and then the model is kept to the plan. You saw the same thing with Manus, which we talked about early in the year: you ask for a task, it goes to the planning module, the planning agent kicks in, creates the plan, and then the agents execute to the plan. And there's a good reason that exists, which is that if you give an LLM, an agent, a big list of tools, who knows what tool it's going to pick, right? My favorite one at the moment for this is the Kimi K2 model. I love the Kimi K2 model. It can call 200, 300 tools; it has a long sequential range of tool calls and it can do a massive amount of tool use. But you know what it is? You give it a tool, it's gonna call it, baby. Every tool that it's got, it's like, I will do it this way, I will do it that way. It goes off the rails. It's a phenomenal model, but it goes off the rails because it can't keep itself on track. And then even when you're executing to the plan, quite often the models will either use their own memory or not even bother updating the progress. It will be like, "Oh, no, no, no, I know the answer to this," and then just answer, as opposed to: no, I need you to use the tool; the information you've got isn't enough. And it's like, no, no, no, I know this, I know this. And even if it does use the tool, once it's done the task, when you're following a plan you want to go: executed step, executed step, executed step. Again, if you're not deterministic and you leave the model on its own, it will skip steps in the plan or not even update it. So to that point about frameworks, when you want to start to get to production, that's where those sorts of frameworks become really important. But the reality is that you're then back to a developer mindset, putting those frameworks in place to deploy; they're not out of the box. I think when we see this become a mass thing, those frameworks are either just going to be part of the platform and ecosystem you deploy your code onto, or it's going to be solved at the model level.
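Chris's planner-plus-guardrails point boils down to a loop like the one below: a planning call produces explicit steps, and the executor, not the model, ensures each step's tool actually gets called and progress gets recorded. This is a minimal sketch with a hypothetical `llm()` helper and stand-in tools; real systems such as Claude Code or Manus do considerably more (guard models, deterministic flows, retries).

```python
# Minimal sketch of the plan-then-execute pattern discussed above: plan first,
# then force step-by-step execution so the model can't skip steps or answer
# from memory instead of calling the tool. llm() and TOOLS are hypothetical.
import json

def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call."""
    raise NotImplementedError

TOOLS = {
    "web_search": lambda q: f"results for {q}",      # stand-in tools
    "read_file": lambda path: f"contents of {path}",
}

def run(task: str) -> list[dict]:
    # 1. Planning call: explicit, ordered steps, each naming the tool it needs.
    plan = json.loads(
        llm("Break this task into an ordered JSON list of steps, each with "
            f"fields 'step' and 'tool' (tool must be one of {list(TOOLS)}): {task}")
    )
    progress = []
    # 2. Execution loop: one step at a time, the tool call is mandatory, and
    #    progress is written down deterministically rather than left to the model.
    for item in plan:
        tool_output = TOOLS[item["tool"]](item["step"])
        result = llm(
            f"Task: {task}\nCurrent step: {item['step']}\n"
            f"Tool output: {tool_output}\n"
            "Summarize what this step accomplished."
        )
        progress.append({"step": item["step"], "result": result})
    return progress
```

Keeping the tool call and the progress update in ordinary code, rather than trusting the model to do them, is the "keep it on the plan" discipline Chris describes.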
>> So in the last few minutes I want to shift a little: we've been talking about what technically needs to happen for 2026 to be the real year of the agent. I'm interested in winners and losers, and in platforms. Are the winners in agent land, from a platform standpoint, going to be the winners in AI in general? Is it going to be OpenAI and Anthropic that end up dominating the agentic ecosystem? Will it be some of the cloud players? I don't know if anyone here has strong priors on who's well positioned to be the major platform for this space.
>> I think there are two questions to answer. One is what model, or what model zoo, do I need to use to actually get good results. What Chris just said is that these things go off the rails, so you need to babysit them into giving you the right answer. I had a case where I tried to program something, and it had an API call, and the API call didn't work, so in the end the model just decided, oh, I'll just stub it and call my own function, and it's like, look, I'm done, it works, and didn't do anything more. The solution is: do nothing, and then I'm good; success, congratulations. So there's the whole question of how we manage the model, and that's a hard problem in itself. Right now the state of the world is that you need these frontier models, because otherwise the reasoning capabilities are not strong enough. I think that probably next year we will see people building things like planning models: you focus on one thing, get the planning right, and then of course have the models underneath to execute the plan and not go off track. Right now I don't think we've done that. So the frontier models are really the only place where it can go right now, but of course with humongous cost associated with it. So we will see the smaller models being specialized for planning.
I think the second question is how you execute this and where you execute this, and that's a really good question. My belief, and this is where I'm also taking our product, is that AI is everywhere; there is no place where AI is not. The idea that we're just putting a bunch of H100s or H200s in the data center and that's where all the AI will happen, that's just not true. We will see pervasive application. It will happen on your cell phone. It will happen in the data center. And so the real trick here is who can make those agents cost-effective, because right now, in business scenarios, the work is done by labor: there is a person currently doing it by hand. What we are hoping for is, first, that agents replace the people who do it by hand, so that those people can do better things. And then, second, we want to have agents in places where right now the work doesn't get done at all, or gets done poorly, so we get more choices. And that one is really a cost optimization problem. I think there's an industry at the bottom of this: we need to efficiently run that capacity, that infrastructure, so that we bring down the cost of these agents by 10 or 100x, and if we hit that, then it will be pervasive. Right now we are using it primarily for high-value tasks which are incredibly labor-intensive, or which are very, very controlled, so I can say: I have thousands of people doing this, and I can put an agent behind it, because it's a confined enough problem space that I can supervise and watch it. The moment these things get more powerful and we bring the cost down, then it will be a more pervasive application.
>> Yeah, that's right. I like the idea of the market dividing: the existing frontier AI model companies going more agentic is one part of it, and then there's a whole cost-efficiency universe that emerges. The frontier model companies might also get into that, but it maybe looks like a very different kind of market and a very different kind of ecosystem. Lauren, I'm curious how you divide up the future agentic market. Is it one model to rule them all? There are many ways this could play out, and I'm interested in how you forecast here.
>> Yeah, I think whoever can make something repeatable will win, because this moment for agents really feels like traditional AI ten years ago, where it was really cool that you could build an AI model to do anything, but you had to do it from scratch. If you wanted it to predict education outcomes, you had to find the data, train the model just for that, refine it, and then package it up and use it. And if you wanted it to do something else, you had to start over. It was a whole end-to-end process every time. That's kind of what agent building is right now, and it's even more painful, because it's not just code-based, it's language-based: so much of that rebuilding is prompting it, figuring out how to nudge it in certain directions, getting it to use tools sometimes and not other times, and to use the tools in better ways. The breakthrough with traditional AI was foundation models: we trained bigger, better models because we could, we had more data and more compute, and then that one model could do different things because it knew a little bit of everything. I think if we similarly had some concept of foundation agents, it could work the same way, and reduce that friction of having to build from scratch every single time.
>> Yeah, for sure. And do you think the winners best positioned there would be the existing leaders? They take their model, polish it off, and then it's the foundation agentic model, basically.
>> I don't even think it would be a model at this point; it would be orchestration of multiple models, plus other constraints put on top of that. And I really don't know whether it will be the existing leaders, or whether it's going to be some dark horse that initially builds one agent to do one specific thing, but then takes the pieces of that, whatever percent of the code, and uses them to do a second thing, and then a third. That was kind of the AWS story, right? They were building for themselves to do something specific initially, but then a lot of that cloud infrastructure could be used for other things. So I think there is a scenario where someone just commits to a use case, and initially they're looked down upon because they're using AI to do just one thing, but do it well, and then they realize what the pattern is to expand to other things, and eventually build something that's more repeatable and more of a platform.
>> Yeah, that's very rich. I never really thought about it as: what's the specific agentic problem that, if you solve it, unlocks the largest number of subsequent agentic problems? It's interesting to think about whether that's the travel planning one. What is that use case? So, Chris, do you want to give us a final thought here before we close up?
>> I don't think this is a finite game. I don't think there is a winner. I think this is the classic Simon Sinek infinite game. I think the players are going to play until they run out of resources and they can no longer play the game. And I think that's what's going to happen. A lot of the technology and the techniques are known; they're well known across the world, and the limiting factor is resources. But in an agentic world, the models need to get smaller and smarter, and in the future they need to be able to fit on a chip. And therefore I just don't think there is a winner in this scenario. So who do I think is going to win? I think we're going to win, right? I think it opens up a creator ecosystem. What I hope it breaks up is all these massive Web 2.0 companies controlling everything, so we can get a more open marketplace. That's what I believe in.
And the biggest thing, if I think about what's going to happen in '26 and '27: you remember the Rick Rubin episode, where I was frantically googling who Rick Rubin was so I could give an intelligent answer to your question, Tim? I'm obsessed with Rick Rubin at the moment, because I think composition is where we're going. I think '26 and '27 are going to be about marketplaces, but also about being producers. You're going to say, okay, I've got this model over here, I've got my piece of data, I've got my brand and my style, I've got these five tools, and I'm going to combine them together into my ecosystem and create something new and beautiful, and that's going to be my product. So I hope that's what happens: that it's not this one model or whatever that is the winner. That's a depressing future. What I'm hoping for is a vibrant, amazing ecosystem and marketplace where everybody's got a chance to use AI to improve their lives, personalize it to them, and create their own company's products and data without the limitations we have today. So we're going to be the winners. But I think these model providers are going to come and go, and we saw that this year. Who was Moonshot, with Kimi? Ask that question six months ago. And if we go back to last year, who was DeepSeek? Same sort of thing. New model providers are going to come in. And remember when we were super excited about Manus? I expect them to come back at some point. People are going to come in and out, and it's fine, it's okay. Who knows for the future, but it's going to be fun.
>> Nice. Well, on that hopeful note, I'm going to let you all get to your impending holidays. Volkmar, Lauren, Chris, thanks for joining us. And thanks to you, listeners, for joining us. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we'll see you next week on Mixture of Experts.