AI NEWS - Something Big is Happening: Gemini 3.1 Pro, GPT-5.3-Spark, and Anthropic $30B fundraise

By Mastra

Topics Covered

  • Non-Users Lag 6-18 Months Behind
  • Fast Agents Kill Multitasking
  • Gemini 3.1 Matches Opus
  • WebMCP Enables Native Browser Agents
  • Rails Beat General Agents

Full Transcript

So, this is Agents Hour. We're going to go over all the news from the last week. Actually, probably about the last nine or ten days. It's been a little while since we've done this, and there has been a lot of news. So, should we get into it?

>> Yeah. Let's start with the juicy stuff.

>> First, "Something Big Is Happening." It came out about nine days ago. 84 million views on this post. It hit a nerve. And I think some of it is a little, you know, exaggerated in my opinion, but I think the post is overall pretty good.

>> I mean, I wouldn't consider it rage bait or anything. It's just a different perspective, a good perspective, you know? We're living in a world where the model companies are waging a war, and we're all just like the dudes outside the kingdom, you know?

>> Yeah. I think it hit a nerve because it was kind of written for people that aren't in the space, you know? We're in this space every day. We're keeping track of all that's going on on X and other places, all the releases, every new model. We know when there are jumps. You know, we noticed in mid-December that things got even better.

>> You know, I felt it. I think it got a lot of people excited. But if you're not in this, you're not seeing what's possible today. And you're probably 6 months, 12 months, maybe even 18 months behind, because you're just not really using it. I know a lot of people, even developers, that aren't really fully utilizing coding tools because they're at an organization that doesn't allow them to.

>> Yes. And they're getting held back.

>> Yeah. So they have not even seen what is possible yet. And I think you can apply that in a broader sense, not just to developers, who by definition are probably a little more tech-savvy and probably following the trends a little bit more, versus someone who's maybe in a different type of position, a non-technical role. And other than all these commercials talking about AI, you do not see the impacts in your daily life.

>> Yeah. And dude, when you talk about people's jobs, right, it's very, like, both fear-mongering and rage bait. Especially if your job is technical and you say, "I don't even have to be at my job, AI does my job." People with the same job are like, "Well, you probably suck at your job or something." And then it's the non-technical people that think, "Well, if that guy's going to get replaced, then I'm screwed," you know? And it's that kind of fear-mongering stuff as well. Um, so excellent X strategy, I would say.

>> And if you haven't read it, and I'm sure you've at least seen it or heard about it, it's a good read. It's a little long, but it obviously hit a nerve with a lot of people.

Some Anthropic news. This is from February 12th. Anthropic has raised $30 billion in funding at a $380 billion post-money valuation. It's a lot of money. That's a lot of Bs. I saw a follow-up post that it was because of Claude Code revenue. Ain't that something?

>> Claude Code has got to be a huge driver of token usage, but also just subscriptions, right? Those $200-a-month Max plans have got to be bringing in a lot of revenue.

>> Yeah, seriously.

>> We have people on our team with three Max plans. Well, one person on our team has three different Max plans right now. So,

>> Yeah, dude. We're just burning tokens over here.

>> I don't even fill up my usage on one. So, I need to step up my game. Recently, Claude Sonnet 4.6 came out. I was kind of expecting it to be Sonnet 5, so I wonder if they just didn't quite get the leap they wanted, so they went with 4.6 rather than... you know, I don't know. I don't know how they actually name things, so I'm just speculating, but I saw a lot of speculation earlier that it was going to be a Sonnet 5 release, and then it was just Sonnet 4.6. But it has a 1 million token context window, which is big.

>> I mean, I've been using Opus 4.6, so I'm excited to get it. Sonnet's really good for non-heavy thinking tasks. It's a lot faster, too. But there are also better models for that now, like Kimi and stuff. But anyway, I wonder if people care about Sonnet releases as much since the Opus moment.

>> I feel like I just use Opus for everything. I mean, not everything, but most things. Yeah. And then I will sprinkle in some, you know, Codex in there as well. If you're a designer, or you're a developer that wants to be a designer or wants to send stuff to your design team, apparently there have been some improvements. So you can go direct from Claude Code to Figma using Figma's MCP server. So you can build a working prototype in code and then send it to Figma to see different versions.

Some OpenAI news. You can just build things faster.

>> Dude, I was sitting next to someone who got access to Spark, and it is quite fast.

>> So I do have this question. We've kind of been training ourselves for the last 18 months now, 12 months for sure. We get in this mode where we will send something off and then try to manage multiple tasks, right? There's probably a world, though, where the models are just fast enough that it actually doesn't make sense for you to manage two things at once. You just stay on one. Because imagine it: sometimes it will take 2 minutes to go off and do things, sometimes longer. But what if it never took more than 10 seconds, or never took more than even 30 seconds? Would it be beneficial to go off and start a new thread, or bounce between tasks, or would it just make more sense to stay focused on that one thing and just work really closely with a really fast agent?

>> Yeah, cuz you'll get it done in like half the time or something, right?

>> Because you can just go deep. You can be, like, in it. Rather than spinning the slot machine and watching the wheels spin for a long time, if it was just instant, push a button and it was there, you would never want to manage multiple threads. You would just stay on one thread for longer.

>> I would do that.

>> I think we're going to get there. I don't know when, but this is obviously a telltale indication that they can make things faster, and they're going to continue to make things faster.
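The trade-off being debated here can be put in rough numbers. This is a purely illustrative back-of-the-envelope model (all parameters invented, nothing measured from real agents): juggling parallel threads hides agent latency but charges you a context-switch cost on every hop, so as latency drops, single-threading starts to win.

```python
def total_time(turns: int, latency: float, review: float,
               switch_cost: float, threads: int) -> float:
    """Crude wall-clock model for finishing `turns` agent turns.

    Serial (threads=1): you wait out every agent turn, review it, and
    never pay a switch cost. Parallel: agent latency overlaps with your
    review work on other threads, but every hop costs `switch_cost`.
    """
    human_work = turns * review
    if threads == 1:
        return turns * (latency + review)
    # Waiting that can't be hidden behind reviewing other threads:
    unhidden_wait = max(0.0, turns * latency / threads - human_work)
    return human_work + turns * switch_cost + unhidden_wait

# Slow agent (2-minute turns): juggling 3 threads wins big.
slow_serial = total_time(turns=30, latency=120, review=30, switch_cost=15, threads=1)
slow_parallel = total_time(turns=30, latency=120, review=30, switch_cost=15, threads=3)

# Fast agent (10-second turns): the switch tax flips it; one thread wins.
fast_serial = total_time(turns=30, latency=10, review=30, switch_cost=15, threads=1)
fast_parallel = total_time(turns=30, latency=10, review=30, switch_cost=15, threads=3)
```

With these made-up numbers, the slow agent takes 4500 seconds serial versus 1650 seconds across three threads, while the fast agent takes 1200 seconds serial versus 1350 seconds parallel, matching the intuition above that below some latency you'd just stay on one thread.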

>> If it's super fast and you're super focused, you might churn through more issues than with this whole background agent thing, it seems to me.

>> I think you would, cuz there is inevitable context loss when you switch, right? There's a switching cost to anything. Like, one of the things I think I've lost with these coding agents: they kind of tap into this multi-threading, multitasking part of my brain, which I previously wouldn't have thought myself very good at.

>> Mhm.

>> But one thing I don't necessarily do is get in the flow state from when I used to code, where I'm deeply focused on one thing. But

>> if the agent was fast enough, you might be able to hit that again. I don't know.

>> Context switching is tough too, cuz I'm doing 10 things at once, right? And yeah, it's almost like a built-up skill to get there. If it's super fast, I think we're all going to go back to one again and build the skill again.

>> Yeah. It's almost like starting over, and eventually, you know... I think people scaled up to where you're at 10. I may be doing like three to five if I'm really in it. And then eventually that'll scale down to, "Oh, you can only do two cuz it's so fast." And then maybe you only need to do one, and you just focus on that one thing until it's done.

OpenAI had this really interesting post. If you want to learn more tips, tricks, and techniques for building really long-running agents and workflows, go ahead and read it. It's from the OpenAI Devs X account.

>> Quick note on this. Pretty much everything we just talked about is in that blog post.

>> It's always good validation when different teams are arriving at some of the same conclusions. Other teams piggyback on some of our ideas. We've definitely piggybacked on other ideas. It's like we're all in this, building this together. We're competing, but we're also figuring out how to make this process of building and writing code and building applications better. It's always good to share some knowledge.

This was wild.

>> The wildest thing. So Pete Steinberger, of OpenClaw fame, has joined OpenAI. That happened. Now, the wildest thing about this... I'm happy for the guy. That's awesome. But he strategically was talking crap about Claude models, you know, and how Codex is his choice, and the next thing you know, he's at OpenAI. Well played. Well played.

>> Yeah. And it was funny, too, because it was originally, you know, Clawdbot, which is really based on Claude, and then Anthropic sent him some kind of strongly worded letter, I imagine. I don't know the exact story, but you can only imagine he was asked to change the name. Then it was Moltbot, then OpenClaw, and now, to take the story completely away from the original naming, it's going to be, you know, OpenAI claw at some point.

>> Dude, OpenAI has appropriated the claw into their branding, right? So now, like, it's such a good strategic move, too. I don't know if it's owned by them. I don't know the structure of the deal of him joining. I'm sure OpenClaw stays, you know, open source, and continues to get help and traction, but what is he going to be working on? I wonder.

>> I saw some speculation that it was really a big PR move. I saw some speculation that he might actually be cooking up some stuff. We'll see. You never know with these kinds of things what the intent is. Maybe OpenAI just had a lot of money and they wanted to stick it to Anthropic, because Anthropic stuck it to them on the Super Bowl ads.

>> Dude, what if that's just revenge?

>> When you're playing with that much money, you can play petty revenge games. But, you know, there's probably some strategy here. We'll have to see how it all plays out, though.

>> You know, in reference to that first article, like, something big is happening if things like this can happen. You know what I mean?

>> All right. This was hot off the press. I haven't even had a chance to digest it yet. Introducing Gemini 3.1 Pro, our new state-of-the-art model across most reasoning, coding, and STEM use cases.

>> And it beats Opus on some things. If you look at Terminal-Bench, it's ahead of Opus. If you look at SWE-bench Verified, it's only 0.2 below Opus in the benchmarks.

>> So, it is arguably right there with Opus on at least a lot of things, which is kind of wild.

>> Yeah, I'm going to try it. I wonder if, like... dude, what if people start dailying Gemini? What a turnaround.

>> We've talked about it. You know, Google's in such a good place to really own things if they want to, right? And it really does feel like, at least for coding agents, they've always been third of the major model providers, at least for the last six months or so. I know a lot of people use Gemini in the app as their daily thing, more so than ChatGPT. So I'd be curious to know actual usage numbers, but I feel like, at least with most consumers, Gemini is maybe as popular, or getting close to as popular, as ChatGPT. But if they could be in the coding agent game, that's something.

>> This is the coolest thing. WebMCP, available for early preview. What is WebMCP? It's a protocol that allows agents to perform actions in the browser. How are people doing this today? They're doing things like Playwright, Browser Use, Browserbase, browser-whatever. And with those things, you're either running a headless browser or you're driving an active browser, and you're using something like the Chrome DevTools Protocol (CDP) to talk to it. You have a connection to it and you're operating it. But what if it was just native, you know, to how browsers work, or at least how Chrome works? So this would be dope. And coupled with observational memory, you could start scraping and have good context and all that stuff. I've been working on this browser primitive for Mastra, and I have it done, but I don't know if it's the right thing to do, because it's such a heavy thing to ship a browser component. But if it's something that's already built into Chrome, you know, that would be so clutch. It's just an API call.

>> There's definitely a lot of interest around just making browsers more accessible to agents, right? Of course you should just have APIs for a lot of things, but that doesn't always exist. One example: all these screenshots were taken by the Claude Chrome extension. You're looking at these slides that we have. I hardly even looked at these things. I sent it all the notes and then had Claude's Chrome extension basically take screenshots of all this for me. This would have taken me an hour to do manually, and it took me one, you know, 20-second command that I sent. And of course, I've done this a few times now. So, I've got the loop down, where I know how to create one of these things in 5 minutes or less.

>> But there are a lot of things I've run into where I'm just thinking, why isn't there an API for this? And maybe someday there will be, but in a lot of cases there probably won't be. And so, just being able to have your agent go through a browser matters. I also saw, maybe it was Resend or something, but they basically said they noticed a lot of automated attempts to access certain things were getting blocked by their anti-bot tools, and so they turned them off. Because, you know, maybe in this world you actually want bots on your website, especially if they're performing legitimate actions.

>> That's such a messy situation, huh?

>> It is. And where's the dial? Because obviously you don't want all bots, but how do you know which bots are the right bots? I guess that's why Cloudflare is, you know, kind of positioning themselves in that area.

>> And we were talking a little bit about Gemini. So, they just introduced Lyria 3, a new music generation model in Gemini. Maybe to compete a bit with Suno. From funny jingles to lo-fi beats, you can create custom 30-second soundtracks for any moment. In a lot of cases, I think some of my favorite uses of Suno were when we did battles, and sometimes a two-and-a-half-minute song is kind of long, right?

>> But a 30-second clip, maybe that's the right length, you know? Where you can just create a soundtrack for any moment of your life and it kind of just matches the vibe, right, based on what you want.

>> We're just writing jingles.

>> Yeah. It's like you take a video of, like, you at an event, and then you create a jingle for that, and now it captures that moment or something, you know? Because most of this stuff is just thrown away anyways, right?

>> Maybe not thrown away, but personal to you. I have a photo, maybe I share it with some people, my family sees it or whatever, maybe I post it online, but not all my photos get posted online, right? Very few. But I keep the photo if it means something to me.

>> I wonder if music could be that same kind of thing, where... yeah,

>> I keep photos. What if I kept a photo or a video with a soundtrack that was generated because it just matched the vibe, right? And now it's all kind of

>> in that moment of time.

We always have to talk about new model releases, and it's almost always Chinese models. Qwen 3.5 is here: a 397-billion-parameter, open-weight vision-language model. So it's for coding, reasoning, multimodal. And then some of this is a recap, because it's a little dated, but GLM 5 came out. It's been 10 or 12 days or so, I forget. And MiniMax 2.5, which we can't forget about, because it's really good. I just like this comparison of images. And the GLM 5, what is it, pelican riding a bicycle is pretty dang good. That's probably one of the better ones I've seen. The MiniMax one's not too bad, but I don't know. Which one do you think's better? GLM. Dude, I feel like GLM did a better job than Opus and Codex at making that same pelican.

>> Do you think GLM just, like, trained it on a bunch of pelicans riding bicycles?

>> Dude. Yeah. You're a benchmark maxi.

>> I don't know.

>> Benchmark Maxi.

>> It's like they know what everyone's tests are going to be.

>> Yeah.

>> It's like, if you're doing a video model, you have to train on a whole bunch of Kobe Bryant getting schooled in one-on-one, you know, or Will Smith eating spaghetti. Like, those are the benchmarks.

>> Those should be the benchmarks.

>> And this is from about a week ago, but MiniMax M2 took number two on the Bridgebench leaderboard. This is just a highlight. MiniMax M2.5 is a really good model. I think if you haven't tried it... I know some people on the team are using it. You kind of said it too: rather than Sonnet, you might just use MiniMax for those types of tasks.

>> Yeah. It's like 72 cents.

>> It's just cheap. It's fast, and it's pretty good. I would imagine it's pretty comparable to Sonnet 4.5, definitely. Obviously, in some cases it's actually beating even, you know, Codex 5.2.

There's a new DeepSeek model coming. As of now, or at least when I checked, it wasn't released yet, but supposedly it's coming, and there's a lot of hype. Will it live up to it? I don't know. But this is getting a little bit of traction. Cheaper inference, a 1 million token context window, and apparently it can run on consumer hardware. We will see if it lives up to the hype or not, but I would expect that maybe next week, when we're having this show, we will be talking about DeepSeek V4.

>> This is funny.

>> Kimi released, and it got 4 million views. Dang. Released Kimi Claw. Essentially it's, what, an OpenClaw clone.

>> It's an OpenClaw implementation within their product using Kimi, and they have access to ClawHub and all that. A lot of companies are doing these claw wrappers, you know, partly for demo reasons, maybe partly to sell.

>> Yeah, I've seen a lot of people trying to wrap, like, enterprise security on it as well. So it's like, this is a more secure version, because of all the malware issues they had originally, and people wanting the promise of it but not wanting the security risks. So if you could put a security wrapper on it, you could maybe get people to pay you for that security.

And a bunch of other launches over the last 10 days or so. Warp introduced a platform to orchestrate agents in the cloud. Browserbase built, basically, AWS Lambda with a browser built in. So they're introducing Functions. It's a new compute paradigm that colocates your code with browsers to build faster browsing agents. So is this basically just, like, browsers on the edge?

>> Pretty much.

>> Or, like, browsers in serverless functions, maybe, is a better way to state that.

>> Or, yeah, it's like a hosted function where the context of the function is already next to the browser when it runs. Pretty cool. But why use that instead of WebMCP? I don't know.

>> Something that came up a couple times is called Intent, which I just thought was kind of cool: the developer workspace for agent orchestration. So the idea is, what comes after the IDE if you don't need an IDE anymore? To me it looks very similar to things like Superset and Conductor, but it is a bit more structured, maybe. Like, there are specs and...

>> This looks... I mean, Dex is building a very focused planning version of this. I think people are colliding on this idea.

>> Yeah, and I do wonder. They'll all have their niches, right? That's always going to be a thing. Your team will adopt one and it'll work.

If you commit to it, it'll likely work, right? Or it'll at least probably give you some benefits. But there's this concept of what's going to be better: a more general-purpose agent where you just kind of define the tasks or the process the way you want, or a more specifically designed tool that gives you the process and says, if you follow this process, you'll have good results. I think depending on who you are and what the DNA of your team is, you may want to be handed a process that says, do it this exact way and you'll have good results. Or, if you have a team that's maybe a little more battle-hardened, or maybe just more opinionated, and they don't want to be confined to a set way of doing things, they'll develop it themselves and use a more general-purpose agent, and probably have good results if they're able to define their own way of doing it.

>> The majority of people like to be on rails, right? So, I think there's a huge space in the market for these products, for sure.

>> There was a TechCrunch article: former GitHub CEO raises a record... this shouldn't even be called a seed round, but a $60 million seed round at a $300 million valuation to build something called Entire. And it's funny, because, I'm not saying it's the same thing, but you were building Wit, which was essentially, what if Git was built a little differently for the AI era?

>> And so, I mean, Wit still exists. It's still out there.

>> Yeah. Ahead of my time.

>> And so Entire is kind of built around this type of concept, right? "Every commit tells a story. Now you can read it." And it's essentially this idea of what newer tools would look like in this world of agents. So

>> Looks like Wit to me.

>> Yeah, I mean, it's a little different, but

>> Yeah, of course.

>> But it is cool, and it's MIT licensed. Not that many stars, to be honest, for how much they've raised. You need to give them more stars. I'm going to give it a star, just to help them out. You know.

>> This was really cool. I don't know how well it works, but the idea is really cool.

>> There are a lot of companies in this space, too. You know, training and RLing and all that stuff. I want to use it, dude. That'd be... if it's, like, self-serve, which is kind of the dream, right? Everyone could become a model lab.

>> The dream is not the dream. The first dream is self-serve. The second dream is an API to do it. Give me an API. Like, I want to send you some stuff, and you give me a model that I can just use. That's pretty cool.

>> Yeah.

>> So, we're gonna go a little rapid fire. Ramp said 57% of merged PRs at Ramp come from their background agent. So, it just runs and merges over half their PRs. Letta released something called context repositories: git-tracked files for storing agent context. You know, file systems are important for agents. Letta has the ability to turn your Codex and Claude Code sessions into a context repo. So you can basically take your coding sessions and put them into context.

This was kind of funny.

>> Joel Hooks, you know, he's talking about how memory is such a fun problem and how he built an observation pipeline. If you go into the article, he found our observational memory, and I believe he chose to build his own after looking at ours, because we're MIT licensed, and he focused very much on how MIT licenses work. Um, which is kind of weird, but thanks for looking at it and using it. What did you think about this? Yeah, I just thought it was kind of funny, you know. More and more people are adopting it and calling it observations and observational memory.

>> Yeah,

>> which is fine. You know, it is open source.

>> MIT, you know.

>> Yeah, you can just use it, though. Like, you don't have to use Mastra to use observational memory. I think that's maybe something people don't realize. So you don't have to re-implement it yourself. You could also just use it, and then if there's something that doesn't work the way you want, it's MIT.

>> Let us know.

>> Yeah. PR it, you know, open an issue. Let's make it better together. But I think more and more people are going to be talking about observations as a pattern rather than compaction. And it's good that we played a part in that.

>> Yeah, it's super fun.

>> Cloudflare released the ability to just get markdown from any website they host.

>> Great idea.

>> Yeah, it's opt-in. So you just ask for text/markdown and you get the markdown back rather than the HTML, which is way better token density, right? You don't have all the extra markup. So that's cool.

Tavily is joining Nebius, so there's more consolidation.

>> So sad, dude.

>> Yeah, we used them for search and stuff.

>> Nebius? I'm not sure. Never heard of them before, really.

>> Never.

>> So I don't know. But

>> yeah, I remember before provider web search tools, people were using Exa and Tavily and things like that.
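Back on the Cloudflare markdown release for a second: as described, that's plain HTTP content negotiation. The client asks for `text/markdown` in the `Accept` header, and a site that has opted in serves markdown instead of HTML. A hedged client-side sketch (the header value is assumed from the description above; whether you get markdown back depends entirely on the site having the feature enabled):

```python
import urllib.request

def markdown_request(url: str) -> urllib.request.Request:
    # Content negotiation: ask the server for markdown instead of HTML.
    # Only works against sites that have opted in to serving it.
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})

req = markdown_request("https://example.com/blog/post")
print(req.get_header("Accept"))  # text/markdown
```

The token-density win the hosts mention is exactly this: markdown drops the nav, scripts, and markup an agent would otherwise pay for.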

>> Ghost is the first database designed for agents. This looks interesting and pretty cool, but I don't know how much traction it's getting so far. It's kind of this idea of... and it's not the first, I mean, there are other implementations of what the best way is for agents to read and write. But it is an interesting concept, building databases that are more geared towards agents.

>> Is it using, like, PGlite or something as well? libSQL runs the world anyway.

>> Yeah.

>> And that's it. That's the news for today. If you're just tuning in, this is Agents Hour. You should be following us. Give us a star on GitHub: mastra-ai/mastra. We appreciate that. You can catch all of these episodes and other videos that we post about Mastra on the Mastra AI YouTube channel. Follow us on X: Mastra, or me, SmThomas3, or Abhi, if you want to see us share more about Mastra, share some drama, see clips from the show, and see when we go live. And if you haven't already, go like and subscribe and give us five-star reviews on Spotify and Apple Podcasts. Only five, please. We don't like anything less than five. You know, go find something else to do. That's okay. You don't have to give us a review, but if you do, please let it be a five.

>> Thanks, everyone, for watching. We will see you next week.
