
The Industry Reacts to Gemini 3...

By Matthew Berman

Summary

Key Takeaways

  • **Gemini 3 Tops Benchmarks**: Gemini 3 is number one on independent benchmarks, with a three-point buffer over GPT 5.1, leading in five of ten evaluations, including Humanity's Last Exam (a 10-percentage-point improvement over the previous best) and two coding benchmarks. [00:12], [01:02]
  • **Premium Pricing**: Gemini 3 Pro costs $2 and $12 per million input and output tokens respectively for under 200k context, making it among the most expensive models on which to run the Intelligence Index evaluations. [00:44], [01:00]
  • **Scaling Laws Endure**: The secret behind Gemini 3 is improved pre-training and post-training; the jump from Gemini 2.5 to 3.0 is as big as the team has ever seen, suggesting no walls in sight for scaling up models. [02:30], [02:42]
  • **Antigravity Is a Windsurf Clone**: Google launched Antigravity, a VS Code fork powered by Gemini 3 that is essentially the acquired Windsurf IP with updates, as evidenced by leftover "Cascade" references that survived a find-and-replace. [03:19], [04:36]
  • **Google's AI Stack Dominance**: Google is the only big tech company with applications, foundation models, cloud inference, and accelerator hardware (TPUs, used for both training and inference), giving it the data, distribution, and top researchers to potentially dominate. [07:22], [08:11]
  • **Token Efficiency Leap**: Gemini 3 Pro is markedly more efficient from tokens to tool calls, solving one ARC v2 task in 772 tokens and 188 seconds, approaching the human panel's 147 seconds, while intelligence per token increases rapidly. [10:16], [11:39]

Topics Covered

  • Scaling Laws Defy Death
  • Google Clones Windsurf IP
  • Google Masters AI Stack
  • Intelligence Per Token Soars

Full Transcript

Google dropped Gemini 3 24 hours ago and the industry has been reacting strongly.

It is definitely the best model on the planet, and I'm going to show you all of the industry reactions right now.

First is from Artificial Analysis, the company that runs independent benchmarks against all of the top models. And yes, Gemini 3 is number one. Here's what they have to say: for the first time, Google has a leading language model, and it debuts with a three-point buffer over the second-best model, GPT 5.1.

And a lot of people are talking about its token efficiency, meaning how many tokens it uses to solve the same set of problems compared to other models on the market. It is very, very efficient. But it's also a very expensive model. Listen to this: given its premium pricing, $2 and $12 per million input and output tokens respectively for less than 200k context, Gemini 3 Pro is among the most expensive models to run our Intelligence Index evaluations.
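To get a feel for what that pricing means in practice, here's a back-of-the-envelope cost sketch. The $2/$12 per-million-token rates are from the quote above; the workload numbers are made up for illustration:

```python
# Gemini 3 Pro quoted pricing (<200k context):
# $2 per million input tokens, $12 per million output tokens.
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 12.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the quoted rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical workload: requests averaging 3k tokens in, 1k tokens out.
per_request = request_cost(3_000, 1_000)
print(f"per request: ${per_request:.4f}")                 # $0.0180
print(f"per 10k requests: ${per_request * 10_000:.2f}")   # $180.00
```

Note how the output rate dominates: at a 6x price multiple, the 1k output tokens here cost twice as much as the 3k input tokens.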

They also show that Gemini 3 is leading on five of their ten independent benchmarks. And one of the biggest improvements is on Humanity's Last Exam, improving on the previous best score by 10 percentage points. Gemini 3 also leads on two of their three coding benchmarks, doing extremely well in agentic coding scenarios, and it also takes the number one spot in multimodal reasoning. It's coming in at a very fast 128 tokens per second, comparable to Gemini 2.5 Pro, which I have said for a while was a really impressively fast model; I think a lot of that is down to running inference on their TPUs, which is their custom silicon. And before I move on to

the next reaction, let me tell you about Humanity's Last Prompt Engineering Guide, updated for Gemini 3. This is the Gemini 3 edition, and it comes with new use cases specific to Gemini 3 and new ways to use Gemini 3 that you should know right now. This is a free guide from my team. You can download it right now; all you have to do is subscribe to our newsletter. I'll drop the link down below in the description. Download it now. Check it out.

And next: the reports of the death of scaling laws have been greatly exaggerated. It seems like you just continue scaling up the parameters and you get better and better models. This is from Oriol Vinyals, who is VP of Research and Deep Learning Lead at Google DeepMind and Gemini co-lead: so the secret behind Gemini 3? Simple: improving pre-training and post-training. Can you imagine that?

Pre-training: contrary to the popular belief that scaling is over, the team discovered a drastic jump. The delta between 2.5 (that's Gemini 2.5) and Gemini 3.0 is as big as we've ever seen. No walls in sight: keep scaling up and keep getting improvements. Plus, post-training is still a total green field; there's lots of room for algorithmic progress and improvement, and 3.0 hasn't been an exception, thanks to our stellar team.

And Boris Power, head of applied research at OpenAI, agrees. Look at this: great work. May the scaling laws live forever and make us prosper.
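To make "no walls in sight" concrete, here's a toy Chinchilla-style power-law curve. The constants are invented for illustration (this is not Gemini's actual scaling fit); the point is just that loss keeps falling smoothly, if ever more slowly, as compute grows:

```python
# Toy scaling law: loss(C) = A * C**(-B) + IRREDUCIBLE.
# A, B, and IRREDUCIBLE are made-up constants for illustration only.
A, B, IRREDUCIBLE = 10.0, 0.05, 1.0

def loss(compute: float) -> float:
    """Predicted training loss for a given compute budget (arbitrary units)."""
    return A * compute ** (-B) + IRREDUCIBLE

# Each 10x of compute buys a smaller but still nonzero improvement:
for c in (1e21, 1e22, 1e23, 1e24):
    print(f"compute {c:.0e} -> loss {loss(c):.3f}")
```

This is the shape behind the "keep scaling, keep improving" argument: the curve flattens but never hits a wall, it only asymptotes toward the irreducible term.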

So it looks like the frontier model companies are not doubting scaling laws in the slightest. All right.

Next, along with Gemini 3, Google also launched Antigravity, their own vibe-coding platform based on a VS Code fork.

But people noticed it felt really similar to Windsurf in a lot of ways. And it turns out Antigravity may just be a clone, with some updates, of Windsurf. Now, for a bit of context before I get into it: this is Varun Mohan, who now works at Google: excited to launch Google Antigravity, our next-generation agentic IDE, now powered by Gemini 3. Varun is the founder and was the CEO of Windsurf, which was kind of acquired by Google. They acquired the IP and only the people they considered to be the best from Windsurf, but it wasn't a full acquisition, and Windsurf still exists. They basically just took the IP and the top people, and it was this whole drama in Silicon Valley: the founder of Windsurf got this huge payday, and most of the team at Windsurf kind of got screwed. Then, over a very crazy weekend, Cognition, the company behind the product Devin, acquired the remaining people at Windsurf.

But here's where it gets funky. In Theo's YouTube video, someone pointed out that the name Cascade, which is what Windsurf's agent (or at least the agent portion of the product) was called, can still be found in Antigravity. So they basically did a find-and-replace, but missed this one.

So, if Cascade is showing up in Antigravity, then this is essentially what Google did with Windsurf. But as Ender replied: they bought Windsurf and its IP; I'm very confused why anyone is surprised by this. Should they just not use the thing they paid $2.4 billion for? And I agree. Look, they bought it; they get to do whatever they want with it. But now we know the origin story: they bought Windsurf and they made it into Antigravity.

And Scott Wu, the CEO and founder of Cognition (remember, Cognition bought Windsurf after Google bought Windsurf, kind of weird) says: "Congrats to the Antigravity team on the launch today. FYI, you missed a spot." And yeah, they forgot to rename Cascade.

And a quick thanks to Dell Technologies for sponsoring this portion of the video. Dell Technologies

has a family of incredible laptops called the Dell Pro Max, featuring NVIDIA RTX Pro Blackwell chips, which are portable AI workhorses. They come in 14- and 16-inch screen sizes with up to 32 GB of GPU memory, perfect for on-the-go AI workloads. Check them out; link in the description below.

But we have a lot of positivity among frontier lab founders. We have Sam Altman congratulating Google on Gemini 3: looks like a great model. We have Elon Musk also congratulating Sundar on Gemini 3: it is a great model. I am very happy to see that they can all congratulate each other but also be hyper-competitive with each other, because that helps us as consumers of these products. Logan

Kilpatrick, the lead for Google AI Studio and the Gemini API, shows that Gemini 3 Pro has the largest delta improvement on the Design Arena benchmarks. And although this screenshot is very blurry (please forgive me), here's 5.1 at 1316, which is the Elo, and then a massive jump: Gemini 3 Pro all the way at 1422. Now, let's

talk about the business of Google for a moment, the business of Gemini. Deedy Das from Menlo Ventures says we're in the "what if Google does that?" part of the AI cycle. Can they make cheaper models? $2 per million in, $12 per million out is not really cheaper; it's on the more premium side of model costs, just above GPT 5.1 and cheaper than Pro. Better models? Yes. Distribute products at no cost to billions of users? Yes: they have a built-in user base of basically everybody who uses the internet. And they have good unit economics because they own their own custom silicon. It turns out they are not only serving inference on their custom silicon, the TPU; they also trained the model on TPUs, which I believe is a first in the Gemini series of models.

And so he makes this final point, which I think is extremely accurate. Here's his assessment of the big tech giants: Amazon and Microsoft chose to be infra partners, meaning they'll work with anybody and serve your inference because they have massive data center infrastructure. Apple chose not to play; well, they tried to play, then they tried to partner, and basically it has not worked. They've really stumbled in the AI era. Meta shit the bed, and Google is coming out on top.

I said this in my previous video: I really think Google has this incredible opportunity because they basically have everything. They have the data. They have the custom silicon. They have the distribution. They have hardware with Android. They have the top researchers in the world.

AI is really Google's to lose. And if we look at this really cool infographic about what all of the top tech companies have in terms of their competitive advantage and their mastery of the entire AI stack, we can see that of all of them, Google is the only one with applications, foundation models, cloud inference, and the accelerator hardware.

And Ryan Petersen, founder and CEO of Flexport, agrees: funniest outcome would be Google dominating AI after pulling off a 10-year dead-cat act to escape monopoly regulation. Basically, what he's saying is that the government was looking very closely at Google for a long time for antitrust violations; they basically thought Google might have a monopoly on everything. Search and the Chrome browser, I know, were a big part of it. And if you remember, it wasn't more than like a year and a half ago that everybody was looking at Google thinking, "What are you doing with AI? You're not doing anything." They had some of the worst models. They had a lot of problems with bias in their models. And now they're number one, and they're seen as number one. That is such a crazy turnaround.

Check this out. This is a two-shot working simulator of a nuclear power plant. Here it is. We have kind of a voxel art style. Stage one, the core. You can basically walk through it. You can see the water heating up here, the different elements of a nuclear reactor. And yeah, this was two-shotted.

I think people are getting confused about what zero-, one-, and two-shot mean. It usually means you're providing examples.
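In that standard usage, the shot count is just the number of worked examples included in the prompt. A quick sketch (the prompts and the `shot_count` helper are hypothetical, not from the video):

```python
# Zero-shot: the task is stated with no worked examples.
zero_shot = "Classify the sentiment of: 'This reactor demo is amazing.'"

# One-shot: the same task with exactly one worked example prepended.
one_shot = (
    "Classify the sentiment.\n"
    "Review: 'The build keeps crashing.' -> negative\n"
    "Review: 'This reactor demo is amazing.' ->"
)

def shot_count(prompt: str) -> int:
    """Count worked examples: lines that already include an answer."""
    return sum(1 for line in prompt.splitlines() if "-> " in line)

print(shot_count(zero_shot))  # 0
print(shot_count(one_shot))   # 1
```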

Zero-shot means you didn't provide any examples; one-shot means you provided one. But I think at this point it means how many turns it took, how many tries you gave it. But whatever, that's a nitpick. And here somebody asked for the prompt: write code for a detailed explainer showing how a nuclear reactor works; it should break down the different stages of the process in a clear way and code a 3D voxel version of the same demo.

Next, Gemini 3 Pro one-shotted an SVG of a pelican riding a bicycle. This is kind of the known test for new models. It's usually just a static SVG of a pelican riding a bicycle, but this is actually a moving version of it. And yeah, this is pretty darn good. And

remember I talked about token efficiency? Well, let me touch on that a little bit more. This is Emad Mostaque, founder of Stability AI, who says: "The most interesting thing testing Gemini 3 Pro has been how efficient it is from tokens to tool calls. The intelligence per token of models is increasing rapidly, even as prices fall. It's quite something." And I've been talking about this for a while, especially as a lot of these companies talk about the duration of autonomy of an agent, how long an agent can run without any human intervention. I look at it and say, okay, it's great if they can run for 1, 5, 10, or 20 hours, but what is it actually doing with that time? That is just as important. So now we're getting to token efficiency. Really, the ultimate measure of how good these models are is intelligence per unit of time: how much intelligence, how much problem solving is it actually doing in a given amount of time? And that is what they're seeing here. These models are getting better at using fewer tokens to do more.

And Mike Knoop from Arc Prize says the same thing. So, first

Gemini had a massive leap in ARC benchmark performance, and we just verified that Gemini 3 Pro and Deep Think are over two times state-of-the-art on ARC v2, which is impressive and frankly a bit surprising. We're also starting to see the efficiency frontier approaching humans: the fastest v2 task Gemini 3 Pro solved was this example, with only 772 tokens in 188 seconds; our human panel solved it in 147 seconds. And this is the point. This is why the Arc Prize is such a good benchmark, or set of benchmarks: it tests not only generalization but also efficiency.

But the interesting thing is it didn't improve all that much on v1 of the Arc Prize. It made a huge jump on v2, but that didn't translate to doing so much better on v1. As he says: I expected an AI system which can score half on v2 to basically get 100% on v1. These systems still make obvious mistakes on much easier v1 tasks. Check out these examples. I can't fully explain these contradictions. So the models are getting better at some things and staying the same at others.

My team is still putting the finishing touches on the Gemini 3 testing video. That is going to be incredible, because some of the tests we're running and some of the demos we were able to create are truly mind-blowing. So stick around for that. If you enjoyed this video, please consider giving a like and subscribe.
