Gemini 3 Pro - The Model You've Been Waiting For
By Sam Witteveen
Summary
## Key takeaways
- **Gemini 3 Pro Tops LM Arena**: Gemini 3 Pro outperforms 2.5 Pro on all major benchmarks and is the first model to top LM Arena with a score beyond 1500 ELO, 50 points up on Gemini 2.5 Pro, with only some Grok 4.1 models in between. [03:28], [04:08]
- **Superior Reasoning Benchmarks**: It achieves 37.5% on Humanity's Last Exam, measuring comprehension and multi-step logic, and tops GPQA Diamond for reasoning and deep PhD-level knowledge in specific topics. [03:54], [04:53]
- **Agentic Coding with Antigravity**: Along with Gemini 3 Pro, Google released Antigravity, a product focused on agentic coding, enabling sophisticated tasks like function calling and long-horizon planning beyond just text interfaces. [00:35], [03:05]
- **Multi-Step AI Studio Demo**: In AI Studio, the model performs multiple searches, writes code, executes it to create a comparison table of coding tools, and provides accurate citations from various sources after multi-hop steps. [05:33], [07:32]
- **One-Shot Game Building**: Gemini 3 Pro builds fully functioning games in one shot, like a Crossy Road clone with a voxel environment, score system, and playable mechanics, or a Don't Starve-style 2D crafting game. [09:37], [11:48]
- **Gemini App's 300M User Surge**: The Gemini app has added over 300 million users since the 2.5 Pro preview and over 200 million since July, and now features Gemini 3 Pro for visual layouts, dynamic views, and agentic tasks like organizing inboxes. [13:27], [15:17]
Topics Covered
- Why did Gemini 3 take years to build?
- Does Gemini 3 prioritize utility over personality?
- How does Gemini 3 dominate key benchmarks?
- Can Gemini 3 build interactive games instantly?
- Will Gemini 3 redefine search with generative UIs?
Full Transcript
Okay, so the model that people have been waiting for for quite some time, Gemini 3, is being released today. And as one of the early testers, I've been given access to this model by DeepMind. And in this video, I'm going to go through a whole number of the different aspects of the release.
I'll talk about what Gemini 3 Pro is good at. I'll talk about where Google is focusing with this model to actually get it to be much better at certain kinds of tasks. And then I'll also talk about how Google plans to use this model to support a whole suite of new features and new products.
Now, along with this Gemini 3 Pro release today, Google has also released a product called Antigravity, which is all about agentic coding.
Now I've actually made a separate video where I go in depth into that.
So if you're interested in that, check out that video as well. All right.
So first off, this model has taken quite a while to arrive. And I'm not just talking about the few months that people have been hearing rumors about it.
I'm talking about the fact that this model is really the culmination of Google focusing on a whole bunch of things over the past few years.
That spans the AI infrastructure level, with things like TPUs and data centers, right through to the cutting-edge research driving these mixture-of-experts models, an approach that not only started at Google but has been advanced in many ways over the past couple of years. And that brings us to today and Gemini 3. For this launch, Google is focusing on a number of key capabilities for the model, and then on how they're actually going to use those in various products. So internally, Google has been focused on giving Gemini 3 a series of capabilities, and these revolve around a few different things.
One being that the model should have a jump in reasoning which it certainly does when you start playing with it.
But in many ways that reasoning is focused on certain skills.
So when you're using the actual model via an API or via one of the chat apps etc., the whole goal there was to make the model clever, concise and direct.
And this seems to be a really clear direction that Google is going in, in that they're not going for the heavy, personality-driven style of model, perhaps like what OpenAI is doing.
They're going for something that's more of both an assistant and a tool that will actually do work for you and do a lot of the heavy lifting for you.
And this really shows up in a number of the skills that Gemini 3 Pro is actually a lot better at. We see this, for example, not only in things like the state-of-the-art reasoning, but we also see it in things like long horizon tasks, in the ability to build and to plan.
And when we look at that, that's not only things like coding tasks, but being able to build dynamic UIs on the fly so that the user can then interact with the model far beyond just a text driven interface.
And while I'm mentioning coding, this has clearly been an area of focus for Google: both the coding itself and the whole idea of agents being able to follow through and do more sophisticated tasks, with things like function calling, long horizon tasks, etc. And we can even see this when we start to look at things like the benchmarks.
So, not only does Gemini 3 Pro outperform the 2.5 Pro on all the major benchmarks, and often by a significant margin, it's also the first model to top LM Arena with a score that goes beyond 1500 ELO. And that's 50 points up on the Gemini 2.5 Pro.
And really, the only thing that's in between those is some of the Grok 4.1 models.
Another benchmark where this does really well: it achieves 37.5% on Humanity's Last Exam. The progress on that exam has clearly gone a lot faster than its creators probably intended, but it does genuinely measure comprehension and multi-step logic, which you can see Gemini 3 Pro is a lot better at. The other benchmark of note is that it tops GPQA Diamond.
So that's Google-Proof Question Answering, and the whole idea there is to measure both reasoning and the sort of deep knowledge that someone like a PhD would have in their specific topic.
If we come in and look at the benchmarks from the model card, it's pretty damn impressive, right?
Not only have we got Humanity's Last Exam, which I've talked about, you've got ARC-AGI being substantially higher than both Claude Sonnet 4.5 and GPT-5.1.
And then we've got some of the agentic coding ones, where we look at things like Terminal-Bench 2, again substantially better not only than Gemini 2.5 but than competitors' models.
The agentic tool use benchmark is doing really well, edging out Claude Sonnet 4.5. And as you can see, pretty much across all of these, with perhaps the exception of SWE-Bench, Gemini 3 Pro is beating out the competition. Right, let's jump into AI Studio and have a look at some actual examples of building things with the AI Studio build tool.
Okay, so coming into AI Studio, what I thought would be interesting is to try out a number of different examples where it's got to use multiple searches and multiple sort of tools for doing this. So, while I'm not actually giving it a huge amount of tools, we can see from this prompt that the idea here is that it's going to need to do searches, respond to those searches, write some code, and then gradually put all this together.
And you can see sure enough it kicks off and actually does that. Now in the end you can see down here I've given it code execution.
I've given it grounding and I've given it the URL context to basically pull things back and we've asked it to basically do an analysis of different coding tools out there.
You can see in the end it's able to write a bunch of code for this. It's able then to sort of execute that, use this to actually make a comparison table and put it all together. And then it gives us citations along the way of what it actually found at each site.
And I've gone through some of them. They seem to be quite accurate. And you can see that, to get to this point, it's used a lot of different sources, but it's also done a large number of searches, with each of these different keywords being searched to find different benchmarks, different reviews, etc. And this is a common pattern that I see across lots of the different tests I've run doing a similar kind of thing, asking it to basically come up with a 'state of AI agents' report.
You'll see that it goes through and in this case it was supposed to make some slides.
So I haven't actually given it a slides tool here. We're just in a sandbox.
But I've given it code execution.
I've given it the grounding search, etc. And to do these kinds of tasks, it needs to do a lot of multi-hop steps going through this.
And we can see that again when we're looking at the all the sources that it's used and the large amount of searches that it's actually done to find the different information and then collect it together.
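For reference, if you want to try a similar multi-tool setup outside of AI Studio, here is a minimal sketch using the google-genai Python SDK. The model name is a placeholder, and which tools can be combined in a single request is an assumption based on what the demo shows, so treat this as illustrative rather than a definitive recipe.

```python
# Minimal sketch: Gemini with search grounding, URL context, and code execution.
# Assumptions: "gemini-3-pro-preview" is a placeholder model name, and whether
# all three tools can be combined in one request may differ; check the SDK docs.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

config = types.GenerateContentConfig(
    tools=[
        types.Tool(google_search=types.GoogleSearch()),        # grounding via search
        types.Tool(url_context=types.UrlContext()),            # pull back pages it finds
        types.Tool(code_execution=types.ToolCodeExecution()),  # run the Python it writes
    ],
)

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # placeholder name
    contents=(
        "Research the current landscape of AI coding tools, then write and "
        "execute code to build a comparison table, citing your sources."
    ),
    config=config,
)

# The response interleaves text, generated code, and execution results.
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
```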
This is a really good sign for a lot more of the long horizon sort of tasks.
And it definitely shows up when we're using the API version as well, giving it a series of tasks where it's basically required to go through and put together a lot of these things over many multi-hop or multi-step reasoning points. All right, let's come and look at some of the actual coding abilities for this, and I'll go into the actual build tool in a second.
You can see here, this is actually an example that came from one of the Googlers, of making a 3D voxel scene.
And there have been quite a lot of these out there that have sort of leaked over the last few weeks. This one I find very interesting because this is basically an interactive simulation.
And this is of the Golden Gate Bridge using Three.js.
It's nothing too surprising there, but having the different lighting sliders, fog, describing all of these things in there, the model's got to pay attention to actually be able to do this. And you can see that when we look at its thoughts, it's addressing each of the elements that were brought up, which is really good.
And then finally, it gives us this simple HTML page out. So I can zoom in and zoom out. I can change the time of the day so we can actually see things.
So, I can spin this around, and I can do things now. I can increase the traffic density.
We can actually see some shine off the water. We can see some of the buildings of San Francisco in the background there. And we can even add in things like fog so we can see what's going on here. I think we've got some boats bobbing around there. And this is actually pretty impressive for what it's doing here. So another area where the model really shines is this whole 'Build your ideas with Gemini' feature, which is in AI Studio.
So this is like their vibe coding tool.
You can one-shot a lot of things, and you'll see when you actually go to create something, you can pick different elements that you want in that app, and they will get added in.
Now I got to say that the model is particularly good at things relating to games which kind of surprises me.
Okay, so here is one where I basically gave it a very short prompt. I seem to have lost my prompts, but this is basically a one-shot build of a game like Crossy Road.
So, if you know Crossy Road, you've got different characters.
It's a voxel environment, and it's kind of like the old-school game Frogger.
You can see here that this is basically done the whole thing in one shot.
And you could argue that, yeah, okay, it's not as nice.
We've got parts of the board cut off and stuff like that. But you can see that I've got a fully functioning game here that I can play. And so I can come around, I can crash into a car even.
And you can see that this is one shot for actually putting this together.
We've got a score system. We've got a best score system. We've got each of these things in here. Another example is this one that I've asked it to make sort of a clone of the game Don't Starve.
So, if you know Don't Starve, it's got a certain cartoon kind of look to it.
All I've asked it in the prompt is, "Can you build me a 2D cartoon game in the style of Don't Starve where you control the character as you walk around the world and find different elements to use for crafting?
" That's the entire prompt.
And it's gone off and been able to put together something that allows me to do this.
Now, I haven't actually tried playing this.
Now, it's gone off and actually made something where I can walk around the world. Perhaps not the best graphics ever, but you can see in here it's also got the font very similar to the real game, but I've also got a crafting thing here where I can see that, okay, these are the things that I would need for actually crafting in here, which is very similar to the real game.
So, it's kind of interesting that it not only has a good sense of what is in the real game, but it also has the ability to actually code it and put this together. Okay. If you wanted to build something that's not an actual game, here's an example that I did last week, where I basically asked it to build me a professional-looking news site that parodies tech news but is written for cats.
Make it look slick and have good graphics in there. And you can see that in here it's built a whole sort of website.
It's obviously used, I'm guessing, Nano Banana or Imagen to actually create the images in here.
But you can see that it's gone through and put all of these together as different elements.
And we've got things like trending news and so on. And even if we change the size of the screen, it can adapt to that. So rather than just go through a bunch of prompts and show you different things, I would suggest that you come into AI Studio yourself. Don't forget, this is free.
You can try this out and see for your particular use cases. How is it actually responding?
You can come in here and set a thinking level. You can obviously set a lot of the standard tools and so on in here. And it's very easy to get started with either the AI Studio version or the API version for testing out your prompts and your use cases.
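On the API side, a rough sketch of what configuring thinking plus a standard tool looks like with the google-genai Python SDK is below. The model name is a placeholder, and exactly which parameter the AI Studio "thinking level" maps to for Gemini 3 is an assumption here (2.5-era models exposed a thinking budget), so treat the config as illustrative.

```python
# Illustrative sketch of configuring thinking output plus a standard tool.
# Assumptions: "gemini-3-pro-preview" is a placeholder model name; the exact
# field that controls Gemini 3's thinking level may differ from what's shown.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # placeholder name
    contents="Compare three approaches to long-horizon agent planning.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            include_thoughts=True,  # return thought summaries alongside the answer
        ),
        tools=[types.Tool(google_search=types.GoogleSearch())],
        temperature=0.7,
    ),
)

print(response.text)
```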
So, one of the things that's really interesting about this release is the actual platforms that Google is releasing the Gemini 3 Pro model on.
So, if we look back to when 2.5 Pro came out, it was basically just AI Studio back then.
These models were only being used by devs. The preview models in particular were only being supported on AI Studio, often not even on Vertex or GCP back then. So for Gemini 3 Pro, this is a very different story 8 months later.
So of course, we've got the whole AI Studio side in there. That's not a surprise.
And having Vertex, I guess, is also not a surprise these days.
But beyond that, we've got the Gemini app, and it's only in the last 6 months or so that we've started to see models come out on day one in the Gemini app.
And the Gemini app itself is seeing massive momentum.
It's added over 300 million users since the 2.5 Pro version of Gemini came out in preview, and it's added over 200 million users since July.
And while I think for us as developers, we tend to think of AI Studio as the place to go for the latest Gemini models, clearly the Gemini app is what's getting the most traction nowadays.
So in the Gemini app, they're actually adding some really interesting things which are features that Gemini 3 is able to actually generate. So the first up is visual layouts.
So the idea here is that rather than just return text, the model is able then to go and find images and then lay them out as well as being able to generate different outputs via code that don't get shown as code but get shown as a website that is interactive that people can engage with.
Another one along these lines is the idea of dynamic view.
And you can see in some of the examples that they're showing for this that you will be able to make fully interactive portals on the fly in the Gemini app for a particular topic or subject that you're interested in.
And you've got to think that this takes it to a whole new level for things like the Gemini app's learning mode, where you can basically get things made specifically for what you're after, to be able to interact, learn, use it, etc.
Another new feature, which in some ways you could even think of as a whole new product, is coming to the Gemini app: Gemini Agent. And this is where you can think of the Gemini app actually making use of the long horizon tasks, the fully agentic stuff with tools, etc. So the idea here is that rather than you just asking for information, or just having a chat in here, this is a doing thing, where you can basically tell it things like "go and organize my inbox" or "go and do this task", and as an agent with a set of tools that Google has empowered it with, it will be able to go off and do this. Now, we don't know yet whether that's also going to include MCPs or any form of custom tools, but certainly you can see them going down a similar sort of line to Claude Skills, and moving away from these apps being purely a chat interface to being much more something where not only can you get work done, but it can get work done for you.
The other big platform that they've announced with Gemini 3 Pro, and which is going to actually be using the Gemini 3 models, is search.
And we know that this is Google's bread and butter.
We know that they've taken quite a while to basically implement some of the features that some of their other competitors or sort of answer engines have already been adding in there.
But for me, the interesting thing here was that, up until recently, Google has always used the Flash models of Gemini for search.
So even if you were doing things in AI Mode and the like, you generally were using a Flash model of Gemini.
In this case, it seems that with Gemini 3 they may actually be using the Pro model, allowing you to get the best out of the Pro model there. And it does seem, from some of the things they've shown, that there are some really interesting ideas they're using the Gemini 3 models for in search. The whole idea of fanning out a query, basically taking a query, rewriting it into multiple queries, and then doing checks on that, is something that perhaps up until now has just been too compute-intensive to do. Google has now worked this out and put it into AI Mode so they can actually get you the best responses back.
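To make the fan-out idea concrete, here is a toy sketch of the general pattern: rewrite one query into several sub-queries, answer each with a search-grounded call, and then synthesize a single response. This is not Google's actual search implementation, just an illustration of the technique; the model names are placeholders.

```python
# Toy illustration of query fan-out: rewrite one query into several,
# answer each with a search-grounded call, then synthesize the results.
# Not Google's actual implementation; "gemini-3-pro-preview" is a placeholder.
from google import genai
from google.genai import types

client = genai.Client()
GROUNDED = types.GenerateContentConfig(
    tools=[types.Tool(google_search=types.GoogleSearch())]
)

def fan_out(query: str, n: int = 3) -> str:
    # 1. Rewrite the user's query into n distinct sub-queries.
    rewrite = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=f"Rewrite this into {n} distinct search queries, one per line:\n{query}",
    )
    sub_queries = [q.strip() for q in rewrite.text.splitlines() if q.strip()][:n]

    # 2. Answer each sub-query with search grounding.
    findings = [
        client.models.generate_content(
            model="gemini-2.5-flash", contents=q, config=GROUNDED
        ).text
        for q in sub_queries
    ]

    # 3. Synthesize one answer from the individual findings.
    return client.models.generate_content(
        model="gemini-3-pro-preview",  # placeholder name
        contents="Combine these findings into one answer for: "
        + query + "\n\n" + "\n\n".join(findings),
    ).text

print(fan_out("Which AI coding assistant is best for a small team?"))
```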
On top of this, AI Mode is getting a whole sort of generative UI. And there are some really nice examples of how they're using this: if somebody asks a particular question about a mortgage, for instance, the model can actually write a mortgage calculator and show it in relation to your specific query.
And this generative UI doesn't stop there.
This is the whole idea that we're moving into a world where the UI you're engaging with can dynamically change based on what you're asking of the model.
So that's going to come to AI Mode, I think first in the US, but I imagine it'll gradually be rolled out to the rest of the world as things go along. And it doesn't stop there. Clearly, a lot of the product teams inside Google have had access to this model and have been planning for it for a while.
So I think you'll see a lot of interesting stuff come out of Google Labs that are ideas that they've had perhaps for a while but just haven't had the model that was strong enough to be able to deliver this.
So obviously over the last few months we've seen Opal come out of Google Labs.
We've seen Stitch.
We've seen a whole bunch of different ideas that have started their journey there.
Not to mention NotebookLM, which has had the biggest turnaround for an app I think ever, in that when I first made a video about it there were only a couple of thousand people in the world using it.
Now it's become one of Google's hottest products.
So this whole idea of learn anything across the different apps, build anything, perhaps mostly in AI Studio but also in things like Gemini CLI and of course Antigravity as well, through to plan anything and the built-in agents that the various platforms are using, just shows how strong this model is and how Google has planned for this model to be rolled out across many platforms, and many apps and features that people are using in the Google ecosystem.
So lastly, just one more thing that they did announce, but which is not actually released yet, and I can't talk too much about this: Gemini 3 Deep Think.
I made a video about Gemini 2.5 Deep Think when that came out, and I talked about how there were a lot of challenges around that model: the time to first token could be 15 minutes, it would think for so long, and so on. Unfortunately, the model didn't make it into the APIs, I suspect purely because of a lot of the challenges around actually running that model, and perhaps even the cost of running it. But the good thing is the DeepMind team have now announced Gemini 3 Deep Think, which is quite a substantial improvement on the previous one.
Again, with the same idea that this is the kind of model you use when you're happy for it to go off and think for tens of minutes at a time before it will even respond to you. And with this announcement, they've also rolled out some pretty impressive performance scores on things like Humanity's Last Exam and, for me more interestingly, the ARC-AGI challenge. So, as that gets closer to release, I will make a separate video just for that and walk through some examples of what it can do.
Overall, the Gemini 3 Pro release is very impressive, both from a model perspective, if you're going to be using it through the API, and from a products perspective, if you're going to be using it in the various Google apps, etc. And while this is still a preview release, just like with the 2.0 models and the 2.5 Pro and Flash models, where we saw multiple iterations of improvement before the GA release, I expect we'll see the same thing with 3 Pro, and perhaps not that far off in the distance with the Flash and Flash-Lite models.
So, if you've got some use cases, go over to AI Studio, have a play with the model.
It's totally free. You can try it out. You can see how it works. If you're interested in agentic coding, definitely go and check out Antigravity.
Like I mentioned earlier on, I've made a full video about that.
That's something that you can download today.
You can start using it, and it gives you a bunch of free calls so you can actually use the Gemini 3 Pro model to try out different coding tasks, etc. And over the next couple of weeks, I'll make some follow-up videos about how you can actually use these things with ADK.
And also, I think there are some other models in the works that perhaps will see the light of day in the not too distant future. All right, as always, let me know what you think in the comments.
I'd love to hear what you're seeing if you're trying out prompts yourself.
I've tried to make this video not just a video of me walking through prompts, but to actually talk about some of the key thinking behind what's going on. And I'd love to hear from people about where they see themselves using this, both from an API perspective, which I guess is the main thing I would be interested in, but also from the products perspective, which I think is really interesting.
So anyway, as always, if you found the video useful, please click like and subscribe and I will talk to you in the next video.
Bye for now.