Google wins again. Gemini 3.1 Pro review

By AI Search

Topics Covered

Gemini 3.1 Designs Entire Mobile OS with Eight Integrated Apps
Gemini 3.1 Pro's 3D Generation Outperforms All Competitors
Gemini Builds Earthquake App From Video Alone
Smartest Model Is Also the Cheapest
Gemini 3.1 Outperforms Top AI Models on Hallucination Rate

Full Transcript

Google just released their best and most performant AI model, Gemini 3.1 Pro, and it is incredibly powerful across the board. So, in this video, I'm going to go over how and where to use it, some cool stuff it can do, and of course, we're going to review its specs, performance, and benchmarks compared with other models. Let's jump right in.

Thanks to HubSpot for sponsoring this video.

All right. Now, at the time of this recording, 3.1 should already be available in the Gemini app. So, if you select Pro, notice that it says it uses 3.1 Pro over here. So, let's try out a few prompts.

First, let's test its creativity at generating new ideas. So, I'm going to get it to develop an OS for mobile that's even better than Android or iOS. Include eight apps on the home screen. Explain what you designed, how things work, and your rationale behind it. And then under tools, I'm going to select canvas so that we can preview what it built in a side window. Let's press run.

All right. So, here's what it built. It's called Fluid OS. Here are the eight core apps of this OS. So, there's this Omni, or top banner, which is the brain of the phone. I guess it's this one over here. So, if I click into it, it's basically your unified AI agent. It reads your calendar, emails, and location, proactively displaying exactly what you need to know right now, such as when your next meeting is. And then the next one is Thread. This is where it unifies all your messages, emails, DMs, and calls in one place. So here it says, "Why have separate apps for SMS, WhatsApp, and email? Thread unifies all your communications." And then next we have the Sense app, which aggregates all your biometric data from your smartwatch, phone sensors, etc. And then we also have Flow, which is like a universal media player. And then we also have Prism, which I guess is this one. So this is not just a camera but an AR lens for real-time translation, text copying, and visual search. And then we have Shift, which is this one, which is navigation. So this is like Google Maps, I guess, but it knows when you leave for work and proactively shows the traffic delay without you asking. And then we also have this Vault icon, which I guess is for security: keys, credit cards, passwords. And then we have this Home icon, which is for controlling your smart home.

Now, there's no correct answer to this. This is just what it designed, but let me know how good this is. Do you think it's better than Android or iOS?

All right, here's another one of my standard tests. So, I'm going to upload this image and let's get it to create a beautiful 3D animation from this image. Let's see how good it is at coding this up.

All right, and here is what I got. This pagoda actually looks really nice and detailed compared to other models. I've been testing out this prompt with all the other top models, and by far Gemini 3.1 generates the most detailed version of this pagoda so far. Now, it's not perfect. It's still lacking the rails, the windows, and the roof tiles, plus, you know, the cherry blossoms could look more detailed, but just from one prompt, this is already very good.

In fact, Gemini 3.1 Pro is way better at 3D and spatial understanding. So, here are some additional 3D generations from Gemini 3.1 Pro versus the previous Gemini 3. And as you can see for all these examples, it's way more detailed and better quality. If you get it to create some SVG animations of different things, you can see that Gemini 3.1 Pro does these a lot better. Everything just looks a lot smoother and more detailed and more accurate compared to the previous version.

All right, next let's see how good it is at composing music. But first, we need to make an interface for it to create the music. So, first, let's get it to make a piano roll interface where I can drag and draw notes on the timeline. Let's press run.

All right. So, here's what I got at first. Let's play this to make sure it works.

All right. So, the first version works, but of course, the real test is to get it to actually compose some really insane music. So next I wrote, "Show a powerful, expressive 32-bar piano opus, rich in complexity, capturing the final performance of a master pianist." And then afterwards, here's my result. Let's play this. Actually, you can't really see this. Let me zoom out a bit so you can see all the notes. And then let's play this.

That was actually really good. Everything sounds pretty harmonious. There's no dissonance involved here. I tried the same test with another state-of-the-art model called GLM5, and it's a lot more incoherent. So, this one actually sounds really good. It has some inherent music composition knowledge built into it.

All right. Next, let's test how good it is at lighting physics. So here I'm getting it to develop this simulation featuring three metallic spheres, because two spheres is already too easy. We have some other AI models that can already do this. And then they're going to be suspended above a street scene. Use any publicly available 3D street view and allow adjustable parameters such as reflectivity, roughness, and other material properties. Let's see what it generates.

All right, so here's my first result. And the real test here is to see if the spheres are actually reflected within each other. So for the first iteration, it did not get this correct. So I wrote, "The spheres don't reflect in each other." And then afterwards, here is the final iteration. And as you can see, the spheres are indeed reflected within each other.

So let's test out each sphere to make sure everything works. For this gold sphere, let's play around with the reflectivity. So that works. Let's also do roughness. So roughness also works. And then reflection brightness. That also works. We can also change the color. So let's change this to blue, for example. That also works. And then let's play with this chrome one in the middle. So reflectivity of this also works. Roughness also works. And notice that as I change the roughness, the reflections of this in the other spheres also change. And then afterwards, let's also play with reflection brightness. And that also works. Very nice. And then finally, let's play with the red metallic sphere to make sure that one works as well. And yes, all the settings do work.

So, in just two prompts, I made this fully functional, you know, floating sphere simulation where I can change all these different parameters, and it looks great.

Now, here are some really cool use cases of Gemini 3.1 Pro. I can upload a ton of these receipts and then ask it to parse all these receipts and add them to a spreadsheet with these columns: date, item, total, and currency. Let's press run.

All right. And here's what I got. Indeed, it has given me all these items and the currency is correct. And it even got the one trick receipt that was in Canadian dollars instead of HKD. Very impressive. And then afterwards, we can click on export to Sheets. And if I open this up, here is the result. Very nice.
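
To make the target layout concrete, here is a minimal Python sketch of the four-column table the prompt asks for. It does not call Gemini at all: it assumes the receipts have already been parsed into dictionaries, and the sample values, the `receipts_to_csv` helper, and the `totals_by_currency` helper are all hypothetical names and data invented for illustration.

```python
import csv
import io

# Column order taken from the prompt in the video.
COLUMNS = ["date", "item", "total", "currency"]

# Hypothetical parsed receipts; these values are made up for illustration.
SAMPLE_RECEIPTS = [
    {"date": "2026-01-05", "item": "Coffee", "total": 38.0, "currency": "HKD"},
    {"date": "2026-01-07", "item": "Notebook", "total": 12.5, "currency": "CAD"},
]


def receipts_to_csv(receipts):
    """Serialize receipt dicts to CSV text with the four expected columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for receipt in receipts:
        # Keep only the expected columns; a missing column raises KeyError.
        writer.writerow({column: receipt[column] for column in COLUMNS})
    return buffer.getvalue()


def totals_by_currency(receipts):
    """Sum totals per currency so mixed-currency receipts are never added together."""
    totals = {}
    for receipt in receipts:
        currency = receipt["currency"]
        totals[currency] = totals.get(currency, 0.0) + receipt["total"]
    return totals


if __name__ == "__main__":
    print(receipts_to_csv(SAMPLE_RECEIPTS))
    print(totals_by_currency(SAMPLE_RECEIPTS))
```

Keeping the currency as its own column is what makes a trick receipt like the Canadian one harmless: totals can be grouped per currency instead of being summed blindly.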

Now, because this can analyze images, let's input this Where's Waldo photo and ask it to find and circle Waldo in this photo. And yes, I'm going to use the Pro mode and then press run. Now, unfortunately, this was a very disappointing answer. It says that he's not actually in this picture, which is wrong. He is in the picture. So, that's disappointing. And this is a classic hallucination example where it's just making this up. It got it completely wrong.

Now, the really awesome thing about Gemini 3.1 is that not only can it analyze images, but also video and audio. So, what I'm going to do first is upload this explainer video of an earthquake simulation for Japan and then ask it to create an app based on this video. Let me play you the video first.

"I want you to create an interactive earthquake visualization of Japan. So, let's say we have a map of Japan like this. First, I want you to list or show me all the major cities in Japan on the map. And then there's going to be a left sidebar where I can adjust various settings like earthquake magnitude, etc. So, these are the settings that I can adjust. And whenever I click somewhere on the map, so let's say I click here, then you would start to create an earthquake. So it's going to be an animation effect that slowly ripples and ripples all the way until it hits one of these cities. And based on the magnitude of the earthquake, I want you to calculate how severe the impact would be for each major city."

So note that in my prompt, I did not specify with text any details of the app. I didn't specify Japan or earthquakes or anything like that. It needs to figure out everything by itself from actually watching the video. Let's press run and see what it generates.

All right, and here is what I got. How cool is that? Let's actually see if this works. I can click anywhere on the map to trigger a simulated earthquake. So, let's click on here. And here we go. Notice that it even automatically pulled up a map of Japan for me. Next, let's increase the magnitude of this. Very nice. Let's make the magnitude lighter. And you can see that doesn't really affect any of the major cities. Let's increase this to a whopping magnitude of 9. And then let's set it somewhere over here. And here is what happens. How cool is that? So indeed, it followed exactly what I specified from my video. Really cool.
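
The video does not show how the generated app computes impact, so the following Python sketch is only one plausible way to do the "severity per city" step. The great-circle distance uses the standard haversine formula; the `toy_impact` attenuation rule and the severity thresholds are invented for illustration and are not real seismology. City coordinates are approximate.

```python
from math import asin, cos, log10, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Approximate coordinates (latitude, longitude) of a few major cities.
CITIES = {
    "Tokyo": (35.68, 139.69),
    "Osaka": (34.69, 135.50),
    "Sapporo": (43.06, 141.35),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def toy_impact(magnitude, distance_km):
    """Hypothetical score: stronger quakes score higher, distance lowers it."""
    # Clamp the distance to 1 km so log10 never sees zero.
    return max(0.0, magnitude - 2.0 * log10(max(distance_km, 1.0)))


def severity_by_city(magnitude, epicenter):
    """Label each city using made-up thresholds on the toy score."""
    report = {}
    for name, (lat, lon) in CITIES.items():
        distance = haversine_km(epicenter[0], epicenter[1], lat, lon)
        score = toy_impact(magnitude, distance)
        if score >= 4.0:
            label = "severe"
        elif score >= 2.0:
            label = "moderate"
        else:
            label = "light"
        report[name] = label
    return report


if __name__ == "__main__":
    # Epicenter chosen arbitrarily off the coast east of Tokyo.
    print(severity_by_city(9.0, (35.5, 141.0)))
```

The behavior matches what the demo shows qualitatively: a light quake barely touches distant cities, while raising the magnitude or moving the epicenter closer raises the score.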

Speaking of Google's Gemini, if you find yourself spending hours manually researching topics but only scratching the surface, I've got something that will completely transform how you approach research and information gathering. Take a look at "How to Do 10 Hours of Research in 20 Minutes: The Marketer's Guide to Google Gemini and NotebookLM" by HubSpot. I put it in the description below for you to download for free. This guide will help you become a top 10% expert in any topic in days instead of months by leveraging AI research tools that compress what used to take hours into just minutes of work. You'll be able to process more sources than your entire team and spot patterns across information that humans typically miss. Inside, you'll find step-by-step instructions for using Gemini's Deep Research capability to gather comprehensive information from dozens of sources, plus how to use NotebookLM to transform that raw data into actionable intelligence through AI-powered summaries and even personalized audio overviews. My favorite section is where they outline 11 practical applications for this research stack, including everything from content creation to competitive intelligence and crisis communication planning. The examples show exactly how to implement these tools in real marketing scenarios. This resource was made by HubSpot, who's sponsoring this video, and you can download it for free via the link in the description below.

All right, here's another example. You can easily get this to create personalized educational content for you or your kids. For example, let's get it to make a fun educational course on chemistry for kids consisting of multiple lessons. Include real images and interactive visual exercises.

All right, here is what we got. So, it included three lessons, and it does use real images. Let's start the first lesson. So, here we need to add some hydrogen and oxygen to build a water molecule. So, let me do this really quickly. And here we go. Here is a water molecule created. So everything is very smooth. The animation is very seamless. Next, let's try the next lesson, which is solid, liquid, gas. Here, it failed to load the image. And then here, we need to select the appropriate state of matter. So, let me just randomly fill this out. And that works. And then finally, the next lesson is on crazy reactions. So, let's mix some ingredients in the beaker. And here's what happens. Let's click mix again. This time let's mix vinegar with dish soap. And nothing happened. And then let's try dish soap and food color. You get magical colorful bubbles.

So that's just one quick example. This is just from one prompt, but of course you can prompt it further to make more lessons or make things more detailed for whatever you want to learn.

All right, next let's see how good it is at coding up games.

So I'm going to write, "Make a 2D platformer game similar to Super Mario. Make it look amazing. Put everything in a standalone HTML file. Use publicly available assets, models, and effects for the game."

All right, and here's what I got. Let's press start game. And you know, this looks really similar to Super Mario. It even has sound effects. What happens if I fall down here? And I do get a game over. So everything works as expected. So, you know, in just one prompt, it can create a fully functional 2D platformer game with enemies, with coin collection, with sound effects, and of course, I can prompt it further to add multiple levels or to customize how the game works.

Now, as I'll talk about in a second, Gemini 3.1 is incredibly good at scientific knowledge and reasoning. So, here's a medical example.

"Analyze the molecular pathology of dystrophin loss in Duchenne muscular dystrophy. Compare therapies. Evaluate long-term functional outcomes from recent trials. Include nice tables and charts." And this time, I'm going to turn off canvas. And let's just press run and see what it gives us.

And here is what I got. So here it gives me the molecular pathology of this dystrophin loss, etc., and then here is a comparative analysis of DMD therapies with a very nice and thorough table jam-packed with information. You can even export this to Sheets. And then here we have some long-term functional outcomes. What I like about Gemini 3.1 is it's quite concise. There's no BS. There's no filler stuff. It just answers my question very directly and gives me the information that I need. And then here's another table visualizing the functional divergence. You can also export this to Sheets. And that's pretty much it. So very short and concise.

There's no correct answer to this. Every top AI model out there can help you, you know, synthesize information. So it really depends on what vibe you're looking for. It seems like Gemini 3.1 is a lot more concise, whereas some other AI models are a lot more detailed and thorough.

So those are some of my quick demos. Next, here are some specs on Gemini 3.1 Pro. As you can see here, it can take in text, images, audio, and video. Plus, it has a context window of up to a million tokens. This is basically how much information you can fit into your prompt at once. And Gemini has by far the widest window of a million tokens, which is roughly 700,000 words, or a medium-sized codebase, or over an hour of video. Many of the other top AI models have a much smaller context window. And then here it says the architecture of 3.1 is based on Gemini 3. This is just like a marginal improvement. It's nothing huge.

And then here it says, currently, you can access this via the Gemini app, which is where I showed you most of my demos in this video. Plus, you can also already access it in NotebookLM, which is a really helpful platform for you to learn things and generate study notes. And then for developers, this is also available via Google AI Studio, Gemini CLI, as well as Google's IDE, which is called Antigravity, which is very similar to Cursor, but it's just powered by Gemini. And then it's also available in Android Studio and these other enterprise platforms.

Next, let's look at how good this actually is. So, here are some benchmarks.

They're comparing it with the top models out there, including Opus 4.6 Thinking Max, plus GPT 5.2 extra high, plus GPT 5.3 Codex extra high. But as you can see, across most of these benchmarks, Gemini 3.1 Pro is just better. So for Humanity's Last Exam, it scored the highest without any tool use. This is a benchmark that tests an AI's knowledge on some really obscure stuff. For example, here are a few sample questions from this Humanity's Last Exam benchmark. Most people won't know the answer to this. It's some really deep and obscure scientific knowledge. But as you can see, Gemini 3.1 Pro without any tool use, without, you know, doing web search or anything like that, scored by far the highest. So, this has a lot of world knowledge jam-packed into it.

And then for ARC-AGI-2, this is a crazy lead. I don't think this table does it justice. So, here's another table of the ARC-AGI-2 leaderboard for your reference. And the y-axis is the score, or how good it is. You can see Gemini 3.1 Pro is all the way up here, even beating Opus 4.6, which is covered by this label. It's like somewhere over here. And then the rest of the models are just way down here.

Now, why is this so important? If you're not familiar with the ARC-AGI-2 benchmark, this is basically how good an AI is at solving visual puzzles. But it's more than that. So, here are some examples. It's first given a question-answer pair. For example, here the correct answer is all the blue squares should gravitate to the left, whereas all the red squares should gravitate to the right, like this.
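
As a rough illustration of the rule in that example pair, here is a small Python sketch. It is a simplification under stated assumptions: the grid is written as strings with `B` for blue, `R` for red, and `.` for empty (real ARC tasks use grids of integer color codes), squares slide within their own row, and colors are allowed to pass through each other. The `apply_rule` name is hypothetical.

```python
def apply_rule(grid):
    """Slide blue cells to the left edge and red cells to the right edge, row by row."""
    result = []
    for row in grid:
        blues = row.count("B")
        reds = row.count("R")
        # Everything that is neither blue nor red is treated as empty space.
        empties = len(row) - blues - reds
        result.append("B" * blues + "." * empties + "R" * reds)
    return result


if __name__ == "__main__":
    example_input = [
        ".B.R.",
        "R..B.",
        ".....",
    ]
    for before, after in zip(example_input, apply_rule(example_input)):
        print(before, "->", after)
```

Writing the rule down is the easy part. The benchmark is hard because the model is never told the rule and has to infer it from the example pair before applying it to a new grid.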

And then it's given a new question, and it needs to figure out the answer from this pattern. Now, this isn't just about solving visual puzzles, because an AI model can't really learn new things. After training, its configuration of weights and parameters is fixed, so it's really hard for an AI model to actually learn and absorb new patterns and then apply them to its answers like this. And that's why most models rank pretty poorly on this benchmark. But as you can see, Gemini 3.1 scored all the way up here. So it seems like it has some emergent ability to actually learn and absorb new patterns that it has never seen before in its training data.

In addition, it just absolutely dominated this GPQA Diamond benchmark. This is like graduate-level scientific knowledge. It's also incredibly good at Terminal-Bench and other agentic coding benchmarks. And it's also really good at long-context performance. In other words, if you feed it a ton of stuff, up to like 700,000 words, it's still able to analyze everything and give you the information you're looking for very accurately. Now, these are just some of their self-reported benchmarks.

Next, let's look at some independent evaluators as well. So, here's an independent leaderboard, and as you can see, Gemini 3.1 Pro is by far the most intelligent model according to their intelligence index. It far surpasses Opus 4.6 Max as well as GPT 5.2 extra high, making it currently the best model out there you can use. Not only that, but it's also very cost-efficient. So even though it's smarter than Claude Opus or GPT 5.2 or Grok 4, notice that it's even cheaper in price, making it the most performant and the most cost-efficient closed model out there.

Now, interestingly, if you look at this other leaderboard called LMArena, then Gemini 3.1 Pro does not perform as well as Opus 4.6 in terms of text. And then in terms of coding, it's actually ranked all the way down here, even below GPT 5.2. Same with vision. It even underperforms the previous Gemini 3 Pro. So, I'm getting some mixed results between different leaderboards. And that's why it's important to look at multiple leaderboards to get a good sense of how good a model actually is.

For other leaderboards like SWE-bench and LiveBench, unfortunately, they haven't added Gemini 3.1 Pro yet at the time of this recording. But, you know, another really awesome feature of Gemini 3.1 is that it has a really low hallucination rate. So, here is the Artificial Analysis Omniscience hallucination rate. And as you can see, Gemini 3.1 hallucinates way less than some of the other top models out there, like Claude Opus 4.6 as well as GPT 5.2. So, it's less likely to make stuff up and give you the wrong answer, although this still happens a certain percentage of the time. If you do want a model with the lowest hallucination rate, then the best one to use is the open-source GLM5, which was released last week. This is really good. One thing to note is that this score of 50% does not mean it hallucinates 50% of the time. This just means it got 50% wrong on this benchmark.

All right, so that sums up my review of Gemini 3.1 Pro.

This is currently one of the most intelligent and performant models that you can use right now, and it's already available in the Gemini app. So, try it out and let me know what you think of it.

As always, I will be on the lookout for the top AI news and tools to share with you. So, if you enjoyed this video, remember to like, share, subscribe, and stay tuned for more content. Also, there's just so much happening in the world of AI every week. I can't possibly cover everything on my YouTube channel. So, to really stay up to date with all that's going on in AI, be sure to subscribe to my free weekly newsletter. The link to that will be in the description below. Thanks for watching, and I'll see you in the next one.
