Google's AI Boss Reveals What AI in 2026 Looks Like

By TheAIGRID

Summary

Topics Covered

  • Google Leads Full Omnimodels
  • Gemini Robotics Masters Multi-Step Tasks
  • Gemini Live Enables Real-Time Guidance
  • Genie3 Builds Persistent World Models
  • Agents Dominate Specialized Workflows

Full Transcript

Demis Hassabis recently had an interview with Axios where he laid out what he thinks the future of AI in 2026 is going to look like. In this video, we're going to dive into everything and break it down.

So, one of the first things he says will happen in 2026 is full omnimodels. Before I dive into it, take a listen to what he says, because it's super interesting and I think Google has this in the bag.

>> What does the next 12 months of progress look like? If we sit here a year from today, what will have changed in the world?

>> I think the things that we're pressing hard on are the convergence of modalities. Gemini, which is our main foundation model, has always been multimodal from the beginning. It takes images, video, text, and audio, and it can now increasingly produce those types of outputs as well. And I think we're getting some really interesting cross-pollination by being multimodal. One of the best examples of that is our latest image model, Nano Banana Pro, which I think shows some astonishing understanding of visuals; it can create infographics that are really accurate, and so on. So I think over the next year you're going to see that progress a lot, and in video, for example, when that converges with the language models, you're going to see some very interesting combinations of capabilities there.

So, of course, there are essentially six parts of the full omnimodel stack. You

have robotics, images, video, audio, 3D, and text. And the thing is that Google Gemini is rapidly advancing in all of these areas. I think most people don't realize just how advanced Google is and why it's set to dominate the future. I mean, when you look at this compared to any frontier AI company, I would argue it is very difficult to compete with Google on the several fronts where they are pretty much the leaders. I think the first one we're going to get into is robotics. Google is a little bit behind here, but Gemini Robotics 1.5 is probably going to get an update in 2026, and it's probably going to be pretty good, because I've actually seen a lot of the robotics updates, and so far they are pretty promising. We're introducing

Gemini Robotics 1.5, a new family of models to power the next generation of physical agents. Now, for the first time, they can solve longer multi-step challenges. Aloha, could you sort these fruits into color-matching plates?

>> Sure, I can certainly help you do that.

Put the green fruit into the green plate.

>> We enable it to think. It can perceive the environment. It will think step by step and then finish this multi-step task. Put the banana into the yellow plate.

>> Although this example seems very simple, the idea behind it is really powerful.

The same model is going to power more sophisticated robots, like a humanoid, to do more complicated daily tasks. Hi Apollo, can you help me sort my laundry please?

>> Sure I can help with that.

>> Put whites in white bins and the darks in the dark bin. Beautiful. Okay. And I

can now switch this up on it. Sorry

Apollo. And it knows that I have done that and it will react. So you can see the whole flow of its thoughts here, and it's thinking every time it acts.

>> Gemini Robotics 1.5 can break a broad task down, think, and then take action over a number of steps. Another step

change with Gemini Robotics 1.5 is that now all of our robots use the same model without needing to fine-tune it for different form factors. New agentic capabilities mean Gemini Robotics 1.5 can use the internet to answer questions and solve problems.

>> Hi Aloha. Can you use my location and sort the objects into the correct

compost, recycling, and trash bins? >> To

perform the sorting task based on San Francisco waste guidelines, I will use the green bin for compost, the blue bin for recycling, and the black bin for trash. The task is complete.


>> Good job. Thanks.

>> Gemini Robotics 1.5 is a step towards bringing truly useful AI agents into the physical world, giving the community a powerful new tool to build the next generation of helpful robots.
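The perceive, think, act loop shown in the demo can be sketched in a few lines. This is purely a conceptual sketch with made-up function names, not the actual Gemini Robotics API:

```python
# Conceptual sketch of a perceive -> think -> act loop for a multi-step
# sorting task. Hypothetical structure only, not Gemini Robotics code.

def perceive(world):
    """Look at the scene: return the objects still waiting to be sorted."""
    return [obj for obj in world["table"] if obj not in world["placed"]]

def think(obj):
    """Pick the next sub-goal: match the fruit's color to a plate."""
    return f"put the {obj['color']} {obj['name']} on the {obj['color']} plate"

def act(world, obj, step):
    """Execute one step and record what was done."""
    world["placed"].append(obj)
    world["log"].append(step)

def sort_fruits(world):
    # Re-perceiving before every action is what lets the robot react if a
    # human shuffles the objects mid-task, as in the Apollo laundry demo.
    while (remaining := perceive(world)):
        obj = remaining[0]
        act(world, obj, think(obj))
    return world["log"]

world = {
    "table": [{"name": "banana", "color": "yellow"},
              {"name": "lime", "color": "green"}],
    "placed": [],
    "log": [],
}
print(sort_fruits(world))
```

The design point the demo is making is exactly this loop: the plan is not fixed up front; each action is chosen against a fresh perception of the world.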

>> Then of course, we actually do have images, and the image you saw prior I made with Nano Banana Pro. When we're talking about combination and multimodality, what I mean here is that most people don't realize that with images, Nano Banana Pro actually reasons like an agent. When it's creating your image, I think it makes the image and then adjusts it: it looks at it, changes things, and decides, okay, this is actually the image that works, and that's why it's so accurate. This is also why I'm saying you're going to get full convergence in many different areas. They might even add this technique to videos and to other things. Of course,

speaking of videos, we've got Veo 3. In fact, let me just turn the volume down, because I don't want this to be too loud.

But this is, of course, a Veo 3 video. I'm pretty sure you've already seen videos made with Veo 3; if you haven't, I'm going to show you now. This is, of course, image-to-video. Veo 3 is still pretty much the leader when it comes to video generation, and I think it's really, really good. Most people just don't realize how good Veo 3 is, and in 2026 it's probably going to be a lot better. Last time it took the number one spot, and I have no doubt that Google is going to take the number one spot again and again. So it's going to be super interesting to see where that goes.

Now, of course, we do have Gemini Live. This is Google's most underrated feature: the combination of multimodality, live speech, and the ability to reason on the fly and actually help a human. This demonstration is a little long, so if you want to skip it, that's completely fine. But it went viral a few days ago because it shows a guy who didn't know how to do the oil change on his car using Gemini Live to actually do it. It was the first time that, on a mass scale, we could see individuals use AI, reason with it, and have it guide them through things they otherwise wouldn't be able to do. This is a quick demonstration; I think it's around 3 minutes long. I personally believe this guy probably already knew how to change the oil in his car. But think about what this is like in 2026, when the latency is better, the reasoning is better, and it has even more knowledge and can reason about more difficult things in a much better way. And once again, this is completely in competition with OpenAI, and I think they're doing a beautiful job.
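The loop behind this kind of session is conceptually simple: pair what the camera sees with what the user asks, send both to the model along with the conversation so far, and speak the answer back. Here is a minimal turn-based sketch with a canned stand-in for the model; all names are hypothetical, and the real Gemini Live streams audio and video continuously rather than taking discrete turns:

```python
# Minimal sketch of a guide-me-through-a-task session. `fake_model` is a
# stand-in for a multimodal model call; it ignores the frame and history
# and just answers from a tiny script.

def fake_model(history, frame, question):
    canned = {
        "where is the drain plug?": "Under the plastic panel.",
        "what torque spec?": "18 ft-lb.",
    }
    return canned.get(question.lower(), "Can you show me a closer view?")

def live_session(turns):
    """Each turn pairs what the camera sees with what the user asks."""
    history = []
    for frame, question in turns:
        answer = fake_model(history, frame, question)
        history.append((question, answer))   # context carries across turns
    return history

session = live_session([
    ("view of car underside", "Where is the drain plug?"),
    ("view of torque wrench", "What torque spec?"),
])
for q, a in session:
    print(f"user: {q}  ->  assistant: {a}")
```

Keeping the running history is what lets the assistant answer follow-ups like "what was that torque spec again?" without the user restating the context.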

>> Hi, I'm Gemini.

>> Hey Gemini, this is my 2009 BMW 335i.

Can you help me do an oil change?

>> Yeah, absolutely. What kind of oil and filter do you have on hand? And do you have the right tools like a wrench for the drain plug and an oil filter wrench?

>> Yeah, I got the Castrol 5W-40 recommended and the correct oil filter. And I have the correct tool for the oil filter housing.

>> Great. You've got everything we need to get started. Let's get the car lifted and access the oil drain plug.

>> Okay. And where's the oil filter?

>> The oil filter is located on the front of the engine, right below the intake manifold. It's always a good idea to remove the oil cap before you drain the oil.

>> Okay, great. I'll do that. Okay, up we go.

>> Okay, I got the car up in the air. Where

is the oil drain plug?

>> That's the plastic panel you'll need to remove. The drain plug is located underneath that panel.

>> That's the panel off, and it looks like that's the drain plug. What size is that?

>> It looks like a 17 mm drain plug.

>> I'm going to pull it out now.

>> Make sure you have the drain pan in place to catch the oil. Let me know when it's drained.

>> That should drain it out. >> Once the oil completely drains, you can wipe down the drain plug area and put the new washer on the drain plug.

>> Put the new washer on the drain plug. I'm going to go ahead and put it back on the car now. What's the correct torque spec for your N54 engine? >> The correct torque spec for the drain plug is 18 ft-lb.

>> Okay, that's 18.

Okay, there's the flap back on now. Now

I'm ready to drop the car.

>> You can lower the car and then we can focus on the oil filter. So, I got the oil filter removed now.

>> Okay. I see the old filter in your hand.

Before installing the new filter, you'll need to replace the large O-ring on the cap that comes with the new filter.

There's also a smaller O-ring at the tip of the cap that should also be replaced.

I see the two O-rings in the plastic bag. Go ahead and replace the ones on the filter cap with those. You can use a small flathead screwdriver or a pick to gently pry the O-ring off. Got it? You're gently prying the O-ring off with the pick. It looks like you've got the green O-ring in place.

>> Okay, got the new O-rings and the new filter in.

>> Perfect. Now that the oil filter cap is ready, go ahead and screw it back into the housing on the engine.

>> Okay, I'm just tightening down the oil filter housing. What was that torque spec again?

>> It's actually 18 ft-lb. Go ahead and set your torque wrench to that setting and let me know when you're done.

>> There we are. That's 18 ft-lb. Okay, I'm ready to top up the oil. So, how much does an N54 take? >> It's 6.9 quarts, which is about 6.5 liters. Your N54 doesn't have a dipstick, so you'll need to start the engine and let it run for a few minutes to let the oil circulate. Then you can use the vehicle's electronic oil level sensor to check the level.
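As a quick sanity check of that quoted capacity, assuming US liquid quarts at roughly 0.946 liters per quart:

```python
# Verify the conversion Gemini quotes: 6.9 US quarts in liters.
LITERS_PER_US_QUART = 0.946353

quarts = 6.9
liters = quarts * LITERS_PER_US_QUART
print(round(liters, 2))  # 6.53, i.e. about 6.5 L, matching the answer
```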

>> Okay, Gemini. The oil looks full.

>> It looks like you're all set with your oil change.

>> Another job well done. Thanks so much, Gemini.

>> You're very welcome. It was my pleasure to assist you with your oil change.

And so, yeah, I don't know about you guys, but I think this just goes to show what's coming in 2026 in terms of the models getting better. That demonstration really surprised me, and I think it surprised a lot of people. Now, another thing that Google talks about is, of course, world models, and I think world models are going to be the theme for 2026. Maybe I'll make a video on what's coming in 2026. So, for world models, we're going to take a look at what Demis says.

>> I think the other thing we're going to see over the next year, and that I'm personally working on, is world models. So we have this system called Genie 3, which you can think of as an interactive video model. You can sort of generate a video, but then you can start walking around in it like you're in a game or simulation, and it stays coherent for a minute. I think that's very exciting.

>> Now if you want to know what Google's world model is, you can just check out this video.

>> What you're seeing are not games or videos. They're worlds.


Each one of these is an interactive environment generated by Genie 3, a new frontier for world models. With Genie 3,

you can use natural language to generate a variety of worlds and explore them interactively, all with a single text prompt.

Let's see what it's like to spend some time in a world.

Genie 3 has real-time interactivity, meaning that the environment reacts to your movements and actions. You're not

walking through a pre-built simulation.

Everything you see here is being generated live as you explore it. And

Genie 3 has world memory. That's why

environments like this one stay consistent.

World memory even carries over into your actions.

For example, when I'm painting on this wall, my actions persist.

I can look away and generate other parts of the world, but when I look back, the actions I took are still there.
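One way to picture "world memory" is a store of edits keyed by location, replayed whenever a region is regenerated. Genie 3's actual mechanism is not public; this toy sketch only mirrors the look-away/look-back behavior shown in the demo:

```python
# Toy illustration of world memory: actions persist per location, so a
# region looks the same when you return to it. Not how Genie 3 works
# internally, just the behavior it exhibits.

class ToyWorld:
    def __init__(self):
        self.memory = {}          # (x, y) -> list of accumulated edits

    def render(self, pos):
        """Regenerate a region, replaying any remembered edits."""
        base = f"region{pos}"
        edits = self.memory.get(pos, [])
        return base if not edits else base + "+" + "+".join(edits)

    def paint(self, pos, mark):
        """An action that should survive looking away and looking back."""
        self.memory.setdefault(pos, []).append(mark)

world = ToyWorld()
world.paint((0, 0), "red-stroke")
world.render((1, 0))              # look away: generate a different region
print(world.render((0, 0)))       # look back: the paint is still there
```

The key property is that rendering other regions does not touch the stored edits, which is exactly the consistency the narrator demonstrates with the painted wall.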

And Genie 3 enables promptable events, so you can add new events into your world on the fly. Something like another person or transportation

or even something totally unexpected.

You can use Genie to explore real world physics and movement and all kinds of unique environments.

You can generate worlds with distinct geographies, historical settings, fictional environments, and even other characters. We're excited to see how Genie 3 can be used for next-generation gaming and entertainment. And that's

just the beginning.

Worlds could help with embodied research, training robotic agents before working in the real world, or simulating dangerous scenarios for disaster preparedness and emergency training.

World models can open new pathways for learning, agriculture, manufacturing, and more.

We're excited to see how Genie 3's world simulation can benefit research around the world.

And I think it's crazy, because the jump from Genie 2 to Genie 3 was pretty incredible. So I can imagine Genie 4, Genie 5, those kinds of world models getting super, super good, powered by cross-modality. I think it's going to be really incredible how good these world models get. So if you aren't familiar with what this is, these world models are basically virtual worlds that you can explore, and they have memory and reasoning. It's just crazy; you wouldn't have thought of this in the past. I do think it's probably very compute-expensive to run, but it's still fascinating. And of course, we do have agent-based systems, which is what Google is working on.

>> Maybe the other thing is agent-based systems. I think the field's been talking a lot about agents, but they're not reliable enough yet to do full tasks. But I think over the...

>> So, Google has been completely dominating agentic systems. If you aren't familiar, this is the co-scientist, a multi-agent system built with Gemini 2.0 that basically works like a virtual scientific collaborator, helping researchers generate and refine novel, testable scientific hypotheses and research plans. It is basically designed to mirror key steps of the scientific method, going beyond literature search and summarization to propose original ideas and experimental strategies tailored to any scientist's stated research goal. So I

think this kind of agent is going to be extremely popular in the future. Can you

imagine when this AI agent is so good that it's able to propose new things?
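The generate-and-refine pattern that the co-scientist description implies can be sketched as a small loop: propose candidate hypotheses, score them with a critic, and carry the best one forward for refinement. Everything below is a toy stand-in; in the real system the proposing, ranking, and refining are done by Gemini-backed agents, not a trivial scorer:

```python
# Toy generate -> score -> refine loop in the spirit of the co-scientist
# workflow. All functions are hypothetical stand-ins.
import random
import re

def propose(goal, n=4, rng=None):
    """Generate candidate hypotheses (here: labeled strings with a fake
    novelty number standing in for model output)."""
    rng = rng or random.Random(0)          # seeded, so runs are repeatable
    return [f"{goal}: hypothesis {i} (novelty {rng.random():.2f})"
            for i in range(n)]

def score(hypothesis):
    """Toy critic: read the novelty number back out as the review score."""
    return float(re.search(r"novelty (\d+\.\d+)", hypothesis).group(1))

def refine(hypothesis, round_):
    """Toy refinement: tag the hypothesis with the revision round."""
    return hypothesis + f" [refined r{round_}]"

def co_scientist(goal, rounds=2):
    pool = propose(goal)
    for r in range(1, rounds + 1):
        best = max(pool, key=score)        # rank candidates, keep the winner
        pool = [refine(best, r)]           # carry the winner into next round
    return pool[0]

print(co_scientist("drug repurposing"))
```

The shape to notice is the closed loop: candidates are criticized and re-entered rather than emitted once, which is what separates this kind of agent from one-shot summarization.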

It's able to discover new things and test and verify new things. This is an entire agentic system. Of course, it's for scientists, so it probably won't be used by you and me, but it is something that, once again, Google has in its back pocket. Google also has the CodeMender agent, an AI agent developed by Google DeepMind that is designed to detect, debug, and fix security vulnerabilities in codebases, particularly open-source projects. It leverages Gemini Deep Think models for reasoning and advanced tools like dynamic analysis. It's just basically another great agent. And

then of course, we've got Google's data science agent, which is an AI assistant that automates end-to-end data science work inside Google Colab and across Google's data platforms. And I mean, I think you guys are starting to get the

picture here with just how many different things Google is working on in terms of that 2026 road map. And I think we're going to see a lot of things from Google that of course probably aren't

even in this video, like a lot of different models and stuff, but I think for the most part, I've covered it. And

one of the things I also wanted to cover is AlphaEvolve. This was basically like the co-scientist, but it is of course the Gemini-powered coding agent for scientific and algorithmic discovery. So, it is kind of like an AI scientist. With all of these agents, and some of these I didn't even mention, can you just imagine how good Google is going to get in 2026? Honestly, you're probably going to be surprised by everything they have to offer. So, with that being said, let me know what you guys think about this video. I'll see you in the next one.
