Google Gemma 4 Tutorial - Run AI Locally for Free
By Teacher's Tech
Summary
Topics Covered
- Local AI Means Zero Data Leaves Your Machine
- Free, Unlimited AI With No Subscription
- Local AI Now Understands Images and Audio
- 8 GB RAM Is Enough to Run Powerful AI
- This Receipt Shows the Real Power of Local AI
Full Transcript
What if I told you that you could run a powerful AI model right on your own computer, completely free, no internet required, and no data ever leaving your machine?
That's exactly what Google just made possible with Gemma 4, their brand-new open AI model that dropped just a few days ago. In this video, I'm going to show you what Gemma 4 is, why you might want to use it, and exactly how to get it up and running on your computer, step by step. And if you're not ready to install anything yet, I'll also show you how you can try it right now in your browser first. No downloads needed. Let's get into it.
Gemma 4 is a family of AI models made by Google. Think of it like a smaller portable version of the technology behind Google's Gemini. The big difference is that Gemma 4 is designed to run locally, meaning right on your computer, your laptop, even a Raspberry Pi. Now, here's
the part that matters to most people. When you use something like ChatGPT or Claude or Gemini online, your questions and data get sent to a server somewhere in the cloud. With Gemma 4 running locally, everything stays on your machine. Nothing gets sent anywhere. That's a big deal for privacy. And it's completely free. No subscription, no API key, no usage limits.
You download it once and use it as much as you want. Now, it comes in four different sizes. And
I want to quickly break these down because it will help you decide which one to use.
The first two are the small ones, E2B and E4B. These are designed for phones, tablets, and laptops with limited resources. The E2B model can run on as little as 5 GB of RAM.
Then you have the 26B model. This one uses something called a mixture of experts, which basically means it's a big model that only activates a small portion of itself at any given time. So it punches way above its weight. If you have a decent desktop with 16 to 20 GB of RAM, this one's worth trying. And finally, there's the 31B model. That's the full-size flagship.
You'd want a machine with at least 20 gigs of RAM or a dedicated GPU for that one. For most people watching this, I'd recommend starting with the E4B model. It runs well on most modern computers and gives you a great sense of what Gemma 4 can do. One more thing worth mentioning, Gemma 4 is not just a text model. It can also understand images. So, you can feed it a photo or screenshot and ask
questions about what it sees. The smaller E2B and E4B models can even process audio. Now, before we install anything, let me show you the fastest way to try Gemma 4 right now in your browser.
Head over to aistudio.google.com. I'll put the link to it down below in the description.
You'll need a Google account to sign in, but it's completely free. Once you're in, look for the model selector. I'm going to open up the side panel up here to get to it. Yours might be open by default. In my case, I'm just going to click right here where it says Gemini 3 Flash, and you can see all the different model selections here. And Gemma is right here.
So now notice I have the 26B option. I also have the 31B option. I'm going to go ahead and select Gemma 4 26B. Now let's test it out with a few prompts so you can see what it's capable of.
I'm going to give it this prompt to start: Explain how a mortgage works to someone who's never bought a house before. Keep it simple and practical. Send prompt. Now you can see it gives a very clear, well-organized answer. No jargon, just a straightforward explanation. Let's do another quick test: Write a professional but friendly email declining a meeting invitation because of a scheduling conflict. Keep it short. And a few seconds later, I get three different options, all very short and to the point, that I can copy and paste. All right, let's try an image. Now, I'm going to upload this receipt that you see right here. I'm just going to drag it over and drop it in.
And I'm just going to say this. What information is shown in this image? Summarize the key points.
I wonder if it'll tell me about the tip. And just like that, it reads and interprets the image for you. This works with charts, screenshots, handwritten notes, documents, all sorts of things.
So that's Google AI Studio. It's a great way to test things out, but the real power comes when you run it locally on your machine. So let's go set that up now. To run Gemma 4 on your computer, we're going to use a free tool called Ollama. Ollama makes it really simple to download and run AI models locally, no coding experience needed. Head over to ollama.com. All right, just click the download button.
You'll see options for Windows, Mac, and Linux. If you're on Windows, download the installer, run the .exe file, and follow the prompts. It's a standard installation that goes next, next, finish.
If you're on a Mac, download the zip file, unzip it, and drag the Ollama app into your Applications folder. If you're on Linux, you can install it with a single command in your terminal.
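For reference, that single Linux command is Ollama's documented install script (from ollama.com):

```shell
# Downloads and runs Ollama's official Linux install script.
curl -fsSL https://ollama.com/install.sh | sh
```

If you'd rather not pipe a script straight into your shell, you can download it first and read it before running it.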
Once it's installed, go ahead and open Ollama. You'll see it as an app on your computer. It might already be open by default after you install it. Either way, you're going to see this clean interface. It looks like ChatGPT or any other AI chat tool you've used before.
There's a message box right here. And then you'll notice over here we can select a model. Now,
here's the exciting part. We're going to click on this model button. And then you're going to see the drop down with the search bar at the top and a list of all the available models.
Now, if I go ahead and look for the Gemma 4 ones, I don't see them yet. So I'm going to do a search. I'm going to type in Gemma, and I'm just going to put the number four. And here we go.
There's a download button right beside it. I'm going to go ahead and click on it. Now, when I click it, I'm not sure if anything is happening or if it's running in the background. So what I'm going to do is go to my search, search for Command Prompt, and open it up. And I'm just going to type in ollama pull gemma4 and hit Enter. Now, at this point, it will show me whether it was already pulling, where it is, or it will start the pull if it wasn't running before. So,
this is getting it installed on my computer. You can see the size of this is quite large. It's at
9.6 GB. Okay, it looks like everything has finished downloading. I have success. Now, I want to point out that I just got the default one when I put in that command. If I wanted a specific size, say the 31B, I'd have to type in something like ollama pull gemma4:31b, or the same pattern for the E2B or 26B. So I just chose the default one. Now, if I go ahead and open up Ollama, and let me position this a little better here and take a look at it, I have Gemma 4, and notice that the download button is gone. So I'm all ready to give it a test. For reference,
I'm running this on a Lenovo Legion T7 with 32 gigs of RAM and an RTX 4080 GPU, but you don't need anything close to this. The default model runs fine on most computers with 8 gigs of RAM.
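If you do want a specific size rather than the default, the pull commands would look something like the following. The exact tag names here are my assumption based on how Ollama usually names model variants; check the model's page on ollama.com for the real ones:

```shell
# Pull specific Gemma 4 sizes by tag (tag names are assumptions --
# verify them on the model's ollama.com page before running).
ollama pull gemma4        # default variant
ollama pull gemma4:e2b    # smallest, for low-RAM machines
ollama pull gemma4:e4b    # recommended starting point
ollama pull gemma4:26b    # mixture-of-experts mid-size
ollama pull gemma4:31b    # full-size flagship

# List what's already installed locally:
ollama list
```

Each size is a separate download, so only pull the ones your hardware can actually run.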
All right, let's put Gemma 4 through its paces with some real prompts so you can see what it can do running right here on my computer. I'm going to start with this: Write a parent-friendly explanation of why screen time limits matter for kids age 8 to 12. Keep it under 150 words.
Notice it's generating the response right here in the app. The speed will depend on your hardware. If you
have a GPU, it will be faster, but even on just a CPU, it works. It just takes a little bit longer.
And there we go. It thought for about 3.8 seconds, and we have a list of reasons why screen time should be limited for kids age 8 to 12. Let's try another quick one.
I have a meeting with my principal about next year's budget. Give me five smart questions I should ask to advocate for technology funding in our school. All right. And a few seconds later, I have the equity-and-access question with a little bit of information, the professional development question, the infrastructure question, then four and five, and a pro tip for delivery. Okay,
let's test how it can read images. I'm just going to use the exact same image I did in Google AI Studio. I'm going to drag it in, ask the same question, and submit.
And look at that. It read the picture: business name, location, transaction details, items purchased, and cost. Everything is right there. And it came back fast.
Let's give this one a try. Write a simple HTML page with a button that changes the background color to a random color each time you click it. Include the CSS and JavaScript in the same file.
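A page like the one that prompt describes, a button that randomizes the background color, might look like this minimal sketch. This is my own version for illustration, not the exact code the model generated in the video:

```shell
# Write a minimal random-background-color page to color.html,
# then open color.html in any browser to try it.
cat > color.html <<'EOF'
<!DOCTYPE html>
<html>
<head>
  <style>
    body { font-family: sans-serif; text-align: center; padding-top: 20vh;
           transition: background-color 0.3s; }
    button { font-size: 1.2rem; padding: 0.5rem 1.5rem; cursor: pointer; }
  </style>
</head>
<body>
  <button onclick="changeColor()">Change background</button>
  <script>
    // Pick a random 6-digit hex color and apply it to the page background.
    function changeColor() {
      const color = '#' + Math.floor(Math.random() * 0xffffff)
                              .toString(16).padStart(6, '0');
      document.body.style.backgroundColor = color;
    }
  </script>
</body>
</html>
EOF
```

The CSS, HTML, and JavaScript all live in the one file, which is exactly what the prompt asked for.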
All right, I took the code it generated, copied and pasted it into Notepad, saved it as an HTML file, and opened it up. And I got this. Let's give it a try now. Change the background, and every time I click it, it's doing exactly what I asked it to create. Okay, let's
give this prompt a try. Now, this is the type of reasoning test that really shows off what a model can do. It has to do math, optimize, and explain its thinking. Let's go and hit submit. Okay, let's
see how it did. And the thing I want to point out with this question is this: no empty seats. So this is a classic optimization problem with a cost-efficiency twist. Let's see. It broke it down first. It figured out the per-student cost for the bus versus the van. Okay, this is important, that it thought about this right here: since you cannot purchase a fraction of a bus, you must round up to 12 buses. It came out to $3,300, which is correct, that's $275 × 12, and that would get everybody there. But the thing is, it's no empty seats. And when I look at this right here, to transport exactly 450 students, the no-empty-seats constraint, it says the solution above calculates a minimum cost to transport at least 450. So it's saying it's mathematically impossible to transport exactly 450. Well, I know it's not. What about nine buses and nine vans? Okay, it looks like we're just having a little bit of an argument about the cost-effective solution versus what the question is asking. And I'm okay with that, as long as it understands that it does equal 450. But it's interesting, when I put this into Gemini Pro, it does come back with nine buses and nine vans. So just a different way of thinking about it,
but it's still doing the math and breaking down everything for you. Now, if you wanted to run these exact same questions inside Command Prompt, you can do that. What you would need to do is open up Command Prompt and run ollama run gemma4, and as soon as you do that, it will start running, and then you can give it a prompt. So if I give it one of the same ones I did before, like "I have a meeting with my principal," I can just hit Enter, and you can see it work through the process just like it did in the Ollama app before. When you're finished, just type /bye.
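Put together, that terminal session looks roughly like this sketch (the gemma4 tag is my assumption based on the video; substitute whatever tag you actually pulled):

```shell
# Start an interactive chat session with the model in the terminal.
ollama run gemma4
# At the >>> prompt, type your question, for example:
#   >>> I have a meeting with my principal about next year's budget.
#       Give me five smart questions I should ask.
# When you're finished, exit the session with:
#   >>> /bye
```

Running it this way uses the same local model as the app, just without the graphical interface.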
So that's Gemma 4, a free, open AI model from Google that you can run right on your own computer.
To recap what we covered: we looked at what Gemma 4 is and why it matters. We tried it out in Google AI Studio without installing anything. Then we installed Ollama, downloaded the model, and ran it locally with several different types of prompts, including image understanding. If you want to take this further, try downloading a larger model size. If you have a more powerful machine, the 26B model
gives you a noticeable jump in quality, especially for complex reasoning and longer writing tasks. I'd
love to know in the comments, what would you use a local AI model like this for? Is it the privacy, the cost savings, or just the idea of having AI available offline that interests you most? Thanks
for watching this week on Teacher's Tech. I'll see you next time with more tech tips and tutorials.