
How To Run Open Source AI Models

By Tina Huang

Summary

Topics Covered

  • Open Source AI Matches Closed Source
  • Any Laptop Runs Small Open Models
  • Free Google GPUs Hide Data Risks
  • VPS Enables Private Model Scaling
  • Edge AI Ships Inside User Devices

Full Transcript

In this video, I want to explain every major way to run open source AI models, because it's a common misconception that it's really hard to run open source models. Like, you need really fancy hardware, need to know how

to code, and that it's just overall really hard or something like that, which is not true. It might have been true a few months ago, but not anymore, because building with open source models seems to be where the industry is headed now. So, I'm going to introduce four

major categories of different ways to run open source models, rank them from easiest to hardest, and demo them too. I

will also include two bonus advanced categories for people who want to get ahead of the curve. Now, without further ado, let's go. A portion of this video is sponsored by Whisper Flow.

All right, just to make sure we're on the same page here, I'm going to briefly describe what open source AI models are.

The definition of open source AI models refers to models in which some or all of the core components are publicly available, typically including the model architecture, model weights, training code, or inference code, and licenses that allow use, modification, and

redistribution. I'm going to put on screen now some of the most popular open source models as of today in March of 2026. And the reason why we care about

open source AI models is that these days they are just as good as closed source models, and they have three core benefits.

The first is that you have full control over where you're running these models. It

can be local, on edge, or in a private cloud.

We're going to be exploring these options a little bit later in the video.

The second is that it's customizable.

You can fine-tune them, modify the architecture, and add different guardrails.

And the third, which is probably the most important one, is that they are free to use, and building with open source models generally incurs much lower longer-term cost, especially at scale. Cool. That

was my brief little intro. I made an entire video explaining different open source models, different frameworks, and things like that. So, please go check it out if you're interested in learning more about that, which I will link over here. But in this video, I want to

specifically focus on how to actually use these open source AI models, how to run them. Category number one is local.

Running an open source AI model locally just refers to having that AI model actually downloaded onto your own machine and running on your own machine as well. It's private, as in everything stays on your machine; it's not sending your data to other

people. It's free, other than the initial cost of the hardware you're running it on and the electricity bill, and it's offline, as in you don't need to be connected to the internet to be able to use the model. This option is best for people who care about it being private, free, and offline. Also, a lot of people

choose to build locally first before hosting it somewhere else so that other people can access it. Now, the easiest way to run an open source model locally is just to download a desktop model management app like Ollama, for example.

Just go to Ollama, download it, install it, and then you can pick from a variety of different models that you can download directly to your machine. Once

it's downloaded, you can start chatting with it. Yay. Literally takes like 2

minutes, depending on the size of the model. And speaking about the size of

the model, I think where a lot of people get hung up is that they don't know if they have the hardware requirements to be able to run these open source models. Well, I don't think people

realize how accessible these open source models are now. Like, any usable computer can probably run the smaller models, like a 4B model, which are actually surprisingly good for how small they

are. So, I'm going to put on screen now the hardware requirements to run different sizes of models. And just as a reference point, my primary day-to-day computer is a 13-inch M4 MacBook Air

(2025) with 16 GB of memory. I can run any 4B model, no problem. And I can run most 8B models as well, as long as I'm not also trying to do very intense things like editing videos or live streaming at the same time.
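As a very rough rule of thumb (my own back-of-the-envelope approximation, not an official spec), you can ballpark the memory a model needs from its parameter count and how its weights are quantized:

```python
def rough_memory_gb(params_billions: float,
                    bytes_per_param: float = 0.5,
                    overhead: float = 1.2) -> float:
    """Ballpark RAM needed to run a model, in GB.

    4-bit quantized weights (common for local models) take roughly
    0.5 bytes per parameter; fp16 weights would be about 2.0. The 1.2x
    overhead factor loosely allows for the KV cache and runtime
    buffers -- treat all of these numbers as rough assumptions.
    """
    return params_billions * bytes_per_param * overhead

# A 4B model at 4-bit comes out to roughly 2.4 GB, and an 8B model to
# roughly 4.8 GB -- which is why a 16 GB laptop handles both, as long
# as you're not also running other memory-hungry apps.
```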

Cool. Easy, right? Now, if you want to be a little bit more advanced, and instead of just chatting with them you actually want to use them with other software or build software yourself using them, then you will move on to the

medium difficulty workflow, which is calling the local open source model using your own code. So, what you need to do is install Ollama locally if you haven't already, and then download a model that you want to use. Then, in

order for you to get your code to be able to use these models, you need to call localhost at port 11434, which is the default port that Ollama uses. So if

you want to run OpenClaw, or even build your own agents or whatever other software, and you want to use those open source AI models, this is how you make it work. Software goes knock on room

11434. Ollama answers the door and

provides the model that you want. And yay, it works now.
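That knock on the door can be sketched in a few lines of Python using only the standard library. The model name here ("llama3.2") is just an example; use whatever you've pulled with Ollama:

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "llama3.2") -> dict:
    # The request body Ollama's local API expects; stream=False asks
    # for a single JSON reply instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt: str, model: str = "llama3.2") -> str:
    # 11434 is Ollama's default port -- the "door" your software knocks on.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With Ollama running, you would call:
# print(ask_ollama("Why is the sky blue? One sentence."))
```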

Oh, by the way, some of you may have heard of this, or maybe you've already done it yourself: people are buying Mac Minis, like this one over here. This one is in a case, but it is a

Mac Mini. They are using these Mac Minis

in order to run their open source AI models and agents and software like OpenClaw. So, the reason why they're

doing this isn't because it's like a whole different workflow. It's actually

the same thing as running it on your local computer, on your laptop, for example. But the downside of running it

on your laptop is that if you close your laptop, or if you start doing intense things like video editing, you might run out of memory, so that you're not able to continue running these open source

software and models anymore. There would

be disruptions to this. That's why a lot of people choose to run their open source models and their software using another machine like this Mac Mini instead. You can have this running 24/7

without any disruptions. Also,

another big thing is that a Mac Mini is probably a lot more powerful than the laptop you're using. So, you can also run fancier and bigger models on here as well. Makes sense? Same thing,

but just on a Mac Mini instead. Let me

know in the comments if you have a Mac Mini or you want to buy one. Cool. I

want to move on now to an even harder

be accessible by other people as well, not just by yourself on your machine, then you need to figure out how to host it on the internet. The

simplest way of doing this is using something called a Cloudflare tunnel, which is basically like poking a hole in your software and then connecting it to the internet so other people can use it.

You probably don't want to be doing this in production, where lots of strangers are going to be using your thing, but if you're using it as a demo, that's totally fine. So, just wanted to put that out

there. I will be talking a little bit

later in the video about how you can have the models actually run locally while building out an entire application that is secure and safe and being used by all the other people on the internet. But first, let me finish

off this category with one of the hardest workflows, which is if you want to fine-tune your open source AI models locally. This does involve a lot more

hardware requirements. Like you

definitely need a GPU for this, and you need something like Unsloth to help you do this. I'm not going to go into too

much detail about this because this is more of an advanced topic. But just for completeness, I do want to explain that yes, you can actually fine-tune open source models locally on your machine.
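Just to give a flavor of what that looks like, here is a heavily condensed, hypothetical sketch of the Unsloth setup, assuming a CUDA GPU and `pip install unsloth`; the model name and LoRA rank are illustrative placeholders, not recommendations, so check Unsloth's own notebooks for current defaults:

```python
def to_chat_example(question: str, answer: str) -> dict:
    # Shape one fine-tuning row as a simple two-turn chat transcript.
    return {"messages": [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}

def load_for_finetuning():
    # Imports live inside the function because unsloth only works on a
    # machine with a supported GPU.
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Llama-3.2-3B-Instruct",  # example model
        load_in_4bit=True,  # 4-bit weights keep VRAM requirements low
    )
    # Attach small trainable LoRA adapters instead of updating all weights.
    model = FastLanguageModel.get_peft_model(model, r=16)
    # From here you would pass model, tokenizer, and your dataset of
    # to_chat_example(...) rows to a trainer such as trl's SFTTrainer.
    return model, tokenizer
```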

All right, great. First category done.

I'm going to put on screen now a little summary slide including some of the tools that you might want to check out if you're interested in running or building with open source models locally. If you haven't learned how to

prompt properly in 2026, well, I don't really know what to tell you, because it is literally the single most foundational, important skill that you need for any AI

interaction. The gap between someone who gives a lazy two-liner prompt versus someone who gives a crisp, clear prompt

is massive. But typing a truly detailed prompt is a lot of effort and is very tedious. I get it. So, we cut corners.

That's why Whisper Flow is my go-to tool. You just speak your prompts

instead of having to type them out. And

you naturally give so much more context.

If you know me, you will know that I'm very much an audio person, and it's not like your phone's built-in dictation or the voice input inside of Claude. Those are

fine for those short, simple one-off things. But Flow is faster, more

accurate, and way more personalized.

Flow works on mobile as well, which I especially love using when I'm brainstorming with Claude. I just click here. Help me brainstorm different

workshop ideas that Lonely Octopus students would like. It's formatted so much better than if you're just directly talking to Claude. And because it works across any app, website, or device, you

are not locked into any platform. It

just works wherever you type. My

personal favorite is file tagging. I

just say the file name in Warp, my AI coding tool, and Flow tags it by voice.

No more stopping to type. Seriously,

coding with audio is so magical. Could

you change the agent handoff from the pattern agent to the phone agent and then to the email agent?

Oh, and it does work while whispering as well, hence the name Whisper. Wow. Which

is great if you are in a coffee shop or in the office when you need to be discreet. Hey guys, do you know why

we're having this meeting in 10 minutes?

I have no idea.

Try it for free for 14 days with my code Tina Huang, and you get an extra free month of Pro. Thank you so much, Whisper Flow, for sponsoring this portion of the video. Now, back to the video.

Next up, I want to talk about the category of using open-source models for people who just genuinely don't have the hardware or just don't want to run stuff locally, and that is browser/hosted playground solutions. Of all the

different categories, this is the easiest way for you to use open source

AI models. Instead of having to download them yourself, somebody else has done that for you, and they've hosted it for you.

So all you have to do is just show up and use the model. No setup needed and no hardware needed either. This category

of using open source AI models is good for people who are genuinely just like learning, experimenting, and exploring.

Really low commitment and super fast to get started. And the easiest way to do

this is just to go to some websites where you can access other people's open source AI models, like arena.ai, for example, or groq.com. You

just pick a model that you want, start chatting with it. You don't need to sign up for anything and it's mostly free.

There's not that much additional functionality to this, and it's not private either. So be careful about what you're putting into these, but it does allow you to experiment with different open source models and compare

in this category is going to Hugging Face Spaces, where they have hosted open source AI models and you can directly play around with them in the browser as well. Now,

for those of you in the education space who want to put up some demos so that your students can play around with them, I have a workflow for you, and that is using a Colab notebook. I

would consider this like a medium-level workflow, slightly harder. Google Colab

is where you can write out these different notebooks and share them with your students so people can run the code line by line. It's commonly used in the education space. So you need to open Google Colab and enable the GPU

runtime. You do get a free T4 GPU that you can borrow and use from Google during the session. Then you can just pip install transformers, follow the instructions, and be able to run your open source models using the borrowed GPU from Google. You can also fine-tune

your models using the borrowed GPU.
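To make the "pip install transformers" step concrete, here is a minimal sketch of what a Colab cell might look like; the model name below is an illustrative small open model, not a recommendation:

```python
def choose_device() -> int:
    # transformers pipelines take device=0 for the first GPU (the
    # borrowed T4) and device=-1 to fall back to CPU.
    try:
        import torch
        return 0 if torch.cuda.is_available() else -1
    except ImportError:
        return -1

def generate(prompt: str) -> str:
    from transformers import pipeline  # from `pip install transformers`
    generator = pipeline(
        "text-generation",
        model="Qwen/Qwen2.5-0.5B-Instruct",  # example small open model
        device=choose_device(),
    )
    out = generator(prompt, max_new_tokens=50)
    return out[0]["generated_text"]

# In a notebook cell you would then run:
# generate("Explain what a GPU does in one sentence.")
```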

There's an Unsloth Colab template, which I'll actually link in the description below, which you can follow: upload your datasets, and then you can go ahead and fine-tune your models. So,

a caveat to this workflow, because it sounds too good to be true that you just get to use a GPU for free. Well,

firstly, Colab sessions do expire. So,

when it does expire, just realize that everything that you've done is going to disappear, including your fine-tuned model if you didn't save it. So, be

careful with that. Also, note that this is not secure, not private at all.

Whatever it is that you're doing, all the data that you're inputting, all of this is going back to Google. They're

not going to just rent you a GPU for free, right? So, be careful of that. I

would say that that is the major con for this entire category. And another major con is that it is rate limited, meaning there are going to be times when you don't get to use the GPU, or it's going to be really, really slow, and if you really want to

continue borrowing it, you probably need to pay for it. Those are the trade-offs for something that is so easily set up and seemingly free. All right, I'm going to

put on screen now some of the tools that you might want to check out if you're interested in this category of using open source AI models. Also, some

examples, some pros and cons. Take a

screenshot. Cool. Say that you've now experimented a little bit using hosted open source models from other places, and you've decided that you still don't want to actually run these locally or deal with having to download them and use them and host them

yourself.

>> No, but you do want to build stuff like software and agents using these open source models. Then you would be

interested in the next category of how to use open source AI models, which is the managed inference API. Category 2,

browser hosted playground, is for using other people's open source AI. This

category is if you want to build stuff with other people's open source AI. This

category is best for indie hackers and startups and people who want to do personal projects. You just want to like

ship fast, don't want to touch infrastructure, don't want to host these

models yourself. So these workflows are actually the exact same as if you were using closed source AI: just calling the API. The easiest workflow is just to

go sign up for something like Groq. I

don't know how to pronounce it. I'm just

going to call it Grock. Or go to Together AI or Fireworks AI; these are LLM API providers, or inference providers.

They host the open source AI models for you. So you just need to get the API key

and then call the API within your code.
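For instance, here is a minimal sketch against Groq's OpenAI-compatible endpoint, assuming `pip install openai` and a GROQ_API_KEY environment variable; the model name is an example and will change over time:

```python
import os

def build_messages(prompt: str) -> list:
    # The chat payload uses the same shape as OpenAI's API, which is
    # why the standard openai client works with these providers.
    return [{"role": "user", "content": prompt}]

def ask(prompt: str) -> str:
    from openai import OpenAI  # pip install openai
    client = OpenAI(
        base_url="https://api.groq.com/openai/v1",  # Groq's compatible endpoint
        api_key=os.environ["GROQ_API_KEY"],
    )
    resp = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # example hosted open model
        messages=build_messages(prompt),
    )
    return resp.choices[0].message.content

# ask("Write a haiku about open source.")
```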

Literally like five lines of code. And

when you're ready, you just deploy your app using something like Railway, Vercel, Hostinger, or Heroku, and you're good to go. Here's an example of a piece of software that we built within our company, Lonely Octopus, where we're using an inference provider to have access

to open source AI models without having to host them ourselves. I do want to make a note that while you can just take this API, put it into different no-code tools, and swap in a different model, you probably would benefit the most from this category if you do know how to code, because you want

to build like custom things probably.

All right, I'm going to put on screen now some of the tools that you should check out in this category if you are interested, and some other little summaries as well. Take a screenshot.

All right, category number four, VPS, otherwise known as a virtual private server. A VPS is defined as a virtual

machine sold as a service, offering dedicated, isolated resources like CPU, RAM, and storage on a shared physical server. It is your own remote virtual

server, where you have control of things and you can manage everything yourself as well. You can think of it like you're

just renting someone else's computer without having to physically have it. You should probably consider this

category of using open source AI models if you're getting more serious about building things. You'll be able to run

multiple models, software, and services from this one server. It's also best for builders who need privacy and data control, especially in sensitive industries like healthcare, legal, or finance. You don't want to be using other

people's hosted AI models for that, right? Because you don't really know what's

happening with the data there. If you're

also working with teams and thinking about scaling your products, you might want to consider a VPS, too. So, this category of workflows starts from medium, because you do need to know how to code to really get the most out of your VPS. You

can rent a VPS from a lot of different providers, like Hetzner, for example, or Hostinger. Usually it costs like $5 to

$10 per month. You SSH in, which is creating a little tunnel so you can access the server, your virtual computer, and then pretty much do whatever you would normally do on your local machine, like install Ollama, get

a model, start building with it, and, when you're ready, deploy it so the rest of the world can use it. A lot of these VPS providers also make that really simple by helping you get a domain name, and then setting everything up and getting it up and running. Very

convenient. This is an example of a VPS that I have on Hostinger. Now, I do want to make a note that most VPSes only come with a CPU. But if you want to run a larger model, or fine-tune or do whatever with a larger model, you do

need a GPU. But do not worry, you can actually rent a GPU from services like RunPod or Vast.AI. You can rent it hourly and call it with whatever app it is that you're building. This is a little bit more advanced. I'll label it

like more in the hard difficulty.

Another harder workflow which you might want to try out is if you want to run multiple models and multiple apps simultaneously on your VPS. This is when you probably want to explore and use what are called containers. This lets

you package your apps into convenient little environments, little containers, so you can run multiple of them simultaneously on your VPS without them messing with each other. Docker is an example of this type of tool.
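To make that concrete, here is a hypothetical minimal docker-compose file that runs Ollama and your own app side by side; the app image name and ports are illustrative placeholders, not a production setup:

```yaml
services:
  ollama:
    image: ollama/ollama            # official Ollama image
    volumes:
      - ollama_data:/root/.ollama   # persist downloaded models
    ports:
      - "11434:11434"               # Ollama's default API port
  app:
    image: my-chat-app:latest       # your own app's image (placeholder)
    environment:
      # Containers reach each other by service name on the compose network.
      - OLLAMA_HOST=http://ollama:11434
    ports:
      - "8080:8080"
volumes:
  ollama_data:
```

Each service runs in its own container, and compose keeps them isolated while still letting them talk to each other over an internal network.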

And finally, I want to introduce another harder workflow in this category, which is actually more of a combination of category 1, local, and category 4, VPS. And

this is when you have your open source AI models running locally on your local computer, but the surrounding software, the app that you're building using these models would be hosted on your VPS. This

is a pretty popular use case, because your models are running securely and locally, and all your data stays on your Mac Mini, for example. But you can also make it available to everybody else and take advantage of having things on

the internet through the VPS. This is

also very cheap. You only need to pay the $5 to $10 for the VPS, and you don't need to go rent a GPU or anything like that, because your models are

running locally. There is a piece of software called Tailscale that helps you connect your local stuff with your VPS stuff, which I recommend you check out if

you're interested. Great. I'm going to put on screen now a little summary and some of the tools that you might want to check out if you're interested in this category of using open source AI models.

Take a screenshot. I am now done covering the four major categories of how to run open source AI models. I hope

you guys have a better idea now of how to do this. But before I end this video, I do want to give you guys two bonus categories that are much more advanced use cases.

>> Oh, thank you.

>> Just in case any of you guys may find this useful. The caveat here is that the

majority of people will not be venturing into these categories. So don't worry if it kind of flies over your head here. So

the first bonus category is managed cloud solutions. This is when you have your

open source AI models on the cloud, and the cloud manages all the infrastructure and also helps you scale automatically. The key here would be

scalability. You would only be interested if

you're actually building something that you really need scaled up. It's best for startups and enterprise teams, and very compliance-heavy industries. If you have

an app that has a lot of usage and unpredictable traffic, like 100,000 users, or if you have a fine-tuned custom model that you want to deploy so that other people can use it. I'm not going to go into too much more detail about this.

I'm going to put on screen a little summary of example use cases in this category and some of the tools you might want to check out if you are interested.

These workflows would all be considered hard and very hard workflows. All right,

final bonus category is on-device/edge.

At the current moment, this is a pretty niche thing still, but I have a hunch that this is going to become really popular at some point in the near future. So, you heard it here first.

Okay, on-device/edge refers to having an open source AI model that ships inside an application, so that the user's device is actually running this model and the

entire application. This is most relevant when it comes to mobile

applications. We're still in the relatively early days right now, and most developers building these are from big corporations; very few indie developers are using open source AI models this way. Examples include Apple Intelligence on iOS

mobile devices, or Gemini Nano on Samsung and other Android devices. But I do think building in this category is quite tricky right now. You need to make sure the models are small enough to be packaged and run correctly while, at the same time, of course, still being able to

deliver the results that you want. Quite

tricky. You might want to check out this category if you are interested in developing mobile applications using open source AI models. It also doesn't have to be mobile. It can be desktop applications too. Ones that prioritize

privacy and safety and offline use. I

shall put a little summary on screen now. Some example workflows and tools

that you might want to check out if you're interested in this category. Yay,

we are now finished. We have covered four major categories of ways to use and build with open source AI models, and lots of different workflows inside as well, ranging from easy to medium to hard to very hard, and two bonus

categories too for those of you who are more advanced. I will put on screen now

a little quiz so that you can test out your ability to retain all the information that we just covered. Happy

building. Have fun with your open source AI models and I will see you guys in the next video or live stream.
