How To Pick The Right AI Model
By Tina Huang
Summary
## Key takeaways

- **Plane Analogy for AI Models**: Flagship models are like massive commercial airplanes: the best capabilities, but expensive and slower. Light models are like private jets: fast and cheap but limited in capacity. Mid-tier models are like Boeing 737s, handling 80% of routes efficiently. [00:20], [01:12]
- **Grok's Anomalous Speed & EQ**: Grok is a flagship model that is also fast and cheap, with a 2 million token context window, and users vote it the most capable. It has the highest EQ, responding empathetically to startup rejection feelings with lines like "It sounds incredibly heavy". [05:14], [06:50]
- **Gemini Flash via Distillation**: Gemini 3 Flash, the best light model, retains 90-95% of Pro's capabilities through knowledge distillation, delivering fast executive summaries while Pro provides more depth. [10:27], [11:47]
- **Claude Sonnet Workhorse**: Mid-tier Claude Sonnet 4.5 is the less fancy Opus: a good balance, action-driven, and excellent at building interactive web apps from scratch, like lunar cycle visualizers. [13:16], [14:16]
- **Kimi Open Source Privacy**: Open-source flagship Kimi K2.5 runs locally for free, ensuring privacy for sensitive tasks like financial analysis or email processing without third-party data leaks. [14:51], [15:46]
- **Sonar Specialized Research**: The specialist Sonar model, based on Llama 3.3 70B, excels at research with citations, compiling FDA status, trials, and expert opinions on semaglutide from credible sources. [17:20], [18:01]
Topics Covered
- AI Models Mirror Airplane Spectrum
- Grok Delivers Flagship Speed Plus EQ
- Light Models Retain 95% Power via Distillation
- Mid-Tier Sonnet Builds Apps from Scratch
- Open Source Flagships Unlock Privacy
Full Transcript
There are a lot of AI models out there.
So, in this video, I'm going to explain each type of AI model, what they're good for, and how to pick the right one for your project. Now, without further ado, let's go. The easiest way, in my opinion, to start thinking about different types of AI models is understanding the relationship between model capability, size, speed, and cost.
Let's make an analogy. Let me explain using planes. On one end of the spectrum, you get your fancy flagship commercial airplanes. They're massive, can carry a lot of people, and can go across entire oceans and navigate different complex routes, but they're also very expensive and slower, because they've got a lot going on. These are like your flagship AI models. They are massive and have the best capabilities, but are also more expensive and slower.
Then on the other end of the spectrum, you get your private jets. You have much smaller passenger capacity, maybe 10 or 20 people max, and you can't be crossing entire oceans, but they're cheaper to operate, and small and nimble and fast. These are like your light AI models: models that are optimized for speed. Then, of course, in the middle of the spectrum, you have everything else, like your Boeing 737s, the workhorses of the industry: a good balance of capacity, size, speed, and cost that can handle 80% of routes efficiently. And chances are, if you're getting on a plane, it's probably a Boeing 737. In the AI model world, you also have these mid-tier AI models, the go-to models that you'll use to handle about 80% of your queries. And finally, to round out our amazing plane analogy, we have some specialized planes as well, like your search and rescue helicopters or your cargo-only transatlantic planes. Just like with planes, you also have AI models that are made to be specialized in very specific things. Now, let me give you some examples of each of these types of models and when to use which one, because there are still subtle differences between them, plus some demos of what it's actually like to use them.
And by the way, by the time you watch this video, it's very possible that the models I'm talking about here will already have been replaced by newer versions. And that's okay, because if you understand the types of these different models, you'll be able to put new models in the correct category as well. They'll still serve the same function, just better. All right, let's start off with the flagship type of AI model.
I'm going to start off with OpenAI's GPT 5.2, their flagship model. I'm going to put some stats on screen here. I think they're nice to know, but not super functional. The way I like to think about it, from a more functional, practical standpoint, is that OpenAI's flagship models are very well-rounded. They can do multimodality, analysis, image generation; they can pretty much do everything, and they're pretty good at chaining together multiple actions as well. Let me show you an example. I'm
actually going to be using Perplexity AI to show you GPT 5.2. Perplexity is a really great model aggregator. They have a lot of different models here, including GPT 5.2, so you don't actually have to pay for individual subscriptions, which get really expensive when you want to use different models. In my case, I actually don't pay for OpenAI and ChatGPT anymore; this is how I access OpenAI models, through Perplexity. I'm going to turn on the thinking mode as well. This allows the model to reason through the steps, and you usually get better results from this, even though it will be a little slower. Okay, so I'm going to attach 500 rows of customer feedback, and I will say: analyze this customer feedback CSV, group the complaints by category, draft a markdown customer response template for the complaints, and then generate a banner for the upcoming workshop on AI agents in Paris, France. And here are the details. So let's let it do its thing. This really showcases GPT 5.2's ability to be a really good all-rounder. Cool. So we can see that we have the complaints analysis, a summary categorization of the complaints, and different examples of them as well. Here is the complaint response template as a markdown file. And here is the banner.
I'm also going to put on screen now some examples that OpenAI's GPT 5.2 would really excel at. Next up, also in the flagship category, is Anthropic's Claude Opus 4.6. Here are some of its stats.
So, Opus is also a heavy hitter, and it really specializes in two things: writing and code generation. Its major drawback is that it doesn't have good multimodality functionality; it can't actually generate images directly. It is also the most expensive and the slowest model, but oh my gosh, it is so good at code especially; that's what I use it the most for. And it's amazing that you can just use Claude Opus 4.6 with the thinking mode as part of Perplexity, without an additional subscription. So I'm going to put some code here, which is for an open-source agent workflow I actually made in this video over here, if you want to check it out. It's able to screen through my emails, categorize them, draft some responses, and also flag emails that need attention. So here I'm going to ask Claude Opus 4.6 to change this code so that the user can input their desired email, and to add a simple dashboard front end as well that showcases the categorized emails, so it looks a little prettier, and let it do its thing. And there you go. This is much more user-friendly: you can change which email you want, and you can actually see the agent's progress on a dashboard now, instead of just an ugly terminal. Here are some other examples that Claude Opus 4.6 would be particularly excellent at. Next
up is Grok. Here are some of its stats. Grok is actually pretty special: it's considered a flagship model, and in fact, as of right now at least, users vote Grok the most capable and most preferable model. But at the same time, it's also really fast, pretty cheap, and has a really big context window as well, at 2 million tokens. By the way, you can think of the context window as the amount of data you can give to Grok, whether in the prompt or just as data that you want it to analyze or go through. At 2 million, you could chuck in an entire book for it to analyze, and it could probably do it. So yeah, without going into too much technical detail about why this is the case, just know that Grok is somewhat of an anomaly, in a good way.
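By the way, that 2 million number is counted in tokens, not characters. Here's a quick back-of-the-envelope sketch of what fits. It uses the common rough heuristic of about 4 characters per token for English text, which is just an approximation, not any model's actual tokenizer, and the headroom number is made up for illustration:

```python
# Rough check: does a document fit in a given context window?
# Uses the common ~4 characters per token heuristic for English text --
# an approximation, not an exact tokenizer.

def estimate_tokens(text: str) -> int:
    """Estimate token count using the ~4 chars/token rule of thumb."""
    return len(text) // 4

def fits_in_context(text: str, context_window: int = 2_000_000) -> bool:
    """Check whether the text (plus headroom for instructions and the
    model's reply) fits in the given context window."""
    headroom = 10_000  # reserve space for the prompt and the output
    return estimate_tokens(text) + headroom <= context_window

# A typical novel is ~500,000 characters, i.e. roughly 125,000 tokens.
novel = "x" * 500_000
print(fits_in_context(novel))  # prints True
```

So yes, under this heuristic a whole novel fits with lots of room to spare; you'd need roughly 8 million characters of text before a 2 million token window ran out.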
People also especially like Grok because they find it to have the most EQ. For example, if we put Grok here and ask it to talk about your feelings, it's very empathetic. Like: "I've been working 80-hour weeks for the past 3 months on a startup and just got rejected by every investor we pitched to. My co-founder wants to pivot, but I feel like we're giving up. I just want to talk through my feelings." And Grok says, "It sounds incredibly heavy to pour everything into the startup for three straight months of 80-hour weeks, only to face rejection from every investor." You know, it's like, wow. It's very high EQ. It really
is. So, just to show you what I mean, I'm going to use the model council in Perplexity. This allows me to put in a single prompt and then use up to three different models to compare the results. So here I'm going to pick GPT 5.2, Claude Opus 4.6 (we don't need the thinking mode here), and Grok 4.1. Let's run these. So yeah, you can see how fast Grok already is. It's already saying, like, wow, so quickly, while GPT 5.2 and Claude are still finishing up. So: "This sounds incredibly heavy. Pouring everything into the startup for three straight months of 80-hour weeks only to hit a wall of rejections from every investor. It's no wonder you're exhausted. That's not just physical burnout. It's the kind that seeps into your bones, where even the thought of another day feels like climbing a mountain with no summit in sight." And it talks about the terror of quitting. "It's valid to feel all of this at once. Defiant loyalty to the vision clashing with the raw fear of failure, wrapped in sheer depletion." Yeah. So, it's very empathetic. And then
let's see GPT 5.2. GPT 5.2 says it sounds like a really brutal collision: you've been running at an unsustainable pace for months, and the moment you expect some kind of external validation or relief, you get a wall of no. So, I mean, it's not bad, right? It tells you a bunch of stuff and tries to tell you, oh, here are the things you're feeling, and it does address these problems, but it's making a lot of assumptions, and it's also much more rambly. And then there's Claude Opus 4.6. Claude says, "I hear you. Let me just sit with this for a moment, because what you're describing is genuinely one of the hardest places to be." And yeah, it says, like, 3 months of 80-hour weeks, and it lists a bunch of these things. "Here's why I think it's really hard where you are. A pivot isn't quitting, but it doesn't feel that way. When your co-founder says pivot, you might be hearing that what we built wasn't good enough. And after 80-hour weeks, that lands like a judgment on you, not just the product." So yeah, etc., etc. I mean, I feel like this is pretty decent as well, but I can definitely see why people would prefer Grok, because it's so much more about emotions and compassion without being too rambly. What do you guys think of these three responses? Which one do you prefer when you talk about your feelings? All right, I'm going to put some other example prompts that Grok would be particularly good at. Next up: Gemini 3 Pro from Google.
Here are some of its stats. This is Google's flagship model. Its performance is pretty on par with the other flagship models, and it has two things going for it. The first one is the context window: it's also at 2 million, so you can chuck a lot of stuff in there. And what makes it really stand out is its multimodality. Gemini Pro really excels at multimodality: analyzing pictures, analyzing videos, generating pictures, generating videos, and it has really good character consistency across different generations. Let me show you an example. So, let's say: generate an image of a professional woman named Sarah. So, let's do that. By the way,
it's pretty sad: Google AI Studio used to allow you to generate images for free using Gemini Pro, but not anymore. Luckily, we can use Perplexity, because I do not want to pay for another subscription. Cool. So here's the image, and let's see its real power when I say: now generate the same woman, but in a lot of different scenarios: teaching in front of a whiteboard with AI diagrams, working on a laptop in a coffee shop, leading a workshop with students in the background, and recording a video tutorial at her desk. And there you go. Pretty amazing, right? There's character consistency across these pictures. Here are some of the other prompts on screen which will really let Gemini 3 Pro shine. Okay, so there is one more
shine. Okay, so there is one more flagship model that I want to show you guys that is really special because it also belongs to an entire other category of AI models which I have not talked about yet.
>> All right, then keep your secrets.
>> So, I'm going to save this one for now and continue on finishing the mid-tier and the light tier model examples first.
Stay tuned. For now, let's go from the flagship models and go to the opposite side of the spectrum, the light models.
These light models are made to be fast, really optimized for speed. They're smaller and a lot cheaper, too. The trade-off, though, is that capability-wise, they're not as good as the flagship models. So, I'm going to put on screen now some examples of light models, but the one I want to focus on and demo for you guys is Gemini 3 Flash. This is the current best light model there is. It has all the benefits of being fast, efficient, and cheap, while still retaining 90 to 95% of the capabilities of the Gemini 3 Pro model. And that's because of a process called knowledge distillation. Basically, Google took the Gemini Pro model and was able to distill it and package it into the Gemini Flash model, which is a lot smaller but still retains a lot of the capabilities.
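To give a feel for what distillation means mechanically, here's a toy sketch of the core training signal in the general technique, not Google's actual recipe: the small student model is trained to match the big teacher model's softened output probabilities.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, with temperature scaling.
    Higher temperature softens the distribution, exposing more of the
    teacher's 'dark knowledge' about which wrong answers are plausible."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened output distribution
    and the student's -- the training signal in knowledge distillation.
    The student is trained to minimize this, so it mimics the teacher."""
    p = softmax(teacher_logits, temperature)  # teacher's soft labels
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, [2.0, 1.0, 0.1]))  # a matching student: ~0 loss
print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # a disagreeing student: larger loss
```

Minimizing this loss over lots of prompts is what transfers the big model's behavior into the smaller one.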
Let me show you. I'm going to use the model council again, and let's pick Gemini Flash and Gemini Pro so you guys can see the difference. Let's download this really big report about global climate highlights in 2025, and put in the prompt: create an executive summary and three key insights from this report. Now, normally, maybe I would be fine just waiting around for this, but say I have procrastinated and I have a meeting in a minute and I'm supposed to present this report. That's when I would use Gemini Flash. As you can see, Flash is already done. Look at that. So, the Global Climate Highlights 2025 report. Here's the executive summary: 2025 was the third warmest year on record globally, with an average temperature of blah blah blah; the year was characterized by historically high sea surface temperatures. And it gives you the key insights: the first three-year breach of 1.5°C, persistent ocean warmth independent of El Niño, and historic lows in global sea ice. Okay, so Gemini Pro is also done now. It did take significantly longer, and we see the results. The executive summary from Gemini Pro: the year 2025 was the third warmest year on record. And it also gives the three key insights: the first three-year average above 1.5°C, a trio of record-breaking years in 2023, 2024, and 2025, and persistent ocean heat without El Niño. Yeah. Okay. So, you
can see that they're relatively similar to each other. Flash is a lot faster, and Flash is a little more brief, while Pro goes into more depth and is also able to give you more specific numbers and more evidence to support its claims. But yes, if I need something done fast, I would go with Gemini Flash. I'm going to put on screen now some of the other scenarios in which you would probably want to use Gemini Flash. By the way, if you feel like the AI space is very overwhelming and you know you've got to learn things, but you're not really sure what you should be learning, then I have good news for you. I have made a free resource where you can put in what you're interested in learning about, and it will give you a 28-day personalized roadmap for what you should be learning to achieve your goals. So check it out if you like. Here's the link, and it's also in the description.
All right. So now we've covered both extremes of the spectrum. Now let's talk about the mid-tier models. These are actually the models you'd probably be using 80% of the time, because they strike such a good balance between all of these variables. I'm going to put on screen now some examples of different mid-tier models from different companies.
But the one I want to focus on to show you guys is Claude Sonnet 4.5. Here are its stats. I personally really like Sonnet. It's like the less fancy version of Opus that's a little more usable, but it still has the really good writing style of Claude Opus, and it has really good coding skills as well, which is what I spend a lot of my time on. My favorite use case for Claude Sonnet is asking it to build something from scratch, like: build an interactive web app that can visualize lunar cycles. Because it's building from scratch, you don't need as much power, so you don't need to use Opus; Sonnet is perfectly capable of this. And there you go. Some nice little visualizations. Pretty cool,
right? I also really love using Sonnet for doing analyses and then building interactive dashboards and visualizations, too. So, I'm going to put on screen now some other examples that I think Claude Sonnet really excels at. Oh, and also a word about the tone of Sonnet. I know a lot of people prefer Grok's tone, but I actually prefer Sonnet, and Claude in general, maybe because I'm just not that good at expressing my emotions. I find Grok to be almost too empathetic and too emotional, while Claude models like Claude Sonnet are more action-driven: here's the solution for what you should be doing in your specific case. And I prefer that. But yeah, it is these mid-tier models that are the true workhorses running a lot of the AI queries and AI agents out there.
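Since the whole point of this video is matching the task to the tier, the decision process can be sketched as a toy router. The model names are just the ones from this video, and the keyword check is made up for illustration; a real router would use much better signals than substring matching:

```python
def pick_model(prompt: str, needs_images: bool = False, needs_speed: bool = False) -> str:
    """Toy router following the video's heuristics: light models when speed
    matters, flagship models for multimodality or heavy reasoning, and the
    mid-tier workhorse for the ~80% of everyday queries in between.
    Thresholds and keywords are illustrative, not a real benchmark."""
    if needs_speed:
        return "gemini-3-flash"      # light tier: fast and cheap
    if needs_images:
        return "gemini-3-pro"        # flagship with strong multimodality
    hard_words = ("prove", "refactor", "architecture", "multi-step")
    if any(w in prompt.lower() for w in hard_words):
        return "claude-opus-4.6"     # flagship: slow, expensive, most capable
    return "claude-sonnet-4.5"       # mid-tier workhorse for most queries

print(pick_model("summarize this report before my meeting", needs_speed=True))
# prints gemini-3-flash
```

The design point is simply that the default branch is the mid-tier model, which is exactly the 80% claim above.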
Okay, now let's actually go back to the flagship model that I skipped previously. The example I would give for this category is Kimi K2.5. Can anybody guess why I'm singling out this flagship model? It's because it also represents an entire other category of AI models. Put your guess in the comments. Okay, I'm going to tell you now: it's because the Kimi model can do all of the things that the flagship models can do, right? But it's also open source. All of the other models that we've talked about previously are considered closed source, as in you can only access them through platforms and APIs; you don't actually have full control and privacy. Open source models like Kimi, on the other hand, are ones that you can actually download onto your local computer, do stuff with privately, and then also host yourself if you want. This is really, really special for two primary reasons. The first one is that it's extremely cheap: if you download it and just run it from your local computer however much you want, it's completely free. Also privacy
reasons. If there are certain documents and things that you don't want to send through third-party platforms, you can control this by using an open source model, because all of that information is retained on your local computer, or, if you decide to host it yourself, you're still in control of where you're hosting it and what is happening there. Two of the agents I've built using open source models like Kimi, which I would never have built if they were not open source, are an AI agent that analyzes my financial statements and another that reads through my emails. In both of these instances, I do not want to be leaking data to third-party platforms. And I also don't want to be paying a lot of money to run the model so many times, because, for example, I get over a hundred emails a day. That would cost me so much money. I'm not
going to go into much more detail about this here. Please check out this video over here, if you haven't already, where I go in depth about open source models and how to build with them, if you're interested. Now, if you're using Perplexity, you can also use the Kimi K2 model here. You can pick it, but just know that this model is hosted by Perplexity in the US. So you can definitely prompt it and ask it stuff directly, but just know that there's a whole slew of other things you can do with open source models, which, yes, you can check out in the video over here if you're interested in learning about them. But even on this hosted version, there is a clear advantage for Kimi. I don't know how applicable this is for a lot of people; it's applicable to me because I speak Chinese. Kimi is really good at Chinese because it's a Chinese model. I'm asking it to draft a contract and then use English to explain it, to show off its bilingual capabilities. I shall put on screen now some other examples that the Kimi K2.5 model and other open source models are particularly good at. Okay, almost done.
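To make the privacy argument from the open source section concrete, here's a minimal sketch of querying a locally hosted model. It assumes an Ollama-style server listening on localhost; the endpoint, port, and model name are all assumptions you'd adapt to whatever local runtime and model you actually run. The key property is that the email text never leaves your machine.

```python
import json
import urllib.request

def build_request(model: str, prompt: str) -> bytes:
    """Build the JSON payload for an Ollama-style /api/generate endpoint.
    Nothing here touches the network; the data stays local until sent."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask_local_model(prompt: str, model: str = "kimi-k2") -> str:
    """Send a prompt to a model served on localhost -- the request never
    leaves your machine, which is the whole privacy argument.
    The model name 'kimi-k2' is a placeholder for whatever you have pulled."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # local server, not a cloud API
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    email = "Subject: Invoice overdue ..."
    print(ask_local_model(f"Categorize this email as urgent/normal/spam:\n{email}"))
```

Because the request goes to localhost rather than a cloud API, processing a hundred emails a day costs nothing beyond your own hardware and electricity.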
There is one more category left, which is the specialist type of models. These are AI models that are specifically made to be very good at particular things, like analyzing MRI scans and healthcare data, legal work, or drug research. An example of a model in this category is the Sonar model from Perplexity. This is a specialized model, based on the open-source Llama 3.3 70B model, that is particularly good at research and citations. By the way, this is another great use case for open source models: you can turn them into more specialized models by doing things like fine-tuning, adding a RAG system, and other tools and infrastructure. There's so much there; I'm not going to go into too much detail here. That's probably another video, on how to make specialized models. Anyways, I want to show you guys
a demo of the Sonar model. Ask it a research-specific question: what are the current FDA approval status, clinical trial results, side effects, and expert opinions on semaglutide (Ozempic) for weight loss in non-diabetic patients? Hit search, and there you go. It's able to search through so many different resources and compile them together, to figure out which sources are credible and which are not so credible, and to give you a result with very good citations. I will put on screen now some other examples of things you can ask Sonar that will really make the model shine. All right, that's it. We
are at the end of this video. I really hope this was helpful for you to understand the different types and categories of AI models, so that in the future, when you see a model, you're able to go, ah yes, this belongs in the flagship category, or the light category, or the open source category, or a specialized category, and then figure out when you want to use those models, and not be overwhelmed by the hundreds of models coming out every day. Thank you so much for watching until the end of this video, and thank you so much to Perplexity for sponsoring this video. If you're also interested in trying a lot of different models but don't want to pay for a lot of subscriptions, and/or you're like me and you live in a region where certain models are blocked unless you use a VPN, I highly recommend that you check out Perplexity. I will put a link in the description if you like, and I will see you guys in the next video or livestream.