The 5 Levels of Creating AI Videos (and How to Level Up)
By Tao Prompts
Summary
Topics Covered
- Image-to-Video Unlocks Creative Control
- Master Prompts for Precise Camera Control
- Keyframing Enables Seamless Transformations
- Upscale Images and Videos for Max Quality
- Innovate Custom Transformation Workflows
Full Transcript
There are five levels of skill when it comes to creating AI videos, and each level unlocks exciting possibilities for the types of AI videos you can create. The problem most people have is they don't know which level they're at. This video will give you a complete roadmap for navigating all five levels of AI video creation and going from a complete beginner to a seasoned pro.

The first level is the beginner level. This is the most basic one. Perhaps you saw some cool AI videos on social media, got curious, and wanted to try it out for yourself. This level starts at text-to-video, which means you'll be using just text to generate videos. For example, if I were a Star Wars fan and wanted to generate a scene from Star Wars, I could go into Google Veo 3's Flow and enter a text prompt like "a stormtrooper in action on the battlefield. He fires his blaster." Veo 3 will follow the instructions I gave it and generate a basic-looking video. But what got you interested may have been some of the trends you found, for example the vlog-style videos, where you're using a bit more of a structured prompt. If you wanted to create a vlog, you would use keywords like "selfie camera angle, shot from an extended arm." "Just another day in the jungle, you know, patrolling, making sure everything is calm and peaceful. Nothing out of the ordinary."

Now, there are plenty of simple prompt formats at the beginner level. Here are some other examples where I created ASMR videos of a knife cutting through glass fruits and vegetables. It's pretty simple to make these: all you have to do is add the keyword "ASMR video" inside your prompt.
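To make that structure concrete, here is a minimal sketch of the kinds of beginner prompts described above, written out as plain strings you would paste into a tool like Veo 3's Flow; the exact wording is illustrative, not a required format.

```python
# Illustrative beginner-level text-to-video prompts (wording is an example,
# not a fixed syntax). These would be pasted into a generator such as Flow.
vlog_prompt = (
    "Selfie camera angle, shot from an extended arm. "
    "A stormtrooper vlogs in the jungle: 'Just another day, patrolling, "
    "making sure everything is calm and peaceful. Nothing out of the ordinary.'"
)

asmr_prompt = (
    "ASMR video. A knife slowly cuts through a glass strawberry "
    "on a wooden cutting board. Close-up shot, crisp sound."
)
```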
Now, at this point, you'll also probably start toying around with image-to-video. For example, if you have your own photos, like this one I took while shopping for my friend's wedding a couple of months ago, you can experiment with turning photos like these into videos as well. Basically, all the AI video generators will let you do this. If we're using Google Veo 3, I can go to the settings menu and select "frames to video." From here, using this plus button, I'm going to upload this photo of myself. Let's get that in there, and then we'll give the AI a few more instructions, like: the man is vlogging and talks about the new clothing he found at Macy's. He's happy with his new style, which is a true story. "Check out this new fit I just got from Macy's. Feeling fresh with this Tommy Hilfiger. Super happy with this style." This is one of the most fun levels because you're discovering and trying new things for the first time. I took this picture of my plant and animated a small goblin crawling out of it.
Here's another photo I took while I was at the Whitney Museum in New York, and I simply told the AI to animate the characters. The good news at the beginner level is that AI video is so good nowadays that even if you're just using the simplest prompts and techniques, it can create some mind-blowingly good-looking videos. Most people end up staying at the beginner level, and I think it's actually a decent place to be. However, you might find yourself running into a few limitations. For example, in this video of myself at Macy's ("Check out this new fit I just got from Macy's"), the voice doesn't sound like me. It sounds like some other guy. What if I wanted to make this video but have him say it in my own voice? And what if you wanted to tell a bit more of a story and need the characters to look consistent across different shots? That means you want more control over the creative direction of the videos you're making, and you need to go to level two: the intermediate level.
The biggest thing at this level is starting to really dive into the world of image-to-video, where we're not only generating AI videos, but actually creating AI images first and then turning them into AI videos. This gives you a huge amount of control over the videos you want to make, especially with new updates like Google's Nano Banana image editor. For example, I can take this photo of a woman and put her exact character into tons of different environments: sitting at the edge of a building overlooking a city, or inside the cockpit of a spacecraft. What if I want to combine multiple characters, objects, even products into the same image? We can do that with AI image editing too. What if I wanted to change the clothing she's wearing to match a specific fashion look I'm going for? I can do that and create runway models. Mastering image-to-video really allows you to bring your ideas to life.
So, how do we do this? Well, you'll need a powerful AI image model like Nano Banana. I'm inside Higgsfield, which is a platform that has a bunch of AI image models, and Nano Banana is one of them. Let's say I have this photo of a character and I want to put her inside some different scenes. What I'll need to do is upload this image into Nano Banana, so we'll use this upload button and put in the photo of my character. Now, you really need to be descriptive at this point about what the image should look like. I'll say: create a photo of this Asian character inside a cinematic sci-fi film. Notice that I specifically state the medium I want the image to be generated in, in this case a sci-fi film. She is sitting inside the cockpit of a spacecraft, like in Star Wars, so that's the environment she's going to be in. Shadows on her face, muted colors, and a gritty vibe, shot on 35mm film.
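Everything here happens inside Higgsfield's web UI, but if you'd rather script this step, here is a minimal sketch of the same image-plus-prompt call through Google's google-genai Python SDK. The model ID used for Nano Banana and the response handling are assumptions on my part, not something shown in the video.

```python
# Minimal sketch: sending a character reference image plus a descriptive
# prompt to an image model via the google-genai SDK. The model ID for
# Nano Banana is an assumption; the video itself uses Higgsfield's web UI.
from google import genai
from PIL import Image

client = genai.Client()  # reads the API key from the environment

reference = Image.open("character_reference.jpg")
prompt = (
    "Create a photo of this character inside a cinematic sci-fi film. "
    "She is sitting inside the cockpit of a spacecraft. "
    "Shadows on her face, muted colors, gritty vibe, shot on 35mm film."
)

response = client.models.generate_content(
    model="gemini-2.5-flash-image",   # assumed model ID for Nano Banana
    contents=[reference, prompt],
)

# Save the first returned image part to disk.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("scifi_cockpit.png", "wb") as f:
            f.write(part.inline_data.data)
        break
```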
And this is what Nano Banana creates. There's our character. Let's compare her to our reference image: it did a pretty good job of copying over the face, the clothing, even this belt. If we zoom in a bit on the shape of the belt, we have this one layer here and then this other belt below it at the bottom. Even the little wrinkle, that detail is copied over into the image. We can also ask Nano Banana to edit existing images. So if I created this photo of her exploring temple ruins, I can ask it to create all sorts of different shots of her. We can also introduce other characters into the scene, for example this sad little robot that looks like it's expired. I can also rotate the camera behind her. Now, if you're going to rotate the camera behind your subject, you want to make sure you also tell the AI what she's looking at; in this case, she's staring towards a dark tunnel. We can also take a collage of different characters. I have a woman, this furry alien creature here, a Jedi droid for her, and a speeder she's riding on. If I use this entire image as a reference, I can tell Nano Banana to combine all the elements together, and this is the resulting shot we'll get.
If I have a photo of the woman but also a photo of some new clothing I want her to be wearing, like this denim outfit, I can write a prompt where she's wearing the denim outfit while walking down the runway like a fashion model. Once we have all these images of our character generated, we can use AI to turn them into videos. Again, starting with the AI image model gives us the ability to control exactly how we want the visuals in the video to look: the outfit the character is wearing, the location the scene is shot in, what specific characters and objects are inside the video. All of that can be controlled by using image-to-video. And we can take this even a step further. If we take multiple images of a character, like a front view of her face and also a shot from behind, I can actually chain these together into a full AI video where the camera rotates all the way around her. Now, we're getting a bit ahead of ourselves; this is more of an advanced technique we'll see in level four. By the way, I'll put a link down below to a tutorial that teaches you all the tricks and tips for generating AI images inside of Nano Banana.
Now, at level two, you'll probably also be playing around with more specific advanced features of image-to-video. For example, say I wanted to create multiple shots of a world but have them all keep the same style: this gray background with bright red embers and ash. To create multiple images of your world in the same style, one of the best tools is Midjourney. Let's say I have this photo whose visual style I really like. What I can do is use that photo as a reference, so let's take it and drag it into the prompt toolbar. Inside Midjourney, we'll need to make sure we put it under style references to create more images in the same style. Then let's say in the prompt: a polar bear in the wild, covered with glowing red embers and ash. We also need to prompt for a 16:9 widescreen aspect ratio. And it looks like Midjourney has generated some really, really beautiful shots of a polar bear. Getting a consistent style in all the shots keeps everything in the world consistent and really allows you to start world building.
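For reference, here is roughly what that prompt looks like written out with Midjourney's parameter syntax instead of the drag-and-drop UI; the reference URL is a placeholder, and `--sref` / `--ar` are the standard parameters for style references and aspect ratio.

```python
# Approximate Midjourney prompt for the same result, using parameters rather
# than the web UI. The reference URL is a placeholder, not a real image.
midjourney_prompt = (
    "a polar bear in the wild, covered with glowing red embers and ash "
    "--sref https://example.com/ember-style-reference.png "  # style reference image
    "--ar 16:9"                                              # widescreen aspect ratio
)
```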
Level three of creating AI videos is the Prompter. At this point, you've learned all the basics and fundamentals of AI video, and in order to progress, one of the key things to focus on is developing your understanding of prompts and how they interact with the videos you generate. Prompts are extremely important: we can take the same exact photo and create a bunch of different variations of AI videos from it. If I have this shot of my character, I can direct the AI to create different camera motions. For example, I can tell the camera to zoom in so we get a closer look at his glowing eyes. Alternatively, we can tell the AI camera to pull back to reveal more of the character's body and the environment he's in. We can direct the AI camera to pan to the left or pan to the right. And we can even generate an arc shot where the camera rotates around the character in a circular fashion; the prompt keyword "orbit" seems to work pretty well for this. At this point, you can also start combining multiple camera motions. Starting with this really cool-looking shot of a tree, I can tell the camera to orbit around the tree as it tilts up to reveal a red moon in the sky, so it should combine the camera movement of rotating around the tree in a circular fashion while also tilting upwards. You might also want to start experimenting with the speed of your animations. Asking for a fast video versus a slow video makes a pretty big difference in how your virtual camera moves: I can have the camera fly forward slowly, or I can have it speeding towards a pagoda.
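As a quick reference, here are the camera-motion phrasings from this level gathered in one place; the exact wording is illustrative rather than an official syntax, and different generators may respond better to slightly different phrasings.

```python
# Camera-motion prompt fragments from level three, gathered as reusable
# strings. Wording is illustrative, not an official syntax.
camera_motions = {
    "zoom_in":   "The camera zooms in for a closer look at his glowing eyes.",
    "pull_back": "The camera pulls back to reveal the character's body and the environment.",
    "pan_left":  "The camera pans to the left.",
    "pan_right": "The camera pans to the right.",
    "arc_shot":  "The camera orbits around the character in a circular fashion.",
    "combined":  "The camera orbits around the tree as it tilts up to reveal a red moon in the sky.",
    "speed":     "The camera flies forward slowly.",  # vs. "speeds towards the pagoda"
}
```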
You might have noticed me using different AI video generators throughout this tutorial. That's something else you start figuring out at level three: the different AI video generators all have their quirks, their strengths, and their weaknesses. For example, compare Seedance versus Kling 2.1. If I use the same image and prompt, Seedance generates a video with very high-quality details and consistency, but it doesn't quite have the realistic, dynamic action that Kling can generate; Kling just has a bit more of that stylishness. Here's another example where I prompted for the camera to fly above the subject and point down towards the troll. Kling is able to animate this specific motion pretty easily: it flies above and points down on the subject perfectly. While Seedance generates a really nice, sharp-looking video full of rich details, it's not able to follow my prompt correctly. Okay, so then what about Seedance? Well, one pretty cool feature in Seedance is that you can prompt for a video that has multiple shots inside it. What do I mean by this? If we take this photo of my character, I can use a prompt like: he looks around first, and then the camera cuts to an overhead shot of him practicing with his sword. What Seedance can actually do is generate multiple shots where we start looking at him, but then the camera cuts to a different angle showing him from above. Now, this doesn't always work super consistently.
Sometimes the characters don't look that consistent between the different camera angles, but you can definitely put this to good use. To master prompting at level three, though, you need to learn how to debug your prompts. If I start with this photo and prompt for him to be showing fear, distress, sad emotions, the AI video does a pretty good job of following my instructions, and you can see his face bubbling with emotion. But when I wanted to create a different emotion, like happiness or joy, what I saw is that the AI video actually transforms the character: it got rid of all the ash covering him and turned him into a normal-looking human. I don't want that to happen. So, what I did is double down in the prompt on exactly the visual style I'm going for. I specifically said he has a happy, joyful expression, but also that his eyes glow red and he is covered with red embers and ash. By reinforcing the visual cues, when I run the prompt again, it's able to render a video of him smiling and happy while keeping the ash and all the little glowing red particles on his face, instead of transforming him into a normal guy.
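In code form, the "debugging" step is simply adding the visual cues back into the prompt; the strings below are an illustrative before/after rather than exact wording from any tool.

```python
# Prompt debugging: the first prompt drifts (the model "cleans up" the
# character); the second reinforces the visual style so it stays intact.
weak_prompt = "He has a happy, joyful expression."

reinforced_prompt = (
    "He has a happy, joyful expression, but his eyes glow red "
    "and he is covered with red embers and ash."
)
```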
Next up is level four: you're a power user now. You know the basics and you know your way around different prompts to get the types of videos you want, so it's time to start diving into the more advanced features. Earlier, I showed this example where we're actually using multiple image frames and chaining them into a complete AI video where the camera rotates all the way around her. This is called a keyframing feature, and you can create all sorts of amazing transformations with it, beyond just camera movements. For example, I can have a dragon hatch out of a glowing egg, or I can transform human characters and turn this guy into a fox spirit. Let's say I want to make a transformation where I'm bringing this journal illustration to life: I want to start with this photo of the girl inside a journal and transform it into a real-world scene. I think the best version of the keyframing feature is actually inside Kling AI. So what I'll do is upload these two frames as the beginning and end of the video I want. I've uploaded the starting frame, which is the illustration of her inside a journal, and also the end frame, which is a real-life photo of her. If we go and generate this, the AI will create a video with a seamless transition between the two images. And I think this turned out absolutely stunning.
Some other powerful features you'll be using at this point include motion capture. For example, I can take a video of myself where I'm speaking and moving around, and actually map my movements onto a different character. What I'm showing on screen right now is Wan 2.2's character transfer feature, where you can motion capture a video of yourself, like I'm doing right now, and turn it into a full AI avatar. Runway's AI video platform also has a pretty similar feature, and it'll actually animate the full scene with the character inside of it. To use the motion capture inside Runway, I need to go to the video generator, find the Act-Two feature, and then upload a driving performance of myself along with the character I want to map my movements onto. So there's me talking, and here is the image of my goblin character. There are some other parameters you can change, like the level of expressiveness you want and whether it also adds in the movements and gestures, which I think you want in there. It's one of the most impressive video-to-video features: you can motion capture a video of yourself.
At level four, you also have knowledge and understanding of more advanced workflows to squeeze every bit of extra quality out of the videos you create. On the left is a video I created directly from an AI image I made inside Midjourney, and on the right is what a scene might look like if I use the proper techniques to squeeze the maximum amount of quality out of the videos. You can see that the details are way sharper and cleaner. To get the absolute maximum quality out of your AI videos, the first thing you need to do is take an AI image and upscale the resolution of that image. Now, there are plenty of different tools for this; I think one of the best ones is called Magnific. We can take a look at the before and after: it just adds that little bit of extra crispness to the textures on his armor and the little vegetation on his body. Now, it will alter some of the smaller details, and sometimes also the way some of the characters look, so you have this optional slider where you can turn up the resemblance, meaning how closely you want the upscaled image to resemble your original image. Then what I'll do is take this upscaled image and create an AI video from it. But we're not done yet. Once we've created the AI video, we can take it a step further and enhance the resolution of the AI video itself. The tool I like to use is called Topaz Video AI. I've uploaded my video in here, and what I can do is choose the output resolution; I can say 4K. Then, if I go and upscale the video, we get a much higher quality video than we would have if all we did was create a video from the original image. So again, what you want to do is take your original AI image, upscale that image, generate a video from the upscaled image, and then upscale the AI video again. You should be able to see a huge upgrade in how your videos look.
Another level four hack: let's say you've generated a video of a character inside Google Veo 3, and you have them speaking, but the voice doesn't match the one you want. "After the winter, nothing was the same. The world had changed, and so had we. Everything mutated." Let's say I wanted this character to speak the same dialogue with my voice. What I did is download the video of this character and extract the MP3 audio file from it. Now, what we need to do is use a voice changing app to change the voice to match my own. I'll be using ElevenLabs to do this. Inside, we can find this voice changer button, and I've actually cloned a voice of myself. Now, there are tons of other optional voices you can change it to ("I have a casual, smooth, and sexy voice that you can use for anything you want"), like Denzel, for example. But let's use my voice this time. What I'll do is upload the extracted audio file from that Veo 3 video, and I should be able to transfer the audio into my own voice. Let's try that. "After the winter, nothing was the same. The world had changed, and so had we." Now all I have to do is line up the original video file from Veo 3 with the audio that's been altered to match my voice. "After the winter, nothing was the same. The world had changed, and so had we. Everything mutated."
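The extract-and-swap steps around the voice change can be scripted with ffmpeg. Here is a minimal sketch, assuming ffmpeg is installed and that the actual voice conversion happens in ElevenLabs' Voice Changer between the two commands; file names are placeholders.

```python
# Sketch of the audio swap around the voice-changing step. Assumes ffmpeg is
# installed; file names are placeholders. The voice conversion itself happens
# in ElevenLabs' Voice Changer (web UI), between these two commands.
import subprocess

# 1) Extract the dialogue audio from the Veo 3 clip as an MP3.
subprocess.run([
    "ffmpeg", "-i", "veo3_clip.mp4",
    "-vn", "-acodec", "libmp3lame", "extracted_dialogue.mp3",
], check=True)

# 2) (Manual step) Upload extracted_dialogue.mp3 to the voice changer and
#    download the converted file as my_voice.mp3.

# 3) Remux the original video with the new voice track, keeping the video
#    stream untouched.
subprocess.run([
    "ffmpeg", "-i", "veo3_clip.mp4", "-i", "my_voice.mp3",
    "-map", "0:v", "-map", "1:a",
    "-c:v", "copy", "-shortest", "final_clip.mp4",
], check=True)
```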
Level five of AI video generation: you're an innovator. You have a great understanding of the basics of creating AI videos, you've tried out the advanced features, and you have a pretty good understanding of them. At this point, you can start coming up with your own workflows that no one's thought of before. I actually don't know who originally came up with this, but using new tools like Nano Banana or Runway Act-Two, you can do this really cool effect of creating a transformation shot: I have this video of myself talking, and I'm going to have him transform into a different avatar. What you need to start with is a video you recorded of yourself that you want to map the AI avatar and transformation onto. Then, what I'll do is divide the video into two parts: the first part is just me talking, and the second part will be the transformation and the AI avatar. What I need to do is take a screenshot of myself at the end of the first part, where it's just me talking. Then I'll use an AI image editor like Nano Banana to transform that screenshot of myself into a different character; in this case, I'm going for a pretty stylish-looking elf. Once I have an image of the AI avatar, I'm going to go into Runway Act-Two, where I can motion capture the driving video of myself with the image of the avatar, and then generate the motion capture video. So now I have the first part of the transformation scene, which is just the video of myself talking, and the second part of the transformation scene, which is my AI avatar. Finally, what I need to do is animate the transformation between these two segments. To do this, I'll take a screenshot at the end of the first segment, which is me talking, and also a screenshot of the first frame of the AI avatar, and I'll generate a transition scene between them. To make the transition scene, I'm using Kling. Again, remember the keyframe feature. I can add a prompt for the transformation: the man gets covered with a puff of smoke and mutates and transforms into the elf. That should be good enough. Finally, what I need to do is combine the sequence together: first the initial segment, which is just me talking, then the transformation segment where I turn into the AI avatar, and then the final segment that we created using motion capture of the avatar himself. I also sped up the transformation 2x just to make it a bit faster. What I'm going to do is actually try to create a transformation of myself where I turn into this avatar and still keep the motion capture going.
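The cutting, screenshotting, and final assembly in this workflow can also be done from the command line. Here is a minimal sketch using ffmpeg, with placeholder file names and an assumed 5-second cut point; the creative steps in Nano Banana, Runway Act-Two, and Kling still happen in their own tools and simply produce the files referenced below.

```python
# Sketch of the mechanical steps of the level-five transformation workflow,
# using ffmpeg with placeholder file names and an assumed 5-second cut point.
# The generation steps (Nano Banana, Runway Act-Two, Kling keyframes) are
# done in their respective tools and just produce the files referenced here.
import subprocess

def run(args):
    subprocess.run(args, check=True)

# 1) Split the recorded video of yourself at the cut point.
run(["ffmpeg", "-i", "me_talking.mp4", "-t", "5", "part1_talking.mp4"])
run(["ffmpeg", "-ss", "5", "-i", "me_talking.mp4", "part2_driving.mp4"])

# 2) Grab the last frame of part one; this screenshot goes into Nano Banana
#    to be turned into the elf avatar image.
run(["ffmpeg", "-sseof", "-0.1", "-i", "part1_talking.mp4",
     "-frames:v", "1", "-update", "1", "last_frame.png"])

# 3) After generating transition.mp4 (Kling keyframes) and avatar.mp4
#    (Runway Act-Two motion capture), concatenate the three segments.
#    Note: "-c copy" only works if all clips share codec/resolution;
#    otherwise drop it so ffmpeg re-encodes.
with open("segments.txt", "w") as f:
    f.write("file 'part1_talking.mp4'\n"
            "file 'transition.mp4'\n"
            "file 'avatar.mp4'\n")
run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", "segments.txt",
     "-c", "copy", "final_transformation.mp4"])
```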
So now you have a pretty good understanding of the five levels of AI video and how to progress through each one. I also made a deep-dive video where I compared all the different AI video platforms, so you can see the quirks, the pros and cons, and the nuances, plus a complete guide to the unique features that each AI video platform has. If you want to see how the different AI video platforms stack up against each other, go watch this guide right here.