The 5 Levels of Creating AI Videos (and How to Level Up)
By Tao Prompts
Summary
Topics Covered
- Image-to-Video Unlocks Creative Control
- Master Prompts for Precise Camera Control
- Keyframing Enables Seamless Transformations
- Upscale Images and Videos for Max Quality
- Innovate Custom Transformation Workflows
Full Transcript
There are five levels of skill when it comes to creating AI videos, and each level unlocks exciting possibilities for the types of AI videos you can create. The problem most people have is they don't know which level they're at. This video will give you a complete roadmap for navigating all five levels of AI video creation and going from a complete beginner to a seasoned pro.

The first level is the beginner level. This is the most basic one. Perhaps you saw some cool AI videos on social media, got curious, and wanted to try it out for yourself. This level starts at text-to-video, which means you'll be using just text to generate videos. For example, if I were a Star Wars fan and wanted to generate a scene from Star Wars, I could go into Google Veo 3's Flow and enter a text prompt like "a stormtrooper in action on the battlefield. He fires his blaster." Veo 3 will follow the instructions I gave it and generate a basic-looking video. But what got you interested may have been some of the trends you found, for example the vlog-style videos, where you're using a bit more of a structured prompt. If you wanted to create a vlog, you would use keywords like "selfie camera angle, shot from an extended arm." "Just another day in the jungle, you know, patrolling, making sure everything is calm and peaceful. Nothing out of the ordinary."

Now, there are plenty of simple prompt formats at the beginner level. Here are some other examples where I created ASMR videos of a knife cutting through glass fruits and vegetables. It's pretty simple to make these: all you have to do is add the keyword "ASMR video" inside your prompt.
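To make that structure concrete, here is a minimal sketch of the kinds of beginner prompts described above, written out as plain strings you would paste into a tool like Veo 3's Flow; the exact wording is illustrative, not a required format.

```python
# Illustrative beginner-level text-to-video prompts (wording is an example,
# not a fixed syntax). These would be pasted into a generator such as Flow.
vlog_prompt = (
    "Selfie camera angle, shot from an extended arm. "
    "A stormtrooper vlogs in the jungle: 'Just another day, patrolling, "
    "making sure everything is calm and peaceful. Nothing out of the ordinary.'"
)

asmr_prompt = (
    "ASMR video. A knife slowly cuts through a glass strawberry "
    "on a wooden cutting board. Close-up shot, crisp sound."
)
```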
Now, at this point, you'll also probably start toying around with image-to-video. For example, if you have your own photos, like this one I took while shopping for my friend's wedding a couple of months ago, you can experiment with turning photos like these into videos as well. Basically, all the AI video generators will let you do this. If we're using Google Veo 3, I can go to the settings menu and select "frames to video." From here, using this plus button, I'm going to upload this photo of myself. Let's get that in there, and then we'll give the AI a few more instructions, like: the man is vlogging and talks about the new clothing he found at Macy's. He's happy with his new style, which is a true story. "Check out this new fit I just got from Macy's. Feeling fresh with this Tommy Hilfiger. Super happy with this style." This is one of the most fun levels because you're discovering and trying new things for the first time. I took this picture of my plant and animated a small goblin crawling out of it.
Here's another photo I took while I was at the Whitney Museum in New York, and I simply told the AI to animate the characters. The good news at the beginner level is that AI video is so good nowadays that even if you're just using the simplest prompts and techniques, it can create some mind-blowingly good-looking videos. Most people end up staying at the beginner level, and I think it's actually a decent place to be. However, you might find yourself running into a few limitations. For example, in this video of myself at Macy's ("Check out this new fit I just got from Macy's"), the voice doesn't sound like me. It sounds like some other guy. What if I wanted to make this video but have him say it in my own voice? And what if you wanted to tell a bit more of a story and need the characters to look consistent across different shots? That means you want more control over the creative direction of the videos you're making, and you need to go to level two: the intermediate level.
The biggest thing at this level is starting to really dive into the world of image-to-video, where we're not only generating AI videos, but actually creating AI images first and then turning them into AI videos. This gives you a huge amount of control over the videos you want to make, especially with new updates like Google's Nano Banana image editor. For example, I can take this photo of a woman and put her exact character into tons of different environments: sitting at the edge of a building overlooking a city, or inside the cockpit of a spacecraft. What if I want to combine multiple characters, objects, even products into the same image? We can do that with AI image editing too. What if I wanted to change the clothing she's wearing to match a specific fashion look I'm going for? I can do that and create runway models. Mastering image-to-video really allows you to bring your ideas to life.
So, how do we do this? Well, you'll need a powerful AI image model like Nano Banana. I'm inside Higgsfield, which is a platform that has a bunch of AI image models, and Nano Banana is one of them. Let's say I have this photo of a character and I want to put her inside some different scenes. What I'll need to do is upload this image into Nano Banana, so we'll use this upload button and put in the photo of my character. Now, you really need to be descriptive at this point about what the image should look like. I'll say: create a photo of this Asian character inside a cinematic sci-fi film. Notice that I specifically state the medium I want the image to be generated in, in this case a sci-fi film. She is sitting inside the cockpit of a spacecraft, like in Star Wars, so that's the environment she's going to be in. Shadows on her face, muted colors, and a gritty vibe, shot on 35mm film.
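Everything here happens inside Higgsfield's web UI, but if you'd rather script this step, here is a minimal sketch of the same image-plus-prompt call through Google's google-genai Python SDK. The model ID used for Nano Banana and the response handling are assumptions on my part, not something shown in the video.

```python
# Minimal sketch: sending a character reference image plus a descriptive
# prompt to an image model via the google-genai SDK. The model ID for
# Nano Banana is an assumption; the video itself uses Higgsfield's web UI.
from google import genai
from PIL import Image

client = genai.Client()  # reads the API key from the environment

reference = Image.open("character_reference.jpg")
prompt = (
    "Create a photo of this character inside a cinematic sci-fi film. "
    "She is sitting inside the cockpit of a spacecraft. "
    "Shadows on her face, muted colors, gritty vibe, shot on 35mm film."
)

response = client.models.generate_content(
    model="gemini-2.5-flash-image",   # assumed model ID for Nano Banana
    contents=[reference, prompt],
)

# Save the first returned image part to disk.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("scifi_cockpit.png", "wb") as f:
            f.write(part.inline_data.data)
        break
```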
And this is what Nano Banana creates. There's our character. Let's compare her to our reference image: it did a pretty good job of copying over the face, the clothing, even this belt. If we zoom in a bit on the shape of the belt, we have this one layer here and then this other belt below it at the bottom. Even the little wrinkle, that detail is copied over into the image. We can also ask Nano Banana to edit existing images. So if I created this photo of her exploring temple ruins, I can ask it to create all sorts of different shots of her. We can also introduce other characters into the scene, for example this sad little robot that looks like it's expired. I can also rotate the camera behind her. Now, if you're going to rotate the camera behind your subject, you want to make sure you also tell the AI what she's looking at; in this case, she's staring towards a dark tunnel. We can also take a collage of different characters. I have a woman, this furry alien creature here, a Jedi droid for her, and a speeder she's riding on. If I use this entire image as a reference, I can tell Nano Banana to combine all the elements together, and this is the resulting shot we'll get.
If I have a photo of the woman but also a photo of some new clothing I want her to be wearing, like this denim outfit, I can write a prompt where she's wearing the denim outfit while walking down the runway like a fashion model. Once we have all these images of our character generated, we can use AI to turn them into videos. Again, starting with the AI image model gives us the ability to control exactly how we want the visuals in the video to look: the outfit the character is wearing, the location the scene is shot in, what specific characters and objects are inside the video. All of that can be controlled by using image-to-video. And we can take this even a step further. If we take multiple images of a character, like a front view of her face and also a shot from behind, I can actually chain these together into a full AI video where the camera rotates all the way around her. Now, we're getting a bit ahead of ourselves; this is more of an advanced technique we'll see in level four. By the way, I'll put a link down below to a tutorial that teaches you all the tricks and tips for generating AI images inside of Nano Banana.
Now, at level two, you'll probably also be playing around with more specific advanced features of image-to-video. For example, say I wanted to create multiple shots of a world but have them all keep the same style: this gray background with bright red embers and ash. To create multiple images of your world in the same style, one of the best tools is Midjourney. Let's say I have this photo whose visual style I really like. What I can do is use that photo as a reference, so let's take it and drag it into the prompt toolbar. Inside Midjourney, we'll need to make sure we put it under style references to create more images in the same style. Then let's say in the prompt: a polar bear in the wild, covered with glowing red embers and ash. We also need to prompt for a 16:9 widescreen aspect ratio. And it looks like Midjourney has generated some really, really beautiful shots of a polar bear. Getting a consistent style in all the shots keeps everything in the world consistent and really allows you to start world building.
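For reference, here is roughly what that prompt looks like written out with Midjourney's parameter syntax instead of the drag-and-drop UI; the reference URL is a placeholder, and `--sref` / `--ar` are the standard parameters for style references and aspect ratio.

```python
# Approximate Midjourney prompt for the same result, using parameters rather
# than the web UI. The reference URL is a placeholder, not a real image.
midjourney_prompt = (
    "a polar bear in the wild, covered with glowing red embers and ash "
    "--sref https://example.com/ember-style-reference.png "  # style reference image
    "--ar 16:9"                                              # widescreen aspect ratio
)
```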
Level three of creating AI videos is the Prompter. At this point, you've learned all the basics and fundamentals of AI video, and in order to progress, one of the key things to focus on is developing your understanding of prompts and how they interact with the videos you generate. Prompts are extremely important: we can take the same exact photo and create a bunch of different variations of AI videos from it. If I have this shot of my character, I can direct the AI to create different camera motions. For example, I can tell the camera to zoom in so we get a closer look at his glowing eyes. Alternatively, we can tell the AI camera to pull back to reveal more of the character's body and the environment he's in. We can direct the AI camera to pan to the left or pan to the right. And we can even generate an arc shot where the camera rotates around the character in a circular fashion; the prompt keyword "orbit" seems to work pretty well for this. At this point, you can also start combining multiple camera motions. Starting with this really cool-looking shot of a tree, I can tell the camera to orbit around the tree as it tilts up to reveal a red moon in the sky, so it should combine the camera movement of rotating around the tree in a circular fashion while also tilting upwards. You might also want to start experimenting with the speed of your animations. Asking for a fast video versus a slow video makes a pretty big difference in how your virtual camera moves: I can have the camera fly forward slowly, or I can have it speeding towards a pagoda.
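As a quick reference, here are the camera-motion phrasings from this level gathered in one place; the exact wording is illustrative rather than an official syntax, and different generators may respond better to slightly different phrasings.

```python
# Camera-motion prompt fragments from level three, gathered as reusable
# strings. Wording is illustrative, not an official syntax.
camera_motions = {
    "zoom_in":   "The camera zooms in for a closer look at his glowing eyes.",
    "pull_back": "The camera pulls back to reveal the character's body and the environment.",
    "pan_left":  "The camera pans to the left.",
    "pan_right": "The camera pans to the right.",
    "arc_shot":  "The camera orbits around the character in a circular fashion.",
    "combined":  "The camera orbits around the tree as it tilts up to reveal a red moon in the sky.",
    "speed":     "The camera flies forward slowly.",  # vs. "speeds towards the pagoda"
}
```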
You might have noticed me using different AI video generators throughout this tutorial. That's something else you start figuring out at level three: the different AI video generators all have their quirks, their strengths, and their weaknesses. For example, compare Seedance versus Kling 2.1. If I use the same image and prompt, Seedance generates a video with very high-quality details and consistency, but it doesn't quite have the realistic, dynamic action that Kling can generate; Kling just has a bit more of that stylishness. Here's another example where I prompted for the camera to fly above the subject and point down towards the troll. Kling is able to animate this specific motion pretty easily: it flies above and points down on the subject perfectly. While Seedance generates a really nice, sharp-looking video full of rich details, it's not able to follow my prompt correctly. Okay, so then what about Seedance? Well, one pretty cool feature in Seedance is that you can prompt for a video that has multiple shots inside it. What do I mean by this? If we take this photo of my character, I can use a prompt like: he looks around first, and then the camera cuts to an overhead shot of him practicing with his sword. What Seedance can actually do is generate multiple shots where we start looking at him, but then the camera cuts to a different angle showing him from above. Now, this doesn't always work super consistently.
Sometimes the characters don't look that consistent between the different camera angles, but you can definitely put this to good use. To master prompting at level three, though, you need to learn how to debug your prompts. If I start with this photo and prompt for him to be showing fear, distress, sad emotions, the AI video does a pretty good job of following my instructions, and you can see his face bubbling with emotion. But when I wanted to create a different emotion, like happiness or joy, what I saw is that the AI video actually transforms the character: it got rid of all the ash covering him and turned him into a normal-looking human. I don't want that to happen. So, what I did is double down in the prompt on exactly the visual style I'm going for. I specifically said he has a happy, joyful expression, but also that his eyes glow red and he is covered with red embers and ash. By reinforcing the visual cues, when I run the prompt again, it's able to render a video of him smiling and happy while keeping the ash and all the little glowing red particles on his face, instead of transforming him into a normal guy.
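In code form, the "debugging" step is simply adding the visual cues back into the prompt; the strings below are an illustrative before/after rather than exact wording from any tool.

```python
# Prompt debugging: the first prompt drifts (the model "cleans up" the
# character); the second reinforces the visual style so it stays intact.
weak_prompt = "He has a happy, joyful expression."

reinforced_prompt = (
    "He has a happy, joyful expression, but his eyes glow red "
    "and he is covered with red embers and ash."
)
```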
Next up is level four: you're a power user now. You know the basics and you know your way around different prompts to get the types of videos you want, so it's time to start diving into the more advanced features. Earlier, I showed this example where we're actually using multiple image frames and chaining them into a complete AI video where the camera rotates all the way around her. This is called a keyframing feature, and you can create all sorts of amazing transformations with it, beyond just camera movements. For example, I can have a dragon hatch out of a glowing egg, or I can transform human characters and turn this guy into a fox spirit. Let's say I want to make a transformation where I'm bringing this journal illustration to life: I want to start with this photo of the girl inside a journal and transform it into a real-world scene. I think the best version of the keyframing feature is actually inside Kling AI. So what I'll do is upload these two frames as the beginning and end of the video I want. I've uploaded the starting frame, which is the illustration of her inside a journal, and also the end frame, which is a real-life photo of her. If we go and generate this, the AI will create a video with a seamless transition between the two images. And I think this turned out absolutely stunning.
Some other powerful features you'll be using at this point include motion capture. For example, I can take a video of myself where I'm speaking and moving around, and actually map my movements onto a different character. What I'm showing on screen right now is Wan 2.2's character transfer feature, where you can motion capture a video of yourself, like I'm doing right now, and turn it into a full AI avatar. Runway's AI video platform also has a pretty similar feature, and it'll actually animate the full scene with the character inside of it. To use the motion capture inside Runway, I need to go to the video generator, find the Act-Two feature, and then upload a driving performance of myself along with the character I want to map my movements onto. So there's me talking, and here is the image of my goblin character. There are some other parameters you can change, like the level of expressiveness you want and whether it also adds in the movements and gestures, which I think you want in there. It's one of the most impressive video-to-video features: you can motion capture a video of yourself.
At level four, you also have knowledge and understanding of more advanced workflows to squeeze every bit of extra quality out of the videos you create. On the left is a video I created directly from an AI image I made inside Midjourney, and on the right is what a scene might look like if I use the proper techniques to squeeze the maximum amount of quality out of the videos. You can see that the details are way sharper and cleaner. To get the absolute maximum quality out of your AI videos, the first thing you need to do is take an AI image and upscale the resolution of that image. Now, there are plenty of different tools for this; I think one of the best ones is called Magnific. We can take a look at the before and after: it just adds that little bit of extra crispness to the textures on his armor and the little vegetation on his body. Now, it will alter some of the smaller details, and sometimes also the way some of the characters look, so you have this optional slider where you can turn up the resemblance, meaning how closely you want the upscaled image to resemble your original image. Then what I'll do is take this upscaled image and create an AI video from it. But we're not done yet. Once we've created the AI video, we can take it a step further and enhance the resolution of the AI video itself. The tool I like to use is called Topaz Video AI. I've uploaded my video in here, and what I can do is choose the output resolution; I can say 4K. Then, if I go and upscale the video, we get a much higher quality video than we would have if all we did was create a video from the original image. So again, what you want to do is take your original AI image, upscale that image, generate a video from the upscaled image, and then upscale the AI video again. You should be able to see a huge upgrade in how your videos look.
Another level four hack: let's say you've generated a video of a character inside Google Veo 3, and you have them speaking, but the voice doesn't match the one you want. "After the winter, nothing was the same. The world had changed, and so had we. Everything mutated." Let's say I wanted this character to speak the same dialogue with my voice. What I did is download the video of this character and extract the MP3 audio file from it. Now, what we need to do is use a voice changing app to change the voice to match my own. I'll be using ElevenLabs to do this. Inside, we can find this voice changer button, and I've actually cloned a voice of myself. Now, there are tons of other optional voices you can change it to ("I have a casual, smooth, and sexy voice that you can use for anything you want"), like Denzel, for example. But let's use my voice this time. What I'll do is upload the extracted audio file from that Veo 3 video, and I should be able to transfer the audio into my own voice. Let's try that. "After the winter, nothing was the same. The world had changed, and so had we." Now all I have to do is line up the original video file from Veo 3 with the audio that's been altered to match my voice. "After the winter, nothing was the same. The world had changed, and so had we. Everything mutated."
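The extract-and-swap steps around the voice change can be scripted with ffmpeg. Here is a minimal sketch, assuming ffmpeg is installed and that the actual voice conversion happens in ElevenLabs' Voice Changer between the two commands; file names are placeholders.

```python
# Sketch of the audio swap around the voice-changing step. Assumes ffmpeg is
# installed; file names are placeholders. The voice conversion itself happens
# in ElevenLabs' Voice Changer (web UI), between these two commands.
import subprocess

# 1) Extract the dialogue audio from the Veo 3 clip as an MP3.
subprocess.run([
    "ffmpeg", "-i", "veo3_clip.mp4",
    "-vn", "-acodec", "libmp3lame", "extracted_dialogue.mp3",
], check=True)

# 2) (Manual step) Upload extracted_dialogue.mp3 to the voice changer and
#    download the converted file as my_voice.mp3.

# 3) Remux the original video with the new voice track, keeping the video
#    stream untouched.
subprocess.run([
    "ffmpeg", "-i", "veo3_clip.mp4", "-i", "my_voice.mp3",
    "-map", "0:v", "-map", "1:a",
    "-c:v", "copy", "-shortest", "final_clip.mp4",
], check=True)
```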
Level five of AI video generation: you're an innovator. You have a great understanding of the basics of creating AI videos, you've tried out the advanced features, and you have a pretty good understanding of them. At this point, you can start coming up with your own workflows that no one's thought of before. I actually don't know who originally came up with this, but using new tools like Nano Banana or Runway Act-Two, you can do this really cool effect of creating a transformation shot: I have this video of myself talking, and I'm going to have him transform into a different avatar. What you need to start with is a video you recorded of yourself that you want to map the AI avatar and transformation onto. Then, what I'll do is divide the video into two parts: the first part is just me talking, and the second part will be the transformation and the AI avatar. What I need to do is take a screenshot of myself at the end of the first part, where it's just me talking. Then I'll use an AI image editor like Nano Banana to transform that screenshot of myself into a different character; in this case, I'm going for a pretty stylish-looking elf. Once I have an image of the AI avatar, I'm going to go into Runway Act-Two, where I can motion capture the driving video of myself with the image of the avatar, and then generate the motion capture video. So now I have the first part of the transformation scene, which is just the video of myself talking, and the second part of the transformation scene, which is my AI avatar. Finally, what I need to do is animate the transformation between these two segments. To do this, I'll take a screenshot at the end of the first segment, which is me talking, and also a screenshot of the first frame of the AI avatar, and I'll generate a transition scene between them. To make the transition scene, I'm using Kling. Again, remember the keyframe feature. I can add a prompt for the transformation: the man gets covered with a puff of smoke and mutates and transforms into the elf. That should be good enough. Finally, what I need to do is combine the sequence together: first the initial segment, which is just me talking, then the transformation segment where I turn into the AI avatar, and then the final segment that we created using motion capture of the avatar himself. I also sped up the transformation 2x just to make it a bit faster. What I'm going to do is actually try to create a transformation of myself where I turn into this avatar and still keep the motion capture going.
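The cutting, screenshotting, and final assembly in this workflow can also be done from the command line. Here is a minimal sketch using ffmpeg, with placeholder file names and an assumed 5-second cut point; the creative steps in Nano Banana, Runway Act-Two, and Kling still happen in their own tools and simply produce the files referenced below.

```python
# Sketch of the mechanical steps of the level-five transformation workflow,
# using ffmpeg with placeholder file names and an assumed 5-second cut point.
# The generation steps (Nano Banana, Runway Act-Two, Kling keyframes) are
# done in their respective tools and just produce the files referenced here.
import subprocess

def run(args):
    subprocess.run(args, check=True)

# 1) Split the recorded video of yourself at the cut point.
run(["ffmpeg", "-i", "me_talking.mp4", "-t", "5", "part1_talking.mp4"])
run(["ffmpeg", "-ss", "5", "-i", "me_talking.mp4", "part2_driving.mp4"])

# 2) Grab the last frame of part one; this screenshot goes into Nano Banana
#    to be turned into the elf avatar image.
run(["ffmpeg", "-sseof", "-0.1", "-i", "part1_talking.mp4",
     "-frames:v", "1", "-update", "1", "last_frame.png"])

# 3) After generating transition.mp4 (Kling keyframes) and avatar.mp4
#    (Runway Act-Two motion capture), concatenate the three segments.
#    Note: "-c copy" only works if all clips share codec/resolution;
#    otherwise drop it so ffmpeg re-encodes.
with open("segments.txt", "w") as f:
    f.write("file 'part1_talking.mp4'\n"
            "file 'transition.mp4'\n"
            "file 'avatar.mp4'\n")
run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", "segments.txt",
     "-c", "copy", "final_transformation.mp4"])
```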
So now you have a pretty good understanding of the five levels of AI video and how to progress through each one. I also made a deep-dive video where I compared all the different AI video platforms, so you can see the quirks, the pros and cons, and the nuances, plus a complete guide to the unique features that each AI video platform has. If you want to see how the different AI video platforms stack up against each other, go watch this guide right here.