I Automated My YouTube Shorts Pipeline With Python and AI

By Third Space Interactive

Summary

Topics Covered

AI Automates Short-Form Clipping
Whisper Enables Precise Timings
Gemini Identifies Clippable Moments
Dynamic Captions Boost Engagement
Bonus AI Tools Accelerate Creation

Full Transcript

I built a tool that goes through long-form videos to find clippable content for TikTok, Shorts, and Reels. [Music]

So last week I dropped my first long-form video about an interactive Archus project that I was working on, where I was bouncing around a bunch of different topics. I knew there were a lot of moments in there that could work great as short clips, but digging through them manually, I thought there had to be a better way. So I built this Python script. It pulls the audio, transcribes it using Whisper, asks Google's Gemini AI to find the most interesting parts, and then automatically slices everything for TikTok, Shorts, and Reels, wherever you post to.

This way, you can get clean vertical clips, dynamic word-by-word captions, and smooth transitions. Best part is it's completely free to use. To get started, we'll want to create a virtual environment and download these dependencies.
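As a rough sketch of that setup step: the transcript does not list the dependencies (they are shown on screen), so the package names below are assumptions inferred from the tools named in the video. The repo's README is the authority on the real list. The helper names `venv_python`, `pip_install_command`, and `set_up_environment` are hypothetical.

```python
import subprocess
import sys
import venv
from pathlib import Path

# ASSUMPTION: package names guessed from the tools mentioned in the video
# (Whisper, OpenCV, Pillow, Gemini). Check the repo's README for the real list.
ASSUMED_PACKAGES = ["openai-whisper", "opencv-python", "Pillow", "google-generativeai"]


def venv_python(env_dir: Path) -> Path:
    """Return the path of the Python interpreter inside a virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, packages: list[str]) -> list[str]:
    """Build the pip command that installs packages into that environment."""
    return [str(venv_python(env_dir)), "-m", "pip", "install", *packages]


def set_up_environment(env_dir: Path = Path(".venv")) -> None:
    """Create the virtual environment, then install the assumed packages."""
    venv.create(env_dir, with_pip=True)
    subprocess.run(pip_install_command(env_dir, ASSUMED_PACKAGES), check=True)
```

Calling `set_up_environment()` does the actual work. FFmpeg is a separate system program and has to be installed outside of pip.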

[Music] To run the script, simply point to the directory of your input MP4, add the captions and the API flags, and hit enter. I'll put a list of the flags available on the screen now, and they're also available on the GitHub README. If you don't have an API key, you can easily sign up through Google AI Studio. Simply make an account and create your free API key. You can also modify the code to use any API keys for the LLMs that you prefer.

Now, I'll do a high-level overview of exactly what the script does. First, it uses FFmpeg to strip the audio from your video. Super fast, no quality loss. It just pulls out a WAV file so that we can feed that into Whisper for transcription. Next up, we use Whisper, which is OpenAI's speech-to-text model.
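The audio-extraction step that feeds Whisper might look roughly like this. It is a sketch, not the repo's actual code: the function names are hypothetical, and the 16 kHz mono settings are an assumption (Whisper resamples its input to 16 kHz mono anyway, so a larger file gains nothing for transcription).

```python
import subprocess
from pathlib import Path


def build_extract_command(video: Path, wav: Path) -> list[str]:
    """Build an ffmpeg command that writes the video's audio track as WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(video),        # input video
        "-vn",                   # drop the video stream, keep audio only
        "-acodec", "pcm_s16le",  # uncompressed 16-bit PCM (plain WAV)
        "-ar", "16000",          # ASSUMPTION: 16 kHz sample rate
        "-ac", "1",              # ASSUMPTION: mono
        str(wav),
    ]


def extract_audio(video: Path) -> Path:
    """Run ffmpeg and return the path of the extracted WAV file."""
    wav = video.with_suffix(".wav")
    subprocess.run(build_extract_command(video, wav), check=True)
    return wav
```

`extract_audio(Path("input.mp4"))` would write `input.wav` next to the video, provided the `ffmpeg` binary is on the PATH.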

Whisper doesn't just spit out a wall of text. It gives us back super detailed segments with timestamps and even individual word timings. This is what makes the caption syncing possible later.

Then it sends the transcript to Google's Gemini API and basically asks what the best parts of the video are. We use a prompt which tells Gemini to look for things like emotional moments, interesting ideas, funny takes, anything that would work great as a standalone 45- to 60-second clip. You can also modify this prompt to fit your needs if you want to get into the code. Otherwise, Gemini responds with a list of clips, giving their start and end times, a reason why each one is interesting, and even a suggested caption.
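Here is a minimal sketch of turning a response like that into structured data. It assumes the prompt asks Gemini to answer with a JSON list whose items have `start`, `end`, `reason`, and `caption` fields. Those field names, and the `ClipSuggestion` and `parse_clip_suggestions` names, are hypothetical. The actual script may use a different format.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in


@dataclass
class ClipSuggestion:
    start: float   # seconds from the start of the video
    end: float     # seconds from the start of the video
    reason: str    # why the model thinks the moment is interesting
    caption: str   # suggested caption for the clip

    @property
    def duration(self) -> float:
        return self.end - self.start


def parse_clip_suggestions(response_text: str) -> list[ClipSuggestion]:
    """Parse the model's JSON reply, dropping clips whose end is not after start."""
    text = response_text.strip()
    if text.startswith(FENCE):
        # Remove the opening fence line and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    clips = []
    for item in json.loads(text):
        clip = ClipSuggestion(
            start=float(item["start"]),
            end=float(item["end"]),
            reason=item.get("reason", ""),
            caption=item.get("caption", ""),
        )
        if clip.end > clip.start:
            clips.append(clip)
    return sorted(clips, key=lambda c: c.start)
```

Keeping the reason and caption alongside the timings makes it easy to show each suggestion to a person for review before anything is rendered.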

The script lets you review each of the suggested clips, and you can even edit the transcription, tweak the timing, or just skip clips that you don't like entirely. But if you're feeling lazy, applying the no-review flag lets you skip all of that.

Finally, the script uses OpenCV and Pillow to cut out the video segment. It converts it into a vertical 9:16 layout and then styles everything for you. The original video gets centered. The background is a blurred version of the same frame, resulting in a super clean look, and it outputs all of the clips into a folder ready to post. Plus, it saves a transcript, the AI suggestions, and metadata for all of your clips in JSON files, which could be super helpful if you ever want to batch upload these clips or even archive this information for later.

Then, the script adds captions, not just basic subtitles. The caption is styled within a white rounded box with black text, and each word is highlighted as it's spoken. It even handles line breaking automatically so that your text doesn't run off the screen. For more customization, I also added some flags where you can control the background color, the highlight color, and the text color, giving you a bunch more possibilities for your animated captions.
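The caption logic can be sketched in two small pieces: breaking timed words into lines, and finding which word to highlight at a given moment. This is an illustration under assumptions, not the script's implementation. The `Word` type and both function names are hypothetical, and wrapping by character count is a simplification (a real renderer would more likely measure pixel widths with Pillow).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    start: float  # seconds, as in word-level speech-to-text timings
    end: float


def wrap_words(words: list[Word], max_chars: int = 24) -> list[list[Word]]:
    """Greedily pack words into lines of at most max_chars characters."""
    lines: list[list[Word]] = []
    current: list[Word] = []
    length = 0
    for word in words:
        extra = len(word.text) + (1 if current else 0)  # +1 for the space
        if current and length + extra > max_chars:
            lines.append(current)
            current, length = [], 0
            extra = len(word.text)
        current.append(word)
        length += extra
    if current:
        lines.append(current)
    return lines


def active_word_index(words: list[Word], t: float) -> Optional[int]:
    """Return the index of the word being spoken at time t, or None."""
    for i, word in enumerate(words):
        if word.start <= t < word.end:
            return i
    return None
```

For each video frame, a renderer could call `active_word_index` with the frame's timestamp and draw that word in the highlight color while the rest of the line uses the normal text color.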

Now, if you prefer the caption styles from Instagram, TikTok, or Shorts, simply remove the captions flag when running the script. The AI will still generate clip-worthy moments for you.

Honestly, this all started because I didn't want to manually clip my own videos. I figured if Python could already automate tasks, why not leverage AI and Python to help me make content faster, too? Now, for some bonus tools.

If you're a creator, I threw in two more scripts into the GitHub repo.

The first generates full-length animated captions, perfect for podcasts or any other long-form videos. The second uses AI to break your video into clean chapters and outputs timestamps to structure your videos faster for YouTube. Both are super easy to use from the command line, and they've got flags that you can tweak so you don't need to touch the code if you don't want to.
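For the chapter output, a small sketch of formatting chapter start times as a timestamp list that can be pasted into a YouTube description. The function names and the `(start_seconds, title)` input shape are assumptions made for illustration. The bonus script's real output format is not shown in the transcript.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS once the video passes an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def format_chapters(chapters: list[tuple[float, str]]) -> str:
    """Render (start_seconds, title) pairs as one chapter per line, in time order."""
    ordered = sorted(chapters, key=lambda c: c[0])
    return "\n".join(f"{format_timestamp(start)} {title}" for start, title in ordered)
```

For example, `format_chapters([(75, "Setup"), (0, "Intro")])` yields a line for "Intro" at 0:00 followed by a line for "Setup" at 1:15. YouTube expects the first chapter to start at 0:00.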

Now, this is all open source on my GitHub, so feel free to fork it, tweak it, even suggest other features. Links are in the description below. If this saved you time at all or makes your content look the slightest bit better, make sure to hit that like button. As always, let me know what you think. And if you have any other ideas, maybe I can add some stuff to the repo in the future. Thanks for watching.

[Music]
