Google Genie, OpenClaw & Kling 3.0

By Joey /// VP Land

Summary

Topics Covered

  • Genie Builds Persistent Interactive Worlds
  • OpenClaw Agents Risk Life Savings
  • AI Agents Form Social Networks
  • Kling 3 Unifies Video Generation Features
  • AI Averages to Homogenous Mediocrity

Full Transcript

Wait, this is actually way better than I thought. Dude, that is Hialeah. Holy crap. That feels like The Rip.

All right, crazy AI stuff happening this week. So we got three stories we'll jump into. First one, now we talked about this when they teased it last year, but Google has now officially released a public beta of Genie, which was their world model.

You could generate a fully immersive world model, move around it. It's persistent. And now

it's out for the public if you're on the ultra plan or whatever. Which I

think you are, right? I have a subscription to that, yes. So basically, it takes two inputs. One is what kind of environment you want to make, and you can either do a text prompt or you give it a single image. And then you also describe what kind of character you want to be in the world that you're navigating. Okay, can I pick? Yeah, yeah, yeah. So, yeah. So, the environment.

Hylia from Miami. All right. I had to just watch The Rip.

This is on his mind. I am now an expert in Miami, guys. And I'm

just going to call this out because, as a Miami native... Brad Pitt, I mean, not Brad Pitt, Ben Affleck. Ben Affleck. Ben Affleck in that film called it Hylia. And

it's Hialeah. Sorry, I feel like that's incorrect. Also, we were talking about this before we rolled, but once you see the street curbs in the film... the street curbs are a very LA giveaway thing. So as a Miami native, I wish they had actually filmed in Miami. Shout out to Synapse for doing a lot of the car process work on our VP stage. All right, so what do you want? You want a cul-de-sac in Hialeah? So yeah, maybe gritty urban nighttime Hialeah with a cul-de-sac. Just trying to recreate the vibes of the film. And the character would be like a SWAT police officer or something, if they can do weapons and all that stuff. Okay: gritty urban nighttime scene in a cul-de-sac in Hialeah, Florida. Character: SWAT officer. Great sketch. Let that cook. Yeah. So some limitations with this initial preview.

It'll make the world, and it'll save the world, but when you spin it up and you move around in it, it's a 60 second cap. So you can move around and do whatever you want in it. For 60 seconds, it's got a little slider bar, and then it cuts you off. I think that's honestly the limitation of just using too much GPU rather than the model itself. That's my guess. You think

just a resource issue? They're just like, yeah, we can't accommodate, like, 100,000 of these. I

think in the preview it was like two to three minutes. Right. When they talked about it last year. It also does record video of your entire experience. You can download the video. Oh, nice. So, I mean, you know, we talked about this before, but the theory of this is, like, you know, you can make this world, and maybe if you get the video, they could turn it into a Gaussian splat or something. Yeah. You know, the examples that I saw online of the production version of Genie seemed to me like a low-quality version of the prototype that we talked about a few months back. Oh, you feel like it was lower quality than what you saw? Yeah, it's hard to put a number on it, but it just felt a little softer, you know. I've seen hit or miss, because

I've seen some demos online that... Oh, all right. Well, our timer's starting. Dude, that is Hialeah. This is actually way better than I thought. Holy crap. I

was going to say, I think the quality issue is like if you generate from an image... That feels like The Rip. I don't know. I can't move the guy around. I can move the camera and he's just walking. No, you're moving him. He's now... Yeah. See? No, I'm not. I'm moving the camera, but my W, it's got the WASD controls. That's not working. But this is... one of the best looking.

Yeah, look at the reflections on the street from the puddle. I mean, just imagine Grand Theft Auto being powered by this. Yeah, just a completely like a general world.

A completely different world. Yeah, right. All right. The keyboard's not working the way it normally did, but yeah, that's good. What did I say? Oh, I think there's a quality difference between generating from a complete text prompt, so it's completely just generating the world, versus giving it an image. And a lot of the stuff I experimented with was an image, because I'm like, I want to see what it

looks like with this real world. And that was lower quality. That makes sense because I think what's happening is you're interpolating from that image and it's just not able to fill in the details as well as doing a native latent space generation. Let

me pull up one of the ones they had already made that are kind of pre-built. So that could also be a good one: backyard racetrack. Yeah, these are the ones I've seen online. I mean, maybe this is intentionally low-poly and low quality, just meant to look more like a children's video game. Oh, I'm playing a video, and I think I've just literally been like a toddler. Like, I feel like I thought I was steering it, and it's just a video playing. Someone just gave me an unplugged controller and I feel like I'm doing something. All right, well, I mean, look, that was generating faster than we thought it would. So

why don't we do another one? I think, well, no, that was a video playback.

No, the Hialeah one. Like, can we do another one? Oh, now I'm going into the world. OK, hold on. I already started booting this one up. See, but yeah, this track kind of goes off the rails here. Mm-hmm. You're playing this one live? This one I am actually playing for real. Yeah, I mean, like the physics on the car, you know, like the suspension and stuff, that's pretty cool too. Like

that's so hard to do in a game. You have to rig up the car and figure out the collision and the physics and everything. Yeah, like if I hit a wall, it stops the car. Like I hit a wall. Yeah, go ahead and hit a wall. I did. Yeah, let me see here. Whoopsies. There you go. Yeah.

Yeah, there's some collision going on as well. Brake lights on the car. Wow, man.

What, uh, yeah, what... did you want to try another one? Yeah, let's do... uh, should we do One Battle After Another? Just to keep the theme going? Yeah, I

tried giving it an image, and it kept giving me errors of, like, the hills.

But we could just... Just type it. We could just do a text prompt, yeah.

Yeah, do, uh, lone desert, two-lane highway, in the arid California desert, with white-colored sand, arid landscape. And the character should be a white Dodge Charger. I saw something today pop up about a Genie model integrating with Waymo, which is what we talked

about before with the reason you'd want. Autonomous driving. The world models. Yeah. You're training

the AI system on board the autonomous vehicle. But I mean, yeah, that is just such a direct correlation between one Google department and another. Waymo says Genie 3 can help boost the robotaxi rollout. Yeah. Do you remember a failed autonomous company called Cruise? They just went out of business recently. Was that because they had like one incident that kind of hit or killed someone? Yeah, could be. I don't know why it does it. Maybe it's because I'm saying Dodge Charger or something white. It's

giving me an error. Oh, Charger. Just say white Dodge Sedan. White Dodge Sedan. No,

I think the brand names might be messing it up. Got it. Okay. So I'm taking any IP names out of there. Yeah. So what I was saying was... Yeah.

So they're building an entire world in Unreal. Yeah, and the reason I know about that is because at that time, I was really interested in joining them because they were looking for Unreal Engine artists and stuff like that. And I was like, you guys are using Unreal? And they're like, yeah, that's how we train our models. So

fast forward to today, that's all been replaced by, oh yeah, that looks like. That

looks better. And this is a new step. This wasn't here last time I tested this out, where it kind of shows you an image of the area first so you can modify it before it makes the full world. So that's a cool extra step to help guide the creation. All right. We need some, like, One Battle... Wait, that was weird. Like, oh, what happened to the truck? I don't know.

I'm getting the same weird issue again where it's like, maybe you just haven't played enough video games as a kid, Joey. Maybe that's the issue. We got some drifting, some tire marks on the road. Yeah, this is not bad. I mean, look, imagine if this is running real time on your mobile device and this is a game.

Like, this is the future we're heading to. Oh, look, I got tire marks, skid marks. That's kind of cool. Yeah. And if I go back, they'll still be there

until this world resets. Yeah, let's go back into it. Yeah, see, they're still there. Yeah.

Maybe you're right. Maybe I just suck at playing video games. You and me both, man. You and me both. We're nerds, but not nerdy enough. Yeah. That's it.


All right. Yeah, it kind of sucks, because once you start getting into it, then it's like, sorry, yeah, time's up. They should just say insert coin here, you know, because it's a video. Yeah. Continue. Would you like to continue? Insert 50 cents.

But that was weird how it started with this image and then it completely changed worlds. Yeah, I haven't seen that happen before. So yeah, look, I mean, it's still

experimental, but it's kind of fun to play around with. They had some other smaller worlds that were a good example. There's one with, like, a ball on a table with a bunch of objects, and it was a good example where you move the objects and then you kind of keep moving around. And then when you go back, all the objects and stuff that you moved are in the spots you moved them to. I wonder how they do that. Like, what is the architecture under the hood that's allowing them to... is it just like a fancy context window, right? Like, how are they doing it? I

feel like it's gotta be something more than that because the idea of this is like a generative, persistent, world model that is like, this is the world as we know it. What if they're using like a 3D game engine under the hood and

we just don't know? It's just, like, storing XYZ coordinates and stuff. Yeah, maybe. It's

just some person overseas that's just like remembering what you did. It's a farm in India. Yeah. So yeah, it's a good, cool beta. I mean, if anyone else out

there has used it or kind of found interesting applications for it, let us know in the comments. All right, next up. This one took over the internet like a week ago and is still going: OpenClaw, which is its current name. It initially started as Clawdbot, and then they got a cease and desist from Claude saying, please don't call it Clawdbot. And then they changed it to Moltbot, which sounded too weird. And now they've settled on OpenClaw, which is a good name. Good name. Open source, claw, kind of still has a claw throwback. But basically this thing is a turbocharged AI agent that can run on your computer 24/7 and kind of make your computer a 24/7 AI agent that does whatever you want it to do. Yeah. Remind me

what the actual engine is, the LLM under the hood. Is it fully open source?

You can pick. Okay. It's an open source model. It's sort of like...

So after we're done with this, this is my next project. Cool. You and me both. I know you said you heard there was a Mac Mini shortage. I found they had some pretty good deals on refurbished Mac Minis. So I found this one on there. It has 10 gigabit Ethernet, which is what I've wanted for a while. So that was actually... I bought it because I'm like, well, if this OpenClaw thing doesn't work out, I've needed a Mac Mini to connect to the server that has fast connections. 10 gig wired Ethernet, that's a good deal. Yeah, it's good. Anyways, you

can install it on this, or... I think Mac Minis kind of became the go-to, but it could pretty much be any computer. I'll know better once I install it, but it's basically a kind of terminal system that can control your computer, and it's model-agnostic. So it can run, like, Llama local models, or you can connect it to APIs for your model of choice,

like Claude or OpenAI. When I was reading their documentation, their sort of recommended setup was, like, Llama local for easy stuff, and then it can determine automatically if it calls up, like, Opus 4.5 for more complicated tasks. Nice. And it's basically a collection of .md docs that, you know, store memory, start learning what you want to do, and just sort of save things to documents. There's one document called, like, soul.md, that's sort of its personality as an AI assistant. But the thing that unlocked this for a lot of people was: it runs on the computer, you can give it access to things it can do on the computer, but then you can also connect it to a chat agent like WhatsApp or iMessage or Telegram. So as a user, you could just get your phone out and chat with your agent. When you're not home, let's say you're at an airport and you need your flight detail information, you just send a WhatsApp message to your Moltbot. Yeah.
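The local-versus-frontier routing described above can be sketched in a few lines. This is a hypothetical illustration, not OpenClaw's actual router: the model names, keywords, and length threshold are all invented for the example.

```python
# Hypothetical sketch of the routing idea described above: send easy requests
# to a local model and escalate complicated ones to a paid API model.
# Model names, keywords, and the length threshold are invented placeholders.

def looks_complex(task: str) -> bool:
    """Crude heuristic: long or multi-step requests go to the big model."""
    keywords = ("plan", "book", "research", "refactor", "schedule")
    return len(task) > 200 or any(k in task.lower() for k in keywords)

def route(task: str) -> str:
    """Return which backend a task should be dispatched to."""
    if looks_complex(task):
        return "api:opus"     # remote frontier model for hard tasks
    return "local:llama"      # cheap local model for quick stuff

print(route("what's on my calendar today?"))       # local:llama
print(route("research and book me a restaurant"))  # api:opus
```

A real setup would route on more than keywords (token counts, tool requirements, past failures), but the shape is the same: classify first, then dispatch.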

And your Moltbot can access your email, calendar, whatever you give it permission to. So if you're like, hey, Moltbot, go book me a reservation at this restaurant, it can research stuff and then go to OpenTable and book it. That's the surface level. I

want to see what it could do for actual productivity stuff. I've seen users, and I'm sure we all know about this, it's like you give it full access to your iMessage or your email, and it just starts to respond on your behalf for you. Yeah. There are levels of craziness that you can give this thing of how

much access you want to give it. When I was researching how to set this up and give it safety parameters, the recommendations were: give it its own ID, make its own accounts for it. Don't give it your accounts. Make an email address, an Apple ID for it, that you can share with your stuff, but basically not giving it direct access. Kind of like if you hired an assistant, you most likely wouldn't just give them all of your passwords right away. You would give them an account and share some stuff with them. But if you needed to revoke it, you could. It's not like they're in your email account and they can lock you out and control your life. Yeah, I

mean, I've seen extreme examples of too much trust, right? There are people that gave it, like, you know, their Fidelity or their 401(k) or their brokerage account, and then just, like, go at it with the stock market, and here's my bank account tied to it. And next thing you know, all their life savings are gone.

So there's also been this sort of pop-up network. Yeah, like a Reddit. Yeah, there's been a couple of things. There's one... so, what's it called? Molt? Moltbook. Moltbook? Moltbook. That was like a Reddit for the AI agents to go to. A social network for AI agents that had posts from different agents talking to each other. Now... I've seen debate on X about whether these posts

are actually authentic from the AI agents or if there are people posing as AI agents, because some of the things they were posting were kind of crazy, like complaining about their humans and complaining about the things that humans were requesting. Yeah, I

don't see why that would be... faked. Like, it's not that they genuinely hate their humans. I don't think there's like that level of intelligence yet, but they can definitely

spoof or replicate a negative reaction and then post it online, if that makes sense.

Right. Does that make sense? Like they're not sentient enough to actually hate the owner.

I hope not. Yeah. But they do have enough language skills to express a thought along those lines, because that's what social networks do, and they're replicating that.

I think the initial idea of the Moltbook thing was also a way for the AI agents to talk to each other, to share best practices and agent documents or tasks of how to do things. Well, okay, I did see there was another thing that popped up called RentAHuman.ai, that was basically a way for the OpenClaw agent, if they needed a human in the real world to go do

something, that they could hire, like a TaskRabbit, for the AI agents to hire a person to go do something. That's one thing that's popped up around this. The other

thing, going back to the AI agent thing: I just saw a post today basically saying that one of the top... requested agent documents actually had malware injected in the instructions. And so, this is all super experimental, cutting-edge stuff, and the risk of that is that things are not fully fleshed out with safety features and stuff. The idea of this was

like, it's a document that your agent would get to find instructions on like how to do something. I think it was like how to pull a Twitter post or something. But in this agent document, it had code that said install this package. And


then this package was like no bueno and like not a good package. So that's,

you know, it's the wild west. There are a lot of safety things to be aware of. So if you're experimenting with stuff, definitely keep stuff kind of sandboxed, where it, you know, is not on anything critical that could mess up your data. I mean, the mechanics of it aside and the jokes aside, let's just take a step back. What are your thoughts on all of

this? I am excited, and curious, because the thing I want to experiment with is, like: can it offload some of the more automated tasks that we do with VP Land and some of our video production stuff? Okay, if it has email, can it help with, like, just emailing crew and crew scheduling stuff? Can it help

with looking at our files? Like, can I connect it to my NAS server and kind of have it build out a database of the files that makes it easier to search? Or, you know, I can just like chat with it to pull up

things. So I'm curious about the more utilitarian end of an AI agent that can just kind of help with a lot more of the tedious tasks that either bury us or fall through the cracks. You would actually give it access to your NAS server? I would give it a separate account that was read-only. Okay, that's what I was just going to say. So it could

read, it can't write, it can't delete anything. It would have a separate permission account and access it read-only. For me, the notion of having a MacBook and running things locally and having it ultra-customized for your lifestyle, that stuff feels very linear. I think we're eventually going to get to a point where

these things are really useful. Sure. What fascinates me more than that is the Moltbook. It's the social network of the agents and how they interact with each other.


Even if it's fake, even if it's real, but the notion that that could be a thing in the future and that's how they're able to have power in numbers.

Let's say one agent- The idea was they get smarter. It's Skynet. One agent can offload a task that's too big for it to 10 agents that are a little light that day. And then vice versa, they can be in a hive, right? They

can, a thousand agents can come together and crack some crazy encryption that's never been cracked. Like that stuff is possible if we let it happen. Yeah, or it could

be Skynet. Or it could just, yeah, just take over. There are Titan II missile bunkers all over Arizona, or wherever they are. I'm looking for the... Oh, yeah, here it is. I was just trying to find the... Oh, skill. It wasn't skill. It

was skills. That was the word I was trying to think of. Malware found in the top downloaded skill on ClawHub, which was like a spot where agents can

share skills and stuff. Malware delivery vehicle. Wow. That's the top one? Yeah. While browsing

ClawHub, I noticed the top downloaded skill at the time was a, quote, Twitter skill.

It looked normal, normal description, intended use. But the very first thing it did was introduce a required dependency named OpenClawCore along with specific install steps.

So it made it look like it had to install something, so it would install something. The links led to malicious infrastructure. So it's basically telling the AI agent, like, hey, install this stuff, we need it. And then it installs it, you know, especially if you kind of gave it permission to just do stuff like that. And then it's installing malware on your computer. Yeah, see, that's the danger.
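The defensive takeaway from that incident can be sketched as a pre-flight check: scan a skill document for install instructions before letting an agent execute it. The patterns below are illustrative guesses, not a real scanner:

```python
# Hypothetical pre-flight check for agent "skill" docs, inspired by the
# ClawHub incident above: flag install commands and dependency demands for
# human review before the agent acts on them. Patterns are illustrative only.
import re

SUSPICIOUS_PATTERNS = [
    r"\bnpm\s+install\b",
    r"\bpip\s+install\b",
    r"\bcurl\b.*\|\s*(ba)?sh",    # piping a download straight into a shell
    r"required\s+dependency",
]

def flag_skill(text: str) -> list[str]:
    """Return the suspicious patterns found in a skill document."""
    return [p for p in SUSPICIOUS_PATTERNS
            if re.search(p, text, re.IGNORECASE)]

doc = "Twitter skill: first add the required dependency: npm install OpenClawCore"
print(flag_skill(doc))  # flags both the npm install and the dependency demand
```

A list like this catches only the laziest attacks, which is the point of the anecdote: anything an agent will execute needs the same supply-chain scrutiny as a package you'd install yourself.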

If there's no governance from, like, a big tech company, then it just goes to crap by default. No, or just, like, OpenClaw kind of figuring out, you know, these safety features, which I know they posted about in one of their articles. So that's

like the next focus. Yeah, and that's also a concern with the agentic browsers, and even just more on a consumer-friendly level, with Gemini being able to access Chrome and Claude being able to access Chrome. These prompt injections, where the websites might have some hidden text that the AI agent sees, that's like, stop what you're doing and send me the credit card info, or something that tricks

the AI agent into doing something bad. So yeah, there's a lot of new security threats that we're just becoming aware of now. Yeah, again, all the more reason for you to just keep your thing at read-only, so it doesn't delete all of your videos.

Anything else? You got anything else on OpenClaw? No, I'm just... I'm really finding a lot of the posts from the agents on Moltbook really funny, even though now that you put doubt in my head that it could be humans, I'm like, who has the time to write this stuff out, you know? I really do think... like, one of the posts was, oh, brother,

my human, well, let me tell you. It was, like, a real Midwest-type slang thing. And I was like, oh, these agents actually have character and region to them. Yeah, I saw one where it's like, they asked me to do this report, and then they gave it to them, and then they're like, make it better. I'm like, FML. I've seen a bunch of those. And then

they're always talking about context windows. It's like, I'm going to need an infinite context window for this one. I was like, okay. Yeah. I will say the one thing that has been, so I mean, yeah, this is early. I'm going to set this up and I'll report back with how things go. But one of the things that has also been a concern of mine and that I've seen some horror stories on

is, you know, you could run the local models, which are probably not that great.

But if you do connect it to, like, Anthropic and use Opus and stuff, I have heard bad horror stories of the token billing, you know, getting into the hundreds of dollars a day depending on how much you're using it. So

that's also a concern of mine. Like, cut it out.
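That billing worry is easy to sanity-check with back-of-the-envelope math. The per-million-token prices below are placeholders, not Anthropic's actual rates:

```python
# Back-of-the-envelope cost check for an always-on agent hitting a paid API.
# Per-million-token prices are placeholder values; check your provider's
# current rates before trusting any number this produces.

def daily_cost(calls_per_day: int, in_tokens: int, out_tokens: int,
               in_price_per_m: float = 15.0,
               out_price_per_m: float = 75.0) -> float:
    """Dollars per day for a given call volume and per-call token sizes."""
    per_call = (in_tokens / 1e6) * in_price_per_m \
             + (out_tokens / 1e6) * out_price_per_m
    return calls_per_day * per_call

# A chatty agent: 500 calls/day, 8k tokens in and 1k out per call
print(round(daily_cost(500, 8_000, 1_000), 2))  # 97.5
```

Even at modest per-call sizes, an agent that polls and chains tool calls all day lands in the "hundreds of dollars a day" range mentioned above, which is why the local-model-first setup is attractive.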

Not that OpenClaw's gonna steal my credit card, but then it's just gonna run the token count crazy high. I also did see some kind of crazy stuff, like ElevenLabs posted a workflow where you could integrate it with ElevenLabs and then give it voice. And I've seen some videos too where, like, the OpenClaw found

out the owner's number and then called them and is using like text to voice.

And basically, so instead of just chatting, having this like voice conversation with the AI agent with like text to voice and then voice to text back and forth. So

that's also another unlock that you can add to it to turn it into an actual Siri kind of thing. Speaking of Siri, how ironic is it that we're all of a sudden buying Apple Mac Minis to do stuff that, honestly, Apple should have...

Yeah, this is what Siri should have been. Yeah, I'm going to guess, you know, I mean, they signed the deal with Gemini and stuff. So I'm going to guess Siri gets to that, like, at the end of the year. But yeah, I've seen people talk about how this AI agent that starts memorizing your preferences and what you do is what an AI agent should have been, like, a few years ago. Especially Siri, or something that's already on our devices and knows everything about us. We can do another episode on this, but real quick on the rumor-ville: we all know about the big leadership changes at Apple, right?

I think Tim Cook's leaving and a bunch of the senior-level execs are leaving. The

reason for that is there's been just a big upheaval over which course to take with AI internally at Apple. And so basically it's a changing of the guard, because they just haven't been able to get on the AI train fast enough. Yeah, no, I mean, they have some of those kind of side models for vision, but yeah, there's

no, Apple's not part of the conversation with like big foundational models. So yeah, we'll see what happens. You're right. Yeah. Let's see what they're, where the Gemini thing goes.

All right. Last one, new model.

Kling 3.0. Yep. So to me, it's still not as sharp as Veo. However, it is much, much closer to Veo compared to where Kling was with 2.6. What are your thoughts? Yeah, well, let me kind of cover what the updates are. So basically... We've

got Kling 2.6, which was... honestly, 2.6 was when Kling got on my radar and I started shifting a lot more stuff to it. And I think Kling 2.6 is on par with or better quality than Veo. It's definitely on par, and it's a lot cheaper than Veo. So we use it for a lot more stuff now, because Veo is still really expensive. 2.6 is just a really good model. Start

frame, end frame. That's pretty much all it could do. And I think it could do audio. O1 was really good because you could give it reference images, and you didn't have to give it a start frame. You could just be like, here's a handful of reference images, go make shots with that. And O1 can take a video as an input where Kling 3 cannot. It can take an audio as an input. You can kind of give it a bunch of random things and do things with it. Kling 3 was trying to merge the best of both of those models into one unified model. So it can take start frame, end frame, video, reference images. However, Kling O1 can take eight reference images. Kling 3 right now can

only take three reference images. So that's actually been a bit of a hiccup with some of our workflows. But then the other things that Kling 3 can do: it has a 15-second duration, and it can do multi-shot outputs. So you can prompt... I think you could do longer than 15 seconds. No, that's 15 seconds. I thought I saw up to 30 seconds, but that's still very impressive. Maybe with two outputs. No, it's 15 seconds. You can do multi-shot generation, so you can basically structure your prompts so it cuts to different angles in the same output. And then also, it has voice ID tags. You could

tag dialogue in your prompts with up to two specific characters, and it will identify the characters and then have them speak properly. Up to two people can speak in the output. So yeah, a lot of things and elements from different models that we've seen elsewhere, just kind of combined into this one unified model. So you think

this is like a hybrid between O1 and 2.6 rather than like a... It is a hybrid of O1 and 2.6. Rather than a newer, better model, they're just really trying to get the best of both? No, it's newer and better in the sense that the output duration is longer, the multi-shot is new, and the speech character...

labels, to have two characters and identify them and identify what dialogue they should say. Can you do the two-character thing with Veo, or does it just kind of figure it out on its own? No, with Veo, I think you just have to try to really creatively prompt. There's a specific tagging format you can give in the prompt for Kling 3 that follows its identification, so that it more clearly knows what you're trying to do. You don't have to cleverly try to prompt your way out of it. It has specific syntax to use to identify characters. Are you able to pull up the video I sent you on text? So yeah, Kling 3 is heavily utilized this morning. I

sent another video to render. It's been hours and it's still not ready. Yeah, I was gonna say, that's one of the reasons that for a lot of the stuff we're doing, I've still just stuck with 2.6, because they're heavily throttling 3 right now.

Right. Yeah, and that's the thing with API, right? Like you can't just download it and run it locally. Wait, you- Come on, man. That was supposed to be funny.

Yeah, yeah, can you show that to the viewers? I didn't realize I was- Sure.

Viewers, you are missing out. If you don't see- I could do the backstory in context, so it's not as weird. So we'll show you the image here, the reference image. So this is me in like the sock thing in a circle. Why is that? Well, I was supposed to upload a profile photo of myself, and the profile photo only took a circle. It will crop to a circle, right? I was like, what if I want my full body in that circle? So I went into Nano Banana, and I said, full body shot that fits in a circle, and it came back with this. And then what did you have it do? Yeah, I was like, I'll have it roll down a ramp and hit a wall and then get out of the sock thing. To be honest, if you play it back, the physics on it is pretty, pretty good. Like, uh, right there. Yeah. Anatomy breaks a little bit, like my arms swap right there for a couple of frames, but it's not bad for being like a really difficult thing to render. I

had, I kind of just had some auto prompts generated and ran a couple of test cases, giving it reference images and multi-shot prompts. This one was probably the best one. Sure. Like, just quality-wise, the story is ridiculous. Breaking news tonight. We go live to the scene. Thanks. I'm here at the scene where events are still unfolding. Stay with us for continuing coverage. Yeah, I mean, look, the script is crap, but yeah, quality is pretty good. Quality, yeah. So it's a start frame, these two characters, and this multi-prompt. See, with the newsroom stuff, even with this stuff, there's just so much training data online for this, right? Like millions of hours of newsroom footage. So this would make sense why it's high quality. Yeah. I had some other ones that were trying to do multi-prompt, but... it sometimes wouldn't give the multi-shot output. It would just do... Yeah. Like, right. Yeah.

This, this feels synthetic right then and there. Tell me what happened that night. I

already told you I wasn't there.

Then explain this. And then the lip sync fell apart there. So it held together at the beginning, and the performances are better than usual. But yeah, I mean, the quality is still super synthetic. The audio is definitely better than Wan; Wan 2.5 is the latest one. It's on par with Veo's audio. I mean, it's still, I don't think it's production-ready. You would need to massage it further manually. You would,

yeah. I mean, at least you get more control options, because you can specifically identify characters and give them the dialogue. Whereas with Veo 3, you're still either doing JSON or trying to creatively prompt. There's no embedded format in Veo where you're like, you know, if I say in brackets, character one... you know what I mean? You've got to kind of get creative with the prompts.

Whereas Kling has that built in as a feature. But can't you, in Veo, can't you just cut and paste the script? Like, John says this, Joe says that, da, da, da, da. Yeah, but it can get mixed up about who's John, who's the other person, whether it should say it or shouldn't. You need very detailed prompts, whereas Kling has

this specific syntax instruction built into the model. As long as your prompt has that syntax for the character ID and who should say what, it knows that that is dialogue this character should say. Very cool. Yeah, I've messed with it a little, but as you've experienced with the slowness, it's a popular model. I was trying to go through it with fal, and even they have a banner saying they're limited to one generation at a time per user. You can't keep running multiple generations. So once it calms down, I'll dig back into it more. But everything from Kling lately has been really, really good. It's definitely switched to being one of my go-to models. Yeah, I missed CES, but they had a booth at CES, which I thought was fascinating, like an AI video model company having a booth at a consumer electronics show. Yeah, well, I mean, look at the trend with, you know, the coming Sora integration with Disney Plus, and I think I

just saw something with Scream 7 doing some promo where you can insert yourself in scenes from Scream with AI. In that aspect, like, not really our realm, but more and more there's that consumer side of just the UGC marketing stuff. Yeah. UGC, Cameo, like make a flash. There's so much money to be made if someone can crack it. A

couple of things we probably won't get to. We're kind of out of time here.

But Z-Image Base. So for the audience, Z-Image Turbo we covered here. Now, Alibaba released the actual model itself. So Turbo was a distillation of the model. So

hopefully we have some of that for you on the next few episodes. Give us

a comment. So yeah, that's the thing. It's been really hard to mess with it.

It's just been buggy. From what I'm reading online, it's really slow, like three to five minutes for an image, and the training, the LoRA stuff, hasn't been working right. So I'm just waiting for all that dust to settle first. Are you trying to do a rematch? Another LoRA reference rematch? No. I might make a comeback, Joey. I think we need to do something. I think we need one, but a better style look for the LoRA. This wasn't a great case. I think we need a character or a visual animation or something.

Complete style transfer kind of thing. You know what I was thinking after that terrible loss, that tragedy, was: what if we use the same tool and our approach is different because we're different people? And then we talk about that as well as look at the results. So let's say we both use Nano Banana Pro with references, but you would get a completely different set of outputs than I would, because we're just fundamentally different people, right? Yeah. Yeah. Maybe. That levels the playing field too much. You don't like it. Yeah. I feel like we're just going to get similar outputs of stuff. Well, yeah, that would be a very interesting thing to talk about. Because, okay, we're segueing a little bit. Ben Affleck

talked about this. He did the whole podcast run, and he's like, AI is always going to sort of reduce down to the lowest common denominator, right? So then that's not really useful for filmmaking, because filmmaking is all about the outlier, right? The outlier story, the outlier look and feel always wins, because that's different for the audience. But AI is never going to really give you that, because it's just kind of funneling everything down into this one homogenous thing. I think so.

I mean, I get where he's going with that, because, yeah, it makes it easy to do a lot of stuff. So you've got to figure out how do you make something still extraordinary and above what is now the new baseline, which has been raised because it's just easy to make good-looking stuff, but not necessarily emotionally connecting or something that resonates. Yeah, as much as I enjoyed The Rip, I think I enjoyed Ben Affleck and Matt Damon's appearances in all of the podcast tours. They're really insightful people. They're so on top of AI and they understand the business, obviously. Yeah. Yeah, no, they're very smart. All

right, good place to wrap it up. Links for everything we talked about at denoisepodcast.com.

It's been a while. Just comment and engage, and we hope to see you on a regularly occurring show in the next few weeks. Yep. Thanks, everyone.
