Nodes Aren't the Future of AI Creation. Here's What Is.
By Bilawal Sidhu
Summary
## Key takeaways

- **Nodes give the illusion of control**: You're still hitting the generate button and waiting for the model to come back with something you hope is close to what you want. For creative tasks, nodes get in the way. [00:21], [00:39]
- **We need 3D viewport control**: We're largely flying blind without the equivalent of a viewport, a 3D representation that lets us see exactly what we had in mind before hitting render. In 3D applications, the viewport is the visual anchor, the source of truth that gives you an interactive window into the world you're creating. [00:35], [00:59]
- **Nodes are assembly language**: Nodes are like the assembly language of AI creation; we need a higher-order abstraction that feels more like creating on a sound stage. Nodes are an effective way to connect different models and operations together, but that's plumbing, not filmmaking. [01:39], [02:02]
- **Unreal Engine as the model**: Inside Unreal Engine, you've got node graphs when you need them, the timeline editor when you need it, and the viewport to understand exactly what you're doing and manipulating. The spatial solution has all these tools working together. [04:29], [04:56]
- **A 3D scene graph for AI**: A 3D scene graph is nodes for the AI itself, a representation that tells the AI system everything in the environment, the entities within it, their characteristics, and their interactivity. It gives you the language to intuitively author in viewport and timeline mode while giving the AI something it can understand. [08:34], [09:03]
- **Nodes are a bridge, not a destination**: They're a good side dish, but they can't be the main course. We need the viewport and the spatial 3D engine to make creation feel more human. [10:42], [10:51]
Topics Covered
- Nodes Illusion of Control
- Nodes Are Assembly Language
- Need Real-Time Viewport
- 3D Scene Graph Unlocks Continuity
- Nodes Bridge Not Destination
Full Transcript
If you've used any generative media tool lately, you've probably noticed a trend: nodes everywhere the eye can see. Whether it's Krea, Weavy, the OG ComfyUI, freaking Runway, heck, even Adobe is leaning into nodes. And quite frankly, I'm a little worried, because to me, nodes aren't the future. In fact, I would posit that they're getting in the way. They give you the illusion of control, but you're still hitting the generate button, waiting for the model to come back with something you hope is close to what you want. I would posit instead that what we need is 3D control. Here's why.
Look, in my opinion, when we're doing generative AI right now, we are largely flying blind. We don't have the equivalent of a viewport, that 3D representation that lets us see exactly what we had in mind before we actually hit the render button. It really does feel like typing in some prompts and maybe a couple of image references and hoping for the best. Now, if you've used any 3D application, you know exactly what I'm talking about. The viewport is the visual anchor, the source of truth that gives you an interactive window into the world you're creating. And I would posit that in the case of AI, that's exactly what we need, except we're co-creating this world. And it's really funny to me, because a lot of people say, "Oh my god, but 3D is really hard. I don't want to have to manipulate the camera angle and where all the people are and the lighting sources and all this other stuff." And instead they're dealing with this crazy node graph that is so hard to grok. Look, I grew up on After Effects and Nuke. I appreciate when you need nodes, but honestly, most of the time, especially for creative tasks, they kind of get in the way. Now look, this of course makes sense, because what we're doing is plumbing. But let's not call it filmmaking or content creation. Nodes are a very effective way to take a bunch of different models and a bunch of different operations and connect them together. Yes, they're great for that, but think about how far that is from the language and art of actually being on a set and recording something with your friends. It's completely night and day. And that's sort of my point: nodes are like the assembly language of AI creation, but we need a higher-order abstraction that feels a lot more like it does to create on a sound stage.
Now look, nodes of course aren't new. As I mentioned earlier, they have been the backbone of all sorts of powerful tools: Houdini, famously; Nuke is another great example; Blender as well. And the whole idea was: how do you create a visual interface for people who aren't savvy with programming, so they get the right knobs and can control the flow of data, the sequence and order of operations, in a visual fashion, without having to resort to writing lines of code? But the thing is, we're missing half of the equation. Sure, you can have this node graph representation. Heck, you can maintain a very detailed 3D scene graph that you can manipulate. But you don't have the viewport right now. The "viewport" right now is basically: hit that slot machine, baby, and hope you get a nice video generation, or incrementally build out an image and then post-process it, and on and on we go. In a sense, what we're lacking is real-time preview. And what I'm excited about is that we've got a whole new crop of real-time video models popping up that will actually make this a possibility.
Now, before I get ahead of myself, let's take a quick look at the three approaches to creating content right now and weigh their strengths and weaknesses. First, obviously, you've got the stack. This is the layer-based editing approach that we all know and love. If you've used Adobe Photoshop or After Effects, you know this sandwich logic is very easy to understand: the order of operations of compositing starts at the bottom of the stack, and you're just sandwiching things on top of it. You can easily move things above or below an object just by changing where it sits in the stack. Of course, as you probably also know, if you're in After Effects and you're 50 pre-comps deep into a composite, those layers can quickly become your enemy. This is where the node graph approach is super helpful. Essentially, you've got an approach for plumbing all your data through all sorts of operations that you can compose together to do very sophisticated image manipulation and data processing. But as you've probably noticed, you still need the timeline view. Layers work really well in concert with the timeline, especially when you're doing things like keyframing, and that gets a little more challenging once you get into node land. The point being: the flow itself is not enough. You need that timeline view. So I would posit that the closest thing we actually have is number three, the spatial solution, à la Unreal Engine. Inside Unreal Engine, you can do all of these things. You've got node graphs when you need them. You've got the timeline editor when you need it. Heck, you can even switch to a Sequencer view, where you're basically moving around your shot list, when you need that. And of course, you've got the viewport to understand exactly what you're doing and manipulating.
So this is the problem: right now we're in a space where a new Lego block drops every freaking two weeks, and the main problem everyone's trying to solve is how to put all these pieces together. As a consequence, everyone's defaulted to a node graph. There's very little innovation actually happening on temporal, timeline-based events; the best you can do is sort of extend stuff, and that's not super helpful. It's very limited. So the solution, to me, really does feel like a 3D application, one that takes advantage especially of all these new world models that are popping up to populate the canvas. You create your character once, because you know that's the character that's going to be in all of your generations. You create your environment once, and you can image it from any direction you want. And then you can frame your shots exactly as you like, rather than typing in esoteric prompt incantations hoping the camera angle comes close to what you had in mind, or drawing weird scribbles just to nudge things in the right direction. It gets really weird. I think that is going to be super freeing. Once we put the Lego bricks together, what other views do we get on exactly the composition we're trying to build? I think that will take us from this cage of nodes to real agency, where you define the composition and the motion intuitively, in a fashion you know really well. In fact, I really want to be able to take my phone, have its motion tracked, have the camera feed go into the 3D application, and use that to frame my shots. Or use something like a 3D mouse, if you're familiar with what that is.
But I think the fact is, the reason companies aren't doing this is because it's hard. On one hand you've got the node-graph-style approach. The other way is templates. You see companies like Higgsfield doing this, basically saying: let me take a bunch of these complicated ComfyUI graphs and wrap them in a very simple-to-use template. But that's only really good for making memes or really short-form content. The moment you start talking about doing something longer than three to five minutes, it gets really exhausting. It is perhaps unsurprising, then, that most of the professionally generated content you're seeing is a bunch of these chaotic one-to-two-minute advertisements. That's all you really see. And there are just a handful of creators pushing this boulder up a hill to make multi-minute cinematic content. Hats off to them. But to bring everyone else into the mix, I think you need direct manipulation and real-time feedback.
And as I alluded to earlier, we finally have video models that are capable of doing exactly this, where it's not just you using something like Genie to WASD around and move the camera with your keyboard; you could start pumping in the AR camera pose from your phone's camera itself, like I mentioned. And of course, you've got other models, like MotionStream, that let you manipulate objects in the generation directly as well, just like you would in a 3D engine. But here's the problem: these systems don't have spatial memory. It's you, the human, who is managing the spatial context for them. In other words, you might go into World Labs and capture the environment from a couple of different angles, and every time you do a generation, you might provide that image reference as an ingredient. I would posit that the system should do that for you.
And you're seeing that World Labs is actually investing in a similar direction, where if you're using a model like Genie and moving around and you see an image, it records the three-dimensional pose of each of those images to give it spatial memory. But this is about more than just spatial memory of what you saw in an environment. We're talking about building an entire experience, which is why I think it is very beneficial to have an explicit 3D scene graph. And all these visual language models are getting better and better at doing amazing things in tools like Three.js. You must have seen some of my previous experiments building fun things; in fact, I built two games that you can go play on YouTube that were entirely vibe-coded. This tech already exists. It just hasn't been married perfectly with these amazing video diffusion and autoregressive models that are capable of doing amazing things on the visual end. So the beauty of a 3D scene graph is that it's sort of like nodes for the AI itself. It's a representation that tells the AI system everything that's in the environment: the entities within it, the characteristics of those entities, and the interactivity attached to any of them. It gives you the language to intuitively author in viewport and timeline mode, but gives the AI something it can understand. And isn't it ironic that an AI system is better at understanding nodes than we are? Now look, I'm excited to see that there are companies trying to tackle this problem with a 3D-first approach. A great example is Intangible AI, which basically lets you use text prompts to flesh out a scene in 3D, so you know exactly what you want. You can manipulate everything with full 3D-level control, and then you just use generative AI to take it all the way. Another tool like this, called Artcraft, lets you do exactly what I just described: you flesh out your environment, you've got a character within it, and then you can go about prompting it. That's especially powerful when you're doing multi-shot continuity. Say the protagonist needs a security-camera CCTV view: very easy to do. Or needs to hide behind some artifact over here: you can scope that out so easily in this 3D world, rather than having to prompt or scribble and do weird things. Want another angle, and then the jump-scare reveal? Those are all things that are so much easier to plan out in 3D. That is fundamentally what I think is going to unlock multi-minute creation.
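To make the scene graph idea concrete, here's a minimal sketch of what "nodes for the AI itself" could look like as data an LLM can absorb. The schema, field names, and the `describeScene` helper are my own illustrative assumptions, not the format of any shipping tool:

```typescript
// A hypothetical scene graph: entities, their characteristics, and interactivity,
// kept as structured data the AI can read while you author in the viewport.
interface Entity {
  id: string;
  kind: "character" | "prop" | "camera" | "light";
  transform: { position: [number, number, number]; rotationY: number };
  traits?: Record<string, string>;   // e.g. { role: "protagonist" }
  interactions?: string[];           // e.g. ["hideBehind", "pickUp"]
}

interface SceneGraph {
  environment: string;
  entities: Entity[];
}

// Flatten the graph into compact text an LLM can reason over between shots.
function describeScene(scene: SceneGraph): string {
  const lines = scene.entities.map(
    (e) =>
      `${e.kind} "${e.id}" at (${e.transform.position.join(", ")})` +
      (e.interactions?.length ? ` can: ${e.interactions.join(", ")}` : "")
  );
  return `Environment: ${scene.environment}\n` + lines.join("\n");
}

const scene: SceneGraph = {
  environment: "abandoned warehouse",
  entities: [
    {
      id: "hero",
      kind: "character",
      transform: { position: [0, 0, 2], rotationY: 180 },
      traits: { role: "protagonist" },
      interactions: ["hideBehind"],
    },
    {
      id: "cctvCam",
      kind: "camera",
      transform: { position: [4, 3, -1], rotationY: 225 },
    },
  ],
};

console.log(describeScene(scene));
```

The point is less the exact schema and more the division of labor: you author entities through the viewport and timeline, while the AI reads and updates this structured representation, which is what gives it spatial memory and continuity across shots like the CCTV angle or the jump-scare reveal.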
And so I think this hybrid approach of 3D combined with generative AI, where you've got the timeline view when you need it and the nodes when you need them, but critically you've also got the node representation of the 3D environment that the LLM can absorb, reason about, and help you manipulate, is how you can finally start unlocking multi-minute creations. We need that spatial AI engine. And it's kind of funny: if you've seen my previous videos about world models and physical AI, this is exactly what robotics needs too, except the constraint for robotics training data is to make it look super photorealistic and true to life. What we're talking about lets you take creative liberties, because photorealism and cinematic realism are very different things.
So that's my take. Maybe I was a little spicy at the start, but you kind of have to be for these types of videos. Hopefully the points I'm bringing up resonate with you. Nodes are essentially a bridge to the future. They are not a destination. They're a good side dish, a good companion, but they can't be the main course. We need the viewport. We need the spatial 3D engine, because ultimately we want creation to feel more human, not less. Nodes are a representation, just like code, and most coders today are just using Claude Code and not even looking at the code. Why are we having creators fiddle around with all these node graphs? It makes no sense to me. And once we start embracing spatial interfaces, not only are we going to make better tools, we're creating tools that work the way our minds do. You're replicating the experience of being on a virtual production sound stage with your friends, creating a piece of content, except your budget is now unlimited and you can do it all inside the computer.
All right, so that's it for this video. Let me know in the comments below: what do you think about nodes? What do you think about timelines? What do you think about this spatial AI engine and this hybrid approach? What do you think is holding back multi-minute generations? And I'll see y'all in the next one.