Runtime PCG in The Witcher 4 Unreal Engine Tech Demo | Unreal Fest Stockholm 2025
By Unreal Engine
Summary
## Key takeaways
- **Runtime PCG Triggers Grid Updates**: Runtime PCG is a trigger that tells the system when to update; used with partitioning, it divides the world into grid cells that are updated as you approach them. [01:13], [01:33]
- **GPU Nodes Enable Visual Efficiency**: PCG's GPU support uses checkboxes or dedicated nodes such as custom HLSL for GPU execution, keeping visual components like grass from competing with gameplay tasks on the CPU. [02:14], [02:40]
- **Foliage Pipeline by Asset Importance**: Hero assets are hand-placed by artists; medium-size assets, from shrubs to trees, use pre-baked PCG tools; and grass, with more instances than all other foliage combined, requires runtime generation to avoid performance issues. [04:40], [05:55]
- **Hierarchical Grids Minimize CPU Work**: Hierarchical runtime PCG uses one huge unbounded cell for one-time CPU tasks like reading parameters, passes data down to 128 m medium cells for high-overhead work, and does the final scatter on 32 m small cells. [08:28], [09:35]
- **Dual Scattering Hides Density Transitions**: Dense grass scatters on the small grid with a 128 m radius, sparse grass on the medium grid with a 256 m radius, using scaled-up assets and min/max draw distances to smooth the transition and recoloring edges to match the ground. [14:39], [16:13]
- **Minimize CPU-GPU Transfers Hierarchically**: Perform CPU uploads such as graph parameters once at the unbounded grid level and let the data flow down the hierarchy without repeated readbacks; use GPU backends and skip CPU readbacks where possible. [20:40], [23:50]
Topics Covered
- GPU PCG Enables Visual Runtime Generation
- Tier Foliage by Asset and Area Importance
- Hierarchical Grids Minimize CPU Work
- Dual Ring Scatter Hides Density Transitions
- Minimize CPU-GPU Transfers Hierarchically
Full Transcript
>> Hi, hello. I'm Max, a tech artist from CDPR.
>> And I'm Hugh, and I work on the procedural team as a senior tools programmer with a focus on runtime generation and GPU execution.
>> And we're here to talk mostly about PCG in the Witcher 4 demo; not about the foliage itself specifically, but about its distribution in the world. As you can see, it's pretty dense and, at least in my opinion, looks pretty good.
So I'll be talking mostly about the practical use of PCG, how I used the system in the demo, and Hugh is going to expand a bit more on its technical side, its performance, and what's coming in the future.
A quick reminder of what runtime PCG is, because it will be very relevant. It's just a trigger in PCG that signals the system when to update. There are different triggers; runtime specifically means it will be triggered at runtime, and it's almost always used with the partitioned setting, which means PCG divides the world into a grid-like structure and every cell is updated when you get close to it, as depicted on this image. Otherwise, it's just normal PCG.
A great benefit of this approach is that whenever you modify a parameter, whether in editor or at runtime, things update immediately, which is very important for artistic work.
Another important thing we wanted for the Witcher 4 tech demo was for this runtime PCG to execute on the GPU, because, as you'll see later, I'm mostly using it for grass, and grass is mostly a visual component of the picture; you don't really want visual components fighting for performance with gameplay-related tasks. Luckily, the PCG team had just released GPU support, which is not really a mode per se but a set of nodes. Some of them are just a checkbox, as you see in the picture: you click it and the node works just the same, but on the GPU. But there are also some GPU-specific nodes which are especially powerful.
That's what you see on this picture. They are categorized as custom HLSL, and they are especially great because they are essentially a blank canvas: you can write your own logic there, and you can even combine multiple logical steps in the same node. There are different flavors of the node, and they mostly differ in how the node works inside. For example, a point generator will generate a certain number of operations based on however many you ask for on the interface of the node; you say "make me 1,000 points". A point processor, on the other hand, is given a set of points, and it decides how many iterations to do based on the number of points in that set.
For the node you get this UI, which is three text panels. The first one, well, I call it hints, but it's actually just declarations of functions. These functions are dynamic, so whenever you change the inputs of the node, you get different functions, which is very handy. The middle panel is for your own functions; it's for code optimization and organization. And the final small panel here is for your own logic; it's the main work area. In my case, you can see it's just five lines to spawn a grid of trees on a landscape, so no boilerplate at all, which in my opinion is awesome.
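To make that concrete, here is a minimal sketch of what a point generator body of that shape could look like. The helper names are illustrative only; the real input/output functions are auto-declared per node from its pins (the hints panel above), so the exact names will differ in your graph.

```hlsl
// Hedged sketch of a point generator main panel: scatter a grid of points
// onto a landscape. One invocation runs per generated point (ElementIndex).
// SampleLandscapeHeight and Out_SetPosition stand in for the node's
// auto-generated helpers and are not verbatim API names.
const uint PointsPerRow = 32;
const float Spacing = 100.0f; // 1 m apart, in Unreal units
float2 Pos2D = CellMinWorldPos + float2(ElementIndex % PointsPerRow, ElementIndex / PointsPerRow) * Spacing;
float Height = SampleLandscapeHeight(Pos2D);          // read from the landscape input
Out_SetPosition(ElementIndex, float3(Pos2D, Height)); // write the generated point
```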
Now, where does runtime PCG, or PCG in general, sit in the foliage pipeline for the demo? We identified multiple criteria, and one of them is asset importance. Big assets, hero assets, stuff like that, are almost guaranteed to be placed by artists, so it's not really PCG at all. That's what you see on this picture: purely hand-placed assets.
Next, we give artists a tool, a PCG-based one, to fill areas with generic polish, let's say. The tool is for medium-size assets, anything from a shrub to a tree. The logic for placing them is not as complex as for hero assets, of course, but also not as simple as just randomly placing stuff, which means we can come up with a PCG graph for it; but this graph is not runtime, it produces pre-baked foliage.
And finally, to finish the picture, we need some grass. Although it doesn't seem like it, there are more grass instances here than all other foliage combined, which means that, unlike the previous slide, we cannot bake this foliage. We have to generate it at runtime, because otherwise we'd run into performance issues.
Another big criterion is area importance. Not only does the asset matter, but also where it is placed. If it's some edge of the map or an area nobody cares about, more likely than not PCG, or any procedural generation, will be used to place stuff there; while quest areas and key visual areas are almost guaranteed to be filled mostly by artists by hand, and anything in between, well, you decide. That's how I imagine the distribution in my mind.
And that's how it works out in the demo. This is a side-by-side panel: the first image is just hand-placed, the middle one is hand-placed plus pre-baked, and the last one is the complete picture.
A quick word about the tool we used for pre-baking. It's a task system we made that collects data from different sources, from the Unreal world and from other applications, and organizes it into PCG-readable data. It's fully preset-based, so only one person has to know how it works inside, and they prepare all of the cool biomes, if you will. It works on tiles, which means artists won't clash with each other; there will be no checkout problems unless they work on exactly the same tile. And the tool is optimized mostly for quality and flexibility rather than speed, which makes it, in my opinion, great for this medium-size foliage.
Now for the grass. The way it's made, it's not only partitioned runtime, it's also hierarchical, which means every cell is subdivided into more cells. The benefit here is that we have one huge cell over the whole world, depicted in blue here, and this one cell is essentially for stuff you want to happen only once, like reading graph parameters and some organizational work; I personally decided to dump all of the CPU work there. Then you pass all the data collected on the big cell down to smaller cells and do some other work there. For example, these red cells depicted here are 128 m per cell, and on them I'm doing work that has high overhead but has to happen at some point. Once I've collected all of that data, I pass it to the smallest, tiny cells, which are just 32 m in size. There will be a lot of them, of course, because of the size, and the final scatter happens on them.
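For a feel of the cell counts involved (a back-of-the-envelope estimate of ours, not a number from the talk): with the 128 m dense scatter radius quoted later and 32 m small cells, roughly

$$\frac{\pi \cdot 128^2}{32^2} \approx 50$$

small cells are live around the camera at any time, versus a handful of 128 m cells and a single unbounded cell, which is why the one-time work is pushed as high up the hierarchy as possible.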
This is the logical overview of where the data comes into the graph. I have two main sources: one is the landscape, and one is graph parameters. For the landscape, it's pretty easy: I'm getting the heightmap, of course, the normals, and, through the material, the grass maps. The parameters are nothing too special, just slightly more tricky: I have two arrays of assets, one skeletal and one static, and one structure which holds the settings per grass map. So there's a decent chunk of parameters.
Once all of these are passed into the graph, this is what the processing looks like, as a logical overview. You can notice that I'm using the CPU almost exclusively on the unbounded grid, because I'm trying to avoid the CPU as much as possible; once the data is collected there, I dump it onto the GPU right away. You can also notice that the landscape and grass maps are read on the 128 m grid. As I mentioned, you could of course read the landscape on the huge unbounded cell, but then you're getting the whole landscape at once, which has memory implications. And if you read them on the smallest cells, that's also possible, but these operations have a noticeable overhead, so you'd run into performance issues that way. And I'm doing the scatter on both of the grids; you'll see why in a second.
That's the whole graph. If you've ever worked with PCG, you know this is a very tiny, small graph, and even then the blue box, which is most of the graph, is just reading the parameters. It could be collapsed into smaller subgraphs or some sort of loop, but then it would be slightly harder to explain; in production you probably would do that.
Now I'll go through every part of this graph and explain everything. This part is nothing too special: it's just reading the inputs for the landscape and the settings. As I mentioned, for every grass map I have some kind of structure, and the settings are part of that structure. I'm keeping a close eye on the order of the settings, because I'm using the order as the key to link all of them together.
That's another part of the inputs; this one is for assets. There are two streams, one for static and one for skeletal. That's necessary because, for static, let's say for rocks, you don't need skeletal, and vice versa, for grass you do need skeletal. Then I'm classifying what goes where, and I'm also adding one placeholder item to the stream, just to make sure the data will not be collapsed, because, as I mentioned, order is important in my graph; if an artist did not provide any assets, the data would otherwise go away.
Here is another part of the computation on the unbounded grid: weight renormalization. Of course it could be done later, for example while scattering, but since this operation is common to all tiles, it's better to do it only once on the input side rather than in the custom HLSL nodes. It's a tiny performance optimization, but just so you know.
And finally, for the inputs, you of course want your artists to have some way to paint where stuff goes. Luckily, Unreal already has a system for that: the landscape grass system. I have an almost complete setup for it, except that I don't provide any assets to the landscape grass system; that way, the generate grass maps node can fetch the grass maps and create PCG textures for me to sample down the line.
And finally, for the nodes, I'm doing the point generation, of course, and you can notice I have two exactly identical setups, and that's for a reason. Initially there was only one, this orange grid, which I renamed to dense grass scattering. What happens there is, of course, just scattering some points. But you'd see a great deal of transition between those points and nothing: if it's dense grass and then nothing, you see a huge jump. Artists constantly asked me to increase the radius, of course, but you can't really fight this transition by increasing the radius infinitely; you'd run into instance count issues.
So the solution I used was to add another scattering pass on the medium grid, which I called sparse foliage. It's exactly the same setup, just with fewer points, and to compensate for the lack of density, I'm also scaling them up a bit. They are also culled using minimum and maximum draw distances: the close, dense grid has no minimum draw distance, and the sparse one's minimum just matches where the dense one ends. Of course, the picture here is not to scale, just a note; the actual radius is 128 m for the dense and 256 m for the sparse.
There are, of course, drawbacks to this approach. If you zoom in very closely, you'll see the transition between dense and sparse, because you'll see the scaling of the assets. And for the transition from sparse to nothing, I'm recoloring assets to match the color of the ground at the very edge. That more or less hides the assets themselves, but I cannot really recolor the shadows cast by the assets, so you'll see the black spots disappear.
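A hedged sketch of that edge recoloring, fading the instance tint toward the sampled ground color near the outer scatter radius (all names here are illustrative, not the demo's actual parameters):

```hlsl
// Fade the instance color toward the ground color as points approach the
// outer edge of the sparse ring, hiding the sparse-to-nothing cutoff.
// Shadows cast by the instances cannot be recolored this way, which is why
// the dark spots remain visible up close.
float Fade = saturate((DistFromCamera - FadeStartDist) / (OuterRadius - FadeStartDist));
float3 Tint = lerp(AssetColor, GroundColor, Fade);
```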
But luckily, if you don't zoom in as close as in these screenshots, you won't see any issues. There is another drawback, related to performance, that Hugh will show later; but if your performance is fine, then all is fine.
And now for the code in these custom HLSL nodes I showed a bit earlier; that's where another great chunk of the graph is hiding. With these custom HLSL nodes, unlike with the surface sampler, you don't get points automatically generated for you; you have to work out their positions yourself. The node comes with a grid function, and adding an offset to it is super easy. But Unreal also comes with its source code open for you to browse, and I found the Halton sequence there, which at least to my eye looks a bit better than a grid with offsets; not as good as Poisson, somewhere in between, but still very good.
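The Halton sequence is straightforward to port into a custom HLSL node. A standard formulation (our sketch, not necessarily the engine's exact code) looks like this:

```hlsl
// Radical inverse of Index in the given base: mirrors the base-b digits of
// the index around the decimal point, yielding a low-discrepancy value in [0, 1).
float RadicalInverse(uint Index, uint Base)
{
    float Inv = 1.0f / Base;
    float Frac = Inv;
    float Result = 0.0f;
    while (Index > 0)
    {
        Result += (Index % Base) * Frac;
        Index /= Base;
        Frac *= Inv;
    }
    return Result;
}

// 2D Halton point for a given element index, using coprime bases 2 and 3,
// mapped into the cell's bounds. Skipping index 0 avoids the degenerate (0, 0) sample.
float2 HaltonScatter(uint ElementIndex, float2 CellMin, float2 CellSize)
{
    float2 Sample = float2(RadicalInverse(ElementIndex + 1, 2),
                           RadicalInverse(ElementIndex + 1, 3));
    return CellMin + Sample * CellSize;
}
```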
Next, we have to figure out which rough category of assets we want to scatter. In the setup I showed earlier, I had these grass maps generated, and here I'm just sampling them; they're essentially textures at this point. I'm figuring out their value and using that value as a bias in the randomization. That means that if you have multiple layers overlapping, you still get a smooth transition between your grass layers, because adjacent items might get different layers just by random selection. Other than that, I'm just figuring out the ID of the layer and using it as a key to select the appropriate settings and asset array.
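In sketch form, using the sampled grass-map values as a bias can look like the following (assumed names; the per-point random value and the sampled layer weights would come from the node's inputs):

```hlsl
// Pick a grass layer with probability proportional to its grass-map value at
// this point. Where layers overlap, adjacent points dither between layers
// rather than forming a hard border, which smooths the transition.
uint SelectLayer(float LayerValues[4], uint NumLayers, float Rand01)
{
    float Total = 0.0f;
    for (uint i = 0; i < NumLayers; ++i)
        Total += LayerValues[i];

    float Pick = Rand01 * Total;
    for (uint j = 0; j < NumLayers; ++j)
    {
        Pick -= LayerValues[j];
        if (Pick <= 0.0f)
            return j; // this layer ID keys into the settings and asset arrays
    }
    return NumLayers - 1; // guard against floating-point rounding
}
```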
And finally: we've figured out the point position, and we've figured out the rough category of assets we want to scatter at that point. Now we just have to select an asset from that category, and that's what happens here.
The way I'm doing it: I have a noise flowing through the world, calculated just like you would in any shader. This noise doesn't have any particular meaning; it exists just so artists have this asset grouping in the UI. As you can see, for each asset they have a range where they can decide in what band it should live. This creates more natural groupings of assets, as you see with these blue boxes, for example. So in essence it works like a fertility map, if you will; as you might imagine, it looks a bit more natural than just randomly selecting assets.
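A minimal sketch of that range-based grouping (illustrative names; each asset carries an authored [Min, Max] range, and a world-space noise value acts as the fertility coordinate):

```hlsl
struct FAssetRange { float Min; float Max; };

// Return the index of a random asset whose authored range contains the noise
// value at this point, or -1 if nothing in this category lives here.
// Neighbouring points see similar noise values, so they draw from the same
// candidate set; that is what produces the natural-looking clusters.
int SelectAsset(FAssetRange Ranges[8], uint NumAssets, float Noise01, float Rand01)
{
    uint NumCandidates = 0;
    for (uint i = 0; i < NumAssets; ++i)
        if (Noise01 >= Ranges[i].Min && Noise01 <= Ranges[i].Max)
            ++NumCandidates;

    if (NumCandidates == 0)
        return -1;

    // Pick the Target-th candidate at random.
    uint Target = min((uint)(Rand01 * NumCandidates), NumCandidates - 1);
    for (uint j = 0, k = 0; j < NumAssets; ++j)
        if (Noise01 >= Ranges[j].Min && Noise01 <= Ranges[j].Max)
            if (k++ == Target)
                return (int)j;

    return -1;
}
```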
So you can imagine that all of this setup was pretty tricky in the early days of PCG's development. Luckily, I had Hugh and the whole PCG team helping me. So let's hear from him about the other technical aspects.
>> Thank you, Max.
[applause]
>> Yeah, so as we started building the grass graph and putting PCG through its paces, we identified a few principles for authoring high-performance graphs that we want to share today. Firstly, we use hierarchical generation to reduce the frequency at which work is done, as Maxim illustrated. Secondly, we use GPU nodes where possible, which are dispatched as compute shaders and run efficiently on GPU hardware. Thirdly, we minimize CPU-to-GPU data transfers, which are relatively expensive. We'll examine this in more detail using a simple example graph that scatters assets on a landscape and uses a custom HLSL node.
Inspecting the graph, we notice small yellow up and down arrows on the highlighted nodes, indicating GPU uploads and readbacks respectively, and thus highlighting potential performance issues. We'll look at each in turn, starting with the static mesh spawner. This is currently set to execute on the CPU, so it will trigger an expensive download of the point data after the custom HLSL node executes. For this, the fix is simple: we have a GPU backend implemented for the static mesh spawner, and it can be enabled using the node setting shown below. The two GPU nodes will then execute together and pass the data along directly, without involving the CPU. We are continuously working to expand GPU support across our built-in nodes, so it's worth checking periodically with each release.
Now we turn our attention to the highlighted surface sampler node, which runs on the CPU before doing an expensive upload of data to the GPU. We don't yet have a GPU backend for the surface sampler, so the solution instead is to implement the sampling logic in a custom HLSL node such as a point generator, similar to the approach covered by Maxim in the previous slides. Notice we now have an upload on the get landscape data node, which uploads a patch of landscape heights within the generation area to the GPU; there are options to remove this upload, which we'll cover in later slides.
Next, let's look at the get texture data node, which does a readback of the texture data to the CPU so that it's available for other CPU nodes to use. In this graph we don't need the texture data on the CPU, so we can disable the readback using the skip readback to CPU node setting. This setting also applies to other nodes that output textures, such as the generate grass maps node.
That leaves a final upload, which occurs after this attribute is created. Obtaining attributes from the CPU is common, for example for graph parameters, which are CPU-side by nature. Despite being a small amount of data, this upload has a significant and measurable cost. A good solution, which Maxim also touched on, is to enable hierarchical generation in the graph settings and perform this upload once at the unbounded grid level, either using an empty attribute set processor node, which was added for 5.7, or by using a custom kernel type to build the attribute set. Note that the uploaded data flows down the hierarchical grids without repeated readbacks and re-uploads; the data is uploaded to the GPU just once and cached there while the component generates.
In a similar vein, GPU data can also flow into and out of subgraphs without repeated readbacks, so these are also safe to use, with all subgraph types supported. Either way, the final graph, with all the optimizations we've just applied, will execute much more efficiently than the one we started with.
So, revisiting the get landscape data node: firstly, when running on the GPU, we recommend enabling get height only and disabling get layer weights, which saves some setup time and resources on the CPU. The landscape layer weights are not directly accessible on the GPU and should be sampled via grass maps instead. As Maxim showed, even with these settings there is a game thread cost to uploading landscape heights to the GPU, around half a millisecond on a base PS5 to generate a 32 m grid cell.
As an alternative, if a height runtime virtual texture is configured on the landscape components, the system can use it to obtain heights directly on the GPU. This is efficient, but it comes with no guarantee of residency within the virtual texture, so it's possible to sample the height before the virtual texture is sufficiently populated. To address this, we have a preload path that sends load requests to the virtual texture system, and we used an initial warm-up delay to give the virtual textures time to populate. That got us over the line for the demo, but we aim to have a more robust solution for sampling virtual textures in the future.
In the meantime, we have a new, hot-off-the-press node that just landed in the 5.7 branch, which can generate a height texture using the same underlying path as the grass map generation. It generates on the GPU, so it has low CPU overhead; it does not depend on virtual texture streaming; and it comes at low additional cost over just generating the grass maps. It currently generates a height texture only, so normals have to be computed manually.
Now let's move on to CPU performance for GPU PCG graphs; this is predominantly render setup and resource management. While shipping the demo, we reduced the average game thread time spent in PCG for this type of graph by almost 50%. We are focusing on this again for 5.7 and have so far reduced it by a further 10%.
The graph below shows a capture of the game thread time used by the PCG subsystem over a playthrough of the demo on a base PS5. The CPU processes tasks for pending cells up to a prescribed per-frame processing budget, and we set a base budget of 2 milliseconds. The busy parts of the graph, especially around the middle, are heavy sections of the demo where the camera is moving fast and triggering many cell generations. The 2 millisecond budget is relatively large, especially at 60 fps, and we originally planned to reduce it; but there turned out to be ample CPU time available for most of the demo, albeit with a couple of exceptions we'll discuss in later slides. So we left it at this value to maximize the throughput of cell generation and thereby minimize something we call generation latency: the time from when a cell is scheduled for generation to when generation is complete. If this time is too large for a given camera speed, the generation may lag behind and the instances will visibly pop in.
This effect is illustrated further in this diagram. The camera at the center is moving forward, and the highlighted grid cell starts generating when it overlaps the generation radius, in red. If the generation takes too long, the grid cell may overlap the primitive cull distance, in white, while generation is still in progress, and the instances will visibly pop in when it completes. So there's a balancing act, with the camera speed, generation radius, primitive cull distance, and processing budget being the key factors. Also, a test or shipping build will generate substantially faster than a development build. So, as you can imagine, it's a bit of a handful.
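Framing that balance as an inequality (our summary of the diagram, not a formula from the talk): with generation radius $r_{\text{gen}}$, primitive cull distance $r_{\text{cull}}$, and camera speed $v$, pop-in is avoided when the generation latency satisfies

$$t_{\text{latency}} \;\lesssim\; \frac{r_{\text{gen}} - r_{\text{cull}}}{v},$$

so, for example, 60 m of slack at a camera speed of 10 m/s leaves about 6 seconds for a cell to finish generating.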
To help with balancing these, we added an in-world debug draw to visualize the progress of the generation. The red wireframe sphere is the generation radius, and wireframe boxes with grid coordinate labels are drawn around grid cells while they are generating. This is a useful visualization for understanding how the generation progresses during playthroughs. We also often took video captures and stepped through them frame by frame to correlate PCG generation with any visible pops in the build.
So, revisiting the processing budget: we have the ability to set the budget at runtime, so while the default was left at 2 milliseconds, we dynamically changed the budget in two key places. We reduced it to a minimal value when entering the village, using a trigger volume; there, the PCG system dispatches one set of operations per frame on the game thread and worker threads before yielding. Setting the budget this low increased generation latency significantly, but it was okay here because of the slower camera motion. The other key section was the marketplace sequence, during which we paused the system outright to minimize costs on the game thread. The camera motion is relatively contained in this sequence, and there were no issues such as the range of the grass being visibly limited.
Now to talk numbers: the game thread time spent in the PCG subsystem was 180 microseconds on average over the whole run. Of this, 130 microseconds was fixed overhead arising from the runtime generation scanning grid cells for generation and cleanup. We aim to drive this cost down with our continuing push on optimization.
Now, moving on to GPU performance. The graph below shows the GPU time spent executing PCG work over a playthrough of the demo. The work executes intermittently, as cells generate. Note that this is unfiltered data with measurement noise, and we verified carefully that the spikes here do not indicate a larger PCG workload on those frames.
The average cost over the whole run was minimal, but more relevant is the average cost for frames on which PCG work was actually done, which was around 71 microseconds. The instance counts per tile are around 10 to 20,000, and the generation kernels did not come close to saturating the base PS5 GPU; the theoretical occupancy of our kernels was almost 90%. So we posit that in the future this workload could be moved off the GPU critical path entirely, onto async compute.
On the memory side of things, the demo was well below the overall memory budget, and detailed memory analysis did not bubble up the priority list for 5.6. In particular, the PCG cache was left enabled for the demo, which would have accumulated some data over time. This has, however, become a big focus for 5.7. We now have the vast majority of PCG allocations tagged so they're attributed properly in memory traces, which can generate detailed reports like the one shown. The core memory usage of PCG, excluding the generated data, typically comes to low numbers of megabytes in our test graphs.
In the next section, I want to mention some tools that help when using PCG in production. Firstly, some high-level tools relevant to all disciplines. These two CVars pause and unpause the runtime generation and trigger a refresh of nearby cells; very useful for general sanity checks and initial debugging. In 5.7 we added an inbuilt debug overlay that gives some insight into what the system is doing each frame. And this CVar enables the debug draw showing the generation radius and the currently generating cells that we saw previously.
Finally, for the virtual texture usage, we have CVars to enable and disable the virtual texture path and to configure the warm-up time. For HLSL authors, besides the usual debug and inspection features you're used to in PCG, which also work for GPU nodes, we have a shader debug print that exposes a function for writing an array of values, which are printed to the log after execution.
And in 5.7 we've added a debug feature that randomly initializes GPU memory buffers before execution, which helps to test that all data is properly written and to reproduce undefined-behavior bugs, which can be hard to pin down, especially since uninitialized data is often zero by chance. Data initialization issues or logic errors in kernels can seem to run fine on one device but crash on others, so this tool is important for production stability, and we recommend running with it regularly.
Now for engineers, or those familiar with debugging tools. Starting with the CPU side, we've improved the break-in-debugger node setting so that it also works for GPU nodes: it triggers a breakpoint when the compute graph containing the GPU node executes. We also now scope this to the currently inspected grid cell, to avoid a flood of breakpoints. And for those set up for GPU debugging, we've added a trigger render capture feature in 5.7, which does a scoped capture of the dispatch for that kernel.
And now for performance owners and optimization: we have additional CVars, which I'll go through very quickly; check the help text and documentation for more information. We have CVars for the frame budgets. We have a CVar that throttles the number of generating components, limiting how many are allowed to generate simultaneously. And we have a CVar to configure a pool used to recycle partition actors. Related to runtime memory, PCG has a cache where the outputs of CPU nodes are stored to save re-evaluation, which can save compute time but increases memory use. The cache can be configured using CVars, and we have a post-generation GC trigger that can be set to run an incremental or full garbage collection after generation.
Finally, for GPU profiling: this can be a challenge at the moment, because the PCG work is intermittent rather than running every frame. Sometimes this means refreshing generation and trying to trigger a capture on the right frame, which is not a great workflow. To streamline this a bit, in 5.7 we added a feature to repeatedly dispatch a kernel, instead of dispatching it only once, so that a trace can be captured reliably. We hope to improve this situation going forward, and I'll revisit it in the future work section, but I wanted to leave these steps here so they can be referred to later, once the talk is uploaded.
So that brings us to the end of the tools section and to the final, forward-looking part of the presentation, starting with current challenges and improvement directions.
Firstly, Maxim illustrated the double-ring setup and some of the limitations of that approach. It would be valuable to find a new approach, or some combination of existing approaches, that can efficiently generate a long-range appearance faithful to the near-field instances. As a side note, Nanite foliage was enabled on the grass instances in the demo and helped greatly, but individual instances ultimately do not scale to the far distance.
Secondly, virtual textures are a powerful source of data, and it would be useful to have a way to query virtual texture residency, and/or to deal more gracefully with partial residency in runtime generation use cases.
And finally, as we discussed earlier, generation latency can be a tricky topic. In the demo, we benefited from the camera motion being one long continuous path, whereas teleporting would exacerbate these issues. We have a PCG gen source component, which triggers generation at its location in the world and is designed to address this, but we haven't yet used it extensively in production. We anticipate continued effort and R&D going into minimizing the impact of generation latency.
And now some additional future work directions. As mentioned previously, we will continue to push on optimization, including driving processing costs down on the CPU side, and we'll also explore async compute for GPU work. One major component of the CPU cost is setting up actors and actor components. We will ship experimental support for spawning primitives via fast geo components in 5.7, and we are aiming to build on that to add a mode that doesn't use actors for the generation grid at all.
Besides that, we will continue to push on expanding functionality. In 5.7 we have a first version of support for override pins, including the point count on the point generator node, and we'll extend this over time. We are also exposing the basic compute types and classes in 5.7 to allow you to write your own GPU node backends. And we will add support for GPU generation of instanced skinned meshes.
Next, many of the supporting tools are in an early state for GPU, and we look forward to improving them over time. The profiling window in particular needs a big update to reflect GPU memory usage and GPU execution costs, and that's high on our list. And finally, we're excited for the PCG framework to transition to a stable, production-ready state for 5.7. That means we are committed to supporting our current feature set and not breaking existing projects, and we're going full steam ahead with PCG development. Within that, the PCG GPU feature is transitioning from experimental to beta, as our focus shifts to stability, platform support, and performance, as well as expanding GPU support to more nodes, as mentioned.
And before we wrap, we both wanted to give a huge shout-out to just a subset of the many people who contributed to this work. And with that, I'll thank you very much and invite questions.
[applause]