
3D Gaussian Splatting for Realistic Physical AI Simulations

By NVIDIA Omniverse

Summary

Key Takeaways

  • **Balloons Explain Gaussians**: Imagine painting a picture using balloons attached to nails on a board, with different shapes and colors. Enough balloons create a view of the scene, but many balloons cost runtime and memory, and optimization adjusts them for multiple perspectives. [02:03], [03:29]
  • **3D Gaussian Ray Tracing Advances**: 3D Gaussian Ray Tracing (3DGRT) introduced ray tracing to Gaussians, enabling secondary lighting, nonlinear fisheye cameras, and rolling-shutter compensation, though it impacted training and rendering time. [04:21], [04:44]
  • **3DGUT Handles Distorted Cameras**: 3DGUT uses a hybrid approach combining splatting with sigma points from the unscented Kalman filter to approximate Gaussian distributions, supporting nonlinear camera models and rolling shutter for robotic platforms. [04:37], [05:26]
  • **NuRec Workflow from Data to Sim**: Start with high-quality sensor data via Encore, reconstruct with 3D Gaussian methods and Fixer refinement, harvest assets, store everything as USD, then loop novel views into robotic simulation via Omniverse or gRPC. [08:35], [10:16]
  • **Generative AI Fills Observation Gaps**: Gaussian splatting struggles with unobserved novel views, such as shifting the camera by one lane. Fixer, built on the Cosmos World Foundation model and trained on millions of drives, enhances reconstructions by filling those holes. [10:27], [11:11]
  • **USD Stores Evolving Gaussians**: NVIDIA stores Gaussian data in USD since no industry standard exists yet; extensibility is key so a standard does not slow rapid innovation while still enabling industrial uses such as autonomous taxis. [12:19], [13:14]

Topics Covered

  • Gaussian Splatting Mimics Balloon Optimization
  • 3D Gaussian Ray-Tracing Handles Fisheye Distortions
  • Omniverse NuRec Loops Real Worlds into Sims
  • Generative AI Fills Gaussian View Gaps

Full Transcript

So for our next session, we've got an exciting session on 3D Gaussian splatting as we explore how this transformative technique for high-fidelity scene reconstruction is used in physical AI.

Please welcome Nick Schneider, engineering manager of neural rendering at NVIDIA, to the stage.

[Applause] Yeah, thank you for the introduction.

I'm Nick. I'm leading a team at NVIDIA that brings all of the great research that NVIDIA has on neural reconstruction into Omniverse NuRec.

And today I'm talking about how Gaussians have, within just two years, become a key technology in enabling physical AI by providing highly realistic simulations for everyone.

Gaussian splatting is a method that belongs to the family of neural reconstruction technologies. As input, neural reconstruction gets sparse observations from a scene, which are then used to train a neural reconstruction model. We can then query this model for novel views, which allows us, in this example, to render 360-degree views of that scene for robotic applications.

This is a key feature, as we can now simulate novel camera paths, but it also allows us to test different sensors, or other viewpoint angles from the cameras. To understand how we can train such a model, we first need to understand how Gaussians work. And for that,

I would like to share a small story from a recent weekend I had with my family. I was sitting in the garden with them, and they asked me: what are you actually doing for work? And I said, that's tough to explain, but I tried to do it in a very simple way.

Imagine you would paint a picture of where you're sitting right now. But instead of doing that with a pen and a canvas, you draw the picture using balloons. Basically, you would attach a set of balloons to nails placed on a wooden board, and you would use different shapes and colors for the balloons. If you put up enough balloons, you would end up with a nice picture which could roughly represent the grass and sky as seen from your perspective.

My son was actually very eager to try that directly. So he was putting up the balloons, and after a while he said, "Wow, Dad, that's exhausting actually."

And I said, "Yeah, right. Congratulations, you just figured out that putting up a lot of balloons takes a lot of runtime." So we looked at the few balloons, and now he said, "Hmm, but that's still pretty far from being a good picture." And I said, "Yeah, that's another important point, right? For a good reconstruction, we need a lot of balloons." So we kept doing that, and at one point he said, "Oh, we are running out of nails." And I said, "Yeah, that's another important aspect: you just realized what it is like to run out of GPU memory."

So we looked at our balloons, and my son was very happy with them. But I said to him, "Well, this looks great now from your perspective, but my perspective looks completely different. Where you just put your blue balloon, for example, in my perspective there's the house." So I would go and rearrange the balloons together with him, to optimize the set of balloons for both his view and mine. And this, I said, is what we call optimization.

In a very simplistic way, this is what Gaussian splatting does. Instead of balloons, we're trying to optimize the positions, scales, colors, and rotations of Gaussians, to then project them onto the rendered image. We use the actual image to calculate a loss, which is backpropagated to the Gaussians. The original paper was actually introduced here in 2023.
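The balloon analogy maps directly onto that optimization loop. As an illustration only (a 1D NumPy toy, not the real CUDA rasterizer), the sketch below fits the positions, scales, and colors of a few Gaussians to a target image by gradient descent on a photometric loss:

```python
import numpy as np

def render(x, mu, sigma, color):
    # Evaluate every 1D Gaussian at every pixel and blend them additively.
    g = np.exp(-(x[None, :] - mu[:, None]) ** 2 / (2 * sigma[:, None] ** 2))
    return (color[:, None] * g).sum(axis=0), g

def fit(x, target, mu, sigma, color, lr=0.01, steps=3000):
    # Gradient descent on position, scale, and color of each Gaussian,
    # driven by a mean-squared photometric loss against the target image.
    losses = []
    for _ in range(steps):
        img, g = render(x, mu, sigma, color)
        err = img - target
        losses.append(float((err ** 2).mean()))
        common = 2.0 * err[None, :] * g / x.size  # shared factor of all gradients
        grad_c = common.sum(axis=1)
        grad_mu = (common * color[:, None]
                   * (x[None, :] - mu[:, None]) / sigma[:, None] ** 2).sum(axis=1)
        grad_s = (common * color[:, None]
                  * (x[None, :] - mu[:, None]) ** 2 / sigma[:, None] ** 3).sum(axis=1)
        color = color - lr * grad_c
        mu = mu - lr * grad_mu
        sigma = np.clip(sigma - lr * grad_s, 1e-3, None)  # keep scales positive
    return mu, sigma, color, losses
```

Rendering a target from known Gaussians and fitting a slightly perturbed set steadily drives the loss down, which is exactly the "rearranging the balloons" step of the story.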

So how has the world changed since then?

Two key papers from NVIDIA that enabled Gaussian splatting for a broader range of applications are 3D Gaussian Ray Tracing (3DGRT) and the 3D Gaussian Unscented Transform (3DGUT). 3DGRT introduced ray tracing to Gaussians and with that enabled secondary lighting, nonlinear camera models such as fisheye cameras, and compensation for rolling-shutter effects. However, training and rendering time were quite impacted by that. So 3DGUT introduced a hybrid approach where splatting is used to determine the rays that are used for tracing. This is done via sigma points, as you might know them from the unscented Kalman filter, hence the name. These sigma points are cleverly selected to approximate the actual distribution of the Gaussian. This now allows you to use nonlinear camera models and consider rolling shutter in the splatting process.
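The sigma points mentioned here can be sketched with the textbook unscented-transform construction below; the exact point selection and weights used inside 3DGUT may differ, so treat this as the generic UKF-style construction the speaker is referencing:

```python
import numpy as np

def sigma_points(mean, cov, alpha=1.0, beta=2.0, kappa=0.0):
    # Standard unscented-transform sigma points for an n-dimensional Gaussian:
    # 2n+1 points whose weighted mean and covariance reproduce the input exactly.
    n = mean.size
    lam = alpha ** 2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * cov)          # scaled matrix square root
    pts = np.vstack([mean, mean + L.T, mean - L.T])  # centre, then +/- each column of L
    w_mean = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    w_cov = w_mean.copy()
    w_mean[0] = lam / (n + lam)
    w_cov[0] = w_mean[0] + (1.0 - alpha ** 2 + beta)
    return pts, w_mean, w_cov
```

Pushing these 2n+1 points through a nonlinear camera model (fisheye projection, rolling-shutter timing) and re-estimating mean and covariance is what lets a Gaussian be splatted under distortions a linear projection cannot handle.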

In comparison to the original paper, 3DGUT achieves results slightly above the state of the art for reconstruction quality and rendering, and it's only slightly slower in reconstruction time.

But the key benefit, as I mentioned earlier, is that this now enables support for distorted, nonlinear camera models. And this is actually a key enabler for many robotic platforms that come with a variety of cameras and different lenses, such as fisheye cameras.

In this comparison video, you can see that Gaussian splatting, in the middle, has a higher fidelity than the NeRF reconstructions on the left. The difference between the standard Gaussian splatting implementation and 3DGUT becomes more apparent when the vehicle takes a right turn. Here you can now see that the original implementation does not consider the rolling-shutter effect of the recorded camera, causing really blurred reconstructions.

The code of both 3DGRT and 3DGUT is made publicly available on GitHub. Please check it out and reconstruct your own real-world scene. This really allows you to bring everything into simulation. You can imagine scanning your living room or a neighborhood street and then putting that into Omniverse and Isaac Sim. That allows us to bring the real world into simulation and, with that, provide a playground for all kinds of robotic applications. With a set of sparse images and a robotics model, you can now place your robot into a virtual training gym where it can train safely how to interact with the environment.

And with the just recently announced Omniverse NuRec, we're bringing this capability to everyone. Omniverse NuRec is a set of APIs and libraries to generate interactive 3D simulations from real-world data. It includes reconstruction, rendering, and generative enhancement for high-fidelity, real-time simulation environments.

For reconstruction, NuRec leverages Gaussian-based methods, which reconstruct scenes into millions of particles that can be moved and manipulated, and this significantly speeds up reconstruction times compared to traditional NeRF-based methods.

For rendering, we add ray tracing to improve the fidelity of the reconstruction, using our 3DGRT rendering algorithm. And finally, we have generative models such as Fixer, which I will talk about later, that fill in reconstruction artifacts for high-quality novel-view synthesis.

NuRec is actually built on top of a variety of NVIDIA models and accelerated libraries. We leverage, for example, the latest Cosmos World Foundation models to enable generative AI for Gaussian splatting; we'll see some more examples of this later in the talk. On top of the models, we also use NVIDIA accelerated libraries to really squeeze the last bit of performance out of the GPU.

So how does a typical workflow with NuRec look? At the very core is the sensor data of the robotic platform. This can be any platform: a robot or an autonomous vehicle. And the very first step is to make sure that the input data into the reconstruction pipeline is really good. So we have Encore at the very first stage. It comes with a set of libraries and tools that make sure the data quality is right at the input. This is really one of the key steps here, because the quality of your reconstruction can only be as good as the quality of your input data.

The next step is turning the real-world data into a high-fidelity reconstruction. At the core of this is, as I mentioned before, 3DGUT, but NuRec actually extends that with domain-specific enhancements, for example for automotive applications.

And then at the very top you can see Fixer, which allows us to refine the reconstructions either directly during training or after rendering, so we can have it as a post-processing step. The NuRec Asset Harvester allows us to extract objects as Gaussian splats from a scene. This lets you later build your own asset libraries with Gaussians: extract really everything from vehicles to pedestrians as assets, and place them back into scenarios later.

The neural reconstruction output is stored as USD, and from there it can later be loaded for rendering via Omniverse, or via a gRPC service that allows you to communicate directly with any simulation environment. And this last part is actually the one most relevant for your simulation: you can now ask NuRec to render a novel view, use that as an input to your robotic application, react based on those inputs, generate a novel path, ask NuRec again to create the next frame, and then this continues in a loop.

Gaussian splatting is limited by the observations we provide during training. Especially generalizing to novel views, such as, in this example, rendering a view from a completely new position, is very tough and results in artifacts which can affect the robotic applications. Here, for example, we shifted from the original recording by one lane, and this shift has actually never been observed in the data. So it's very tough to reconstruct that scene when you have never seen it before in your input data.

Fixer is a model that enhances the reconstructions, and with that it allows us to create much cleaner renderings. The basis of Fixer is the Cosmos World Foundation model, which leverages knowledge from millions of physically grounded drives. So we can fill those holes in the observations and enhance the reconstruction fidelity significantly.

Another new extension that leverages generative AI is the Asset Harvester. I really do love that video; it might make you a bit mad. The Asset Harvester generates novel views of objects using a diffusion model and then uses a transformer to estimate Gaussians from that. With that, we can now build large libraries of different vehicle assets for driving applications, which can later be used to generate novel scenarios by adding and replacing vehicles or changing their trajectories.

Looking at the video, you might notice that we have only observed some of those vehicles from one side, but the generative AI allows us to also take the perspective from the other side. That gives us a complete view of the vehicle, and that is so important. If you want to render novel views, we have Fixer to help with that, but we also have the Asset Harvester, which allows us to really complete the shapes of the assets.

NVIDIA uses USD as a way to store the Gaussian data. Currently there is no industry standard for Gaussians, and the field is still evolving very rapidly, which makes standardization really challenging. When talking about a standard, it is really important to consider that if it is too rigid, innovation might become very slow, and there's still a lot of innovation going on. Especially looking at this conference right now, we've seen so many Gaussian splatting papers, some using MLPs in between; it's a very complex and very quickly evolving field, where we have to be really careful, if we define a standard, not to slow that down. If it is too loose, however, there's really no point in having a standard. So at this point, we really believe that extensibility is the key to a good standard, and we're looking forward to working with the community to define a suitable one.
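One way to picture the extensibility argument is a schema with a small fixed core plus namespaced vendor attributes, so new research (per-splat MLP features, physics materials) can be attached without touching the core. This is a hypothetical illustration in plain Python, not NVIDIA's actual USD schema:

```python
# Core attributes every splat record must carry; everything else is an extension.
CORE_FIELDS = {"position", "rotation", "scale", "opacity", "color_sh"}

def make_splat(core, extensions=None):
    # Validate the core schema, then attach extension attributes whose keys are
    # namespaced ("vendor:attribute") so they can never collide with the core
    # or with another vendor's extensions.
    extensions = extensions or {}
    missing = CORE_FIELDS - core.keys()
    if missing:
        raise ValueError(f"missing core fields: {sorted(missing)}")
    bad = [k for k in extensions if ":" not in k]
    if bad:
        raise ValueError(f"extension fields must be namespaced: {bad}")
    return {**core, **extensions}
```

A standard shaped like this stays rigid where interoperability needs it and loose where research is still moving.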

Why are we already talking about standardization, then? Well, the fact is that many industrial partners are already using Gaussian splatting in their applications today. We've seen autonomous taxis in San Francisco, Phoenix, Austin, LA, and other cities, right?

But scaling autonomous driving safely to the whole world requires being able to navigate safely in basically any scenario, and realistic simulation is a key enabler for AVs to test millions of scenarios before going out into real traffic. The videos you see here are from our internal AV simulator, which is already generating and rendering a large set of new reconstructions daily.

A major milestone I would also like to announce here is that we have just enabled Omniverse NuRec rendering in CARLA. This allows any developer to test AV applications in simulation on real-world Gaussian splatting data. We've reconstructed over 80 different driving scenes, and we really encourage the community to engage with those and use them for replay-based testing.

Similarly to CARLA, we also just announced Isaac Sim 5.0, and it comes integrated with NuRec. So using the publicly available 3DGRT/3DGUT repository, as I described earlier, you can now record any real-world environment and turn it into a simulation where you can place a robot and let it interact with the environment. You can also see in the video, here in the middle, that it is also possible to place objects into the simulation, which allows the robots to interact with those objects and make sure to avoid them in simulation. And on the very right, you can also see a very new piece of work where we interact with Gaussians and can manipulate them.

So just recently we have announced further research on Gaussians, such as generative AI relighting of assets. This allows you to replay a real-world drive at dusk, dawn, or night. And by adding physics materials to Gaussians, we enable robots to manipulate them and thus interact with the objects.

Further, we have work that enables large-scale scene reconstructions, which allows you to rebuild complete cities. When I talked about NuRec earlier, we were talking about AV applications that are typically at a much smaller scale, but with this new research we can really scale out and rebuild whole environments.

And finally, we also have a Cosmos model that allows you to generate basically any environment from text, represented as Gaussians, and that allows you to place robots into that scene and interact with that environment.

We're really looking forward to enabling those to support further physical AI development in the future.

So finally, I really want to encourage every one of you to test this out. Go to the GitHub page of 3DGRT. Bring your own living room or your neighborhood road into simulation, and then test it together with Fixer in CARLA or Isaac Sim. All those tools are publicly available, and I'm really looking forward to more exciting research in this direction. Thank you all.

We do have time for questions. So please

raise your hand and we will pass a mic to you if you have a question.

>> Have you ever tried to,

>> Oh, hold on one second. There's a mic right next to you.

>> The mic is so loud; I really don't need it.

>> Have you ever tried to transform a volumetric data set, like a medical data set, a stacked set of clinical data, into a Gaussian splat? Has that been tried before?

>> I personally haven't done that, but it's definitely possible. What I'm presenting right now is really about bringing the real world into simulation. So you can imagine you could just use a cell phone camera, go into a medical environment, capture that, and transform it into Gaussians, and that then lets you interact with it.

>> Because the current state of the art, what people currently do with medical data, is: if you want a volume stack, say you want to look at an organ, you have a stack of data and then you can extract a kidney or something like that, and it's typically aliased. I'm just curious; it sounds like this is mostly camera-captured, but it'd be interesting to have volume data transformed.

>> Yeah, that's a good point, actually. It's an interesting research direction that I have not dealt with so far.

>> Let's talk.

>> It sounds amazing. Yeah, cool.

>> Thank you. On one of the slides you showed the USD-to-NuRec connection. Is that through a file format plug-in of some sort? And if so, is that available with Omniverse already?

>> Yes, you can look it up in Omniverse. We describe the format, and you can see what it looks like. Currently, as you can see here in this image, there's an NVIDIA logo, which means this is a custom format that we have on top of USD, which doesn't yet make it possible to interact with other simulators. And this is exactly where we need to work together to have a standard.

>> Thank you.

>> You get a lot of depth data from this, and you're talking about processing the visual data and cleaning it up from another point of view. I'm just curious how the depth works into the training, and into cleaning up the actual geometry, maybe.

>> Yeah, that's a good point, actually. We use 3D data if it's available. So we have the possibility to also use lidar data as an input to the reconstruction, and that does help for novel views, because you have a geometric prior that you can then use to reconstruct the scenes. We also use regularization techniques: for example, because we know from segmentation that a certain region is the ground, we know it has to be flat. Still, it is very challenging, because the model is not constrained enough in that respect, so going to novel views is really one of the most challenging things, and I do think there's still a lot of research ahead there. And we can use generative models for that: we don't need to use only the data from that one drive to reconstruct a scene. We have knowledge of how driving scenes look from the millions of drives we have already collected, so why not leverage that? This is where the generative models come in to help us make that better.
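The segmentation-driven regularization mentioned in this answer, penalizing non-flat ground, could look like the toy loss below. This is an assumed illustration of the idea, not NuRec's actual implementation:

```python
import numpy as np

def ground_flatness_loss(positions, is_ground):
    # Penalise vertical spread of Gaussian centres that segmentation labels as
    # ground: positions is (N, 3), is_ground a (N,) boolean mask. The loss is
    # the variance of the ground Gaussians' heights, zero for a perfectly flat
    # ground plane. (Toy sketch only; a real term would be added to the
    # photometric loss with some weight.)
    z = positions[is_ground, 2]
    if z.size == 0:
        return 0.0
    return float(((z - z.mean()) ** 2).mean())
```

Added to the photometric loss, a term like this pulls ground Gaussians toward a plane even in regions the cameras observed only obliquely.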

>> Hi there. I just wanted to understand, when you spoke about robotics and its applications: since the Gaussian splat is kind of a volumetric scenario, and you don't have, like, tessellated triangles everywhere, how is that replicated in terms of interactions between the robot and the environment?

>> So your question is: if I have, like, floating Gaussians in my simulation, how do we interact with those?

>> Yes. What are the control points then? Like, what is the surface? There is no surface there; it is just a bunch of Gaussians. What are you interacting with?

>> Right.

>> Yeah. So actually, we typically do this with a combination of meshes and Gaussian data. The mesh is actually there to constrain where the robot can drive, but the Gaussians give you the visual input that lets you control based on what you see, right? So it typically comes with a combination with a mesh. Yeah.

>> Awesome. Thank you, Nick, for diving into the latest in 3D Gaussian splatting in physical AI.

>> Thank you.
