
Multi-Control with Cosmos Transfer | Robotics Office Hours

By NVIDIA Omniverse

Summary

Topics Covered

  • Cosmos Transfer Boosts Robot Generalization to 80% Success
  • Control Weights Are Normalized When They Sum Above One
  • Edge Preserves Structure for Texture and Lighting Changes
  • Prompts Should Describe the Desired End Result Directly

Full Transcript

[music] Welcome everybody to the office hours for robotics. We are super excited to have you here. I'm also super excited to introduce a couple of my fellow colleagues from NVIDIA. We're going to be going over multi-control with Cosmos Transfer. Super exciting topic.

We're going to go over a nice overview of everything so you get an idea of what we're going to be covering. We're going to cover the Cosmos Cookbook, and we're going to cover Brev. We're going to show you everything you need to get started. We of course welcome everybody's questions and comments throughout this live stream. So don't delay. Say hello. Let us know where you're watching from. If you're doing any cool Cosmos projects or you're leveraging any recipes, absolutely let us know in the chat. We would love to find those stories to help highlight them on social media. If you forget, just hit me up on LinkedIn afterwards, and I'll be glad to help you with any questions you have about leveraging these recipes. Okay, without further ado, let me bring in my great colleagues here. We'll start with Aiden since you're right next to me.

Hey, Aiden, how you doing?

>> Great. How about you?

>> Great. I'm so excited that you're going to be covering this topic today. We've actually put out a lot of great announcements on Cosmos and the recipes almost every day lately, so this is super timely. What is your role at NVIDIA and what specifically will you be covering today?

>> Yeah. So, I'm a robotics solutions architect at NVIDIA. I mostly work with Cosmos, and I'm going to be covering some basic workflows with Cosmos and how people use our model for specific downstream cases.

>> Okay, awesome. That's super helpful. All right. And I see your colleague Akul here. How you doing?

>> Hey, I'm doing good.

>> Thanks for joining us. So, okay, you're going to actually help kick things off, but before you do, why don't you give us a little bit of background about your role at NVIDIA and what specifically you'll be covering today.

>> Sure. I'm Akul. I'm also a robotics solutions architect, and I'll be going over an overview of Cosmos as well as the different generation parameters for Cosmos Transfer.

>> Okay, amazing. Let me know when you're ready to share your screen and I'll put that up for everybody to see. While we're waiting, just so everybody knows, the Cosmos Cookbook is a resource for the physical AI community to share examples of Cosmos applications for robotics and autonomous systems. We're going to get a nice overview right now from Akul, so I'm going to sit back and enjoy the show myself. And again, feel free to ask your questions throughout. We will either pause and take a question or we'll come back at different points during the live stream. So if we don't answer you right away, just hang out, and we'll cover the news at the end of this live stream, too. Okay, cool. I think we are seeing your screen share now.

>> Yeah, I'm ready for it.

>> Go for it.

>> Perfect. Yeah. So I will start by giving a quick introduction to Cosmos. Cosmos is our world foundation model development platform. There are three parts to it. There are the world foundation models: Predict, Transfer, and Reason. Predict and Transfer are world generation models, while Reason is a VLM with chain-of-thought reasoning. The next part is the post-training scripts and the inference scripts for all three models, as well as the data curation pipeline, Cosmos Curate. We also have models for guardrails and watermarking.

So Cosmos Predict is a world foundation model which can generate future states as a video. The idea here is that you use either a text prompt, or an image plus a text prompt, to generate the next future states. You can generate up to 30 seconds of video like that. This is designed for post-training to develop specialized AI models.

Cosmos Transfer is used to augment or create variations of existing videos. Unlike Predict, it doesn't generate new videos from scratch; instead it takes an existing video and creates variations. We do that using a ControlNet architecture, so you have to provide different control modalities, like vis, segmentation, edge, and depth, and you use those to control the generation process. In this case the control modalities are coming from simulation, so if you're doing something like robotics you can generate these control modalities using the Omniverse Replicator writers in Isaac Sim.

Yeah, the video is up.

In the scenario where you're working with real video (real-to-real), you can still use Cosmos Transfer to create these variations. To compute the control modalities, you can use different AI models: Grounding DINO plus SAM 2 to generate the seg modality, Depth Anything to create the depth modality, or blur and Canny edge algorithms to create vis and edge. This, plus your prompt, allows you to create augmentations on the real video.

So just to understand Transfer and how the ControlNet works, I'm going to explain the general training process. The way this works is that we condition the input with the noised video, the text, the different control modalities, and the control weights. We train the ControlNet while keeping the diffusion backbone frozen, and the diffusion backbone here is Predict. This gives a control modality the ability to control a certain aspect of generation. I will tell you what those certain aspects [clears throat] are in the later part of this presentation.

Before we go there, I just want to take you through a workflow that we did internally here at NVIDIA using Transfer, where we did synthetic data generation for policy training. I'm taking a simple example of a long-horizon manipulation task, where the objective is to pick up a bowl with one hand, take an apple in the other, place the apple in the bowl, and then put the bowl back on the table, and we should be able to do this in varied visual conditions. I'm training three policies. We collect 100 teleop demonstrations from the real robot, and the first policy is trained using only those 100 teleop demos, without any other augmentations.

The second policy, the baseline policy, uses the 100 demos plus some standard image augmentations like Gaussian blur, color jitter, and contrast and brightness changes. The third one is the Cosmos-augmented policy, where we have the 100 demos plus 5x Cosmos Transfer generated data, and every other hyperparameter is kept identical. So this is what those Cosmos Transfer generated videos look like, and here are some more, or more dramatic, variations.

Then we do a generalization test where we evaluate on 10 unseen scene configurations, with new bowls, fruits, different backgrounds, different lighting, different distractor objects on the table, drawers open in the background, or different tablecloths. As you can see, the base policy only has 3% success, meaning it essentially only worked on the training scene. The baseline policy has 16% success, but the Cosmos-augmented policy has 80% success. So the main takeaway is that with Cosmos Transfer you are able to get strong generalization capability.

Here is a video of us deploying it on the real robot, and as you can see, in the version where we are using the Cosmos-augmented policy, the robot is able to actually perform the task.

So now I will go over the different parameters that we have for Transfer. The way we do a generation in Transfer is by defining a spec, which looks something like this. Inside the spec we have different parameters like the prompt, guidance, control weights, the different control modalities, and the mask input. I'll go over each of these parameters and explain how they affect the generation process.

First is guidance — well, guidance and control weight. Both of these parameters govern strength. Guidance is literally your prompt strength: the higher the guidance, the more Transfer will follow the prompt instead of the control modalities, while if it's lower, it will follow the control modalities instead of the prompt. Control weight is specified for each control modality and tells you what the influence of each of these control modalities is. One mistake we usually see people make is in the way they define the control weights. The way it works is that if the sum of all the control weights is less than one, then we won't normalize it. For example, if seg is 0.2 and edge is 0.2, then the sum is 0.4, which is less than one, and because of that we will keep it as is. Now, in the scenario where the sum is greater than one, we will rescale it and normalize it. So if seg is 4.0 and edge is 1.0, the sum is five, and we will normalize it into 0.8 and 0.2. The reason I'm saying this is that I have seen people just increase the weight for one control modality in the hope that it will be used a lot more, but as you can see, it doesn't work like that.
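A minimal sketch of the normalization rule described above, assuming weights are simply rescaled to sum to one whenever their sum exceeds one; the real implementation lives inside Cosmos Transfer and may differ in detail:

```python
def normalize_control_weights(weights: dict) -> dict:
    """Rescale control weights to sum to 1.0 only when their sum exceeds 1.0."""
    total = sum(weights.values())
    if total <= 1.0:
        return dict(weights)              # e.g. {"seg": 0.2, "edge": 0.2} is kept as is
    return {k: v / total for k, v in weights.items()}

print(normalize_control_weights({"seg": 4.0, "edge": 1.0}))  # {'seg': 0.8, 'edge': 0.2}
```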

The first control modality is edge. We use edge to preserve the original structure, or shape, of the video. Increasing the control weight for edge will make the generated video follow the same spatial layout as the original video. This is really good for changing things like textures, clothing, and lighting. The general idea is that everything within the outlines can be changed: if you have an outline around a dress, the position of the person and the dress will remain, while everything inside the outline can be changed. That's why this is really good at changing textures, clothing, and lighting. It'll look something like this, where you are now able to generate different lighting conditions and different textures in your original video.

The next is vis. We use vis to preserve the colors of the original video, and we use it alongside edge and seg to make the result more realistic. From experimentation, what we have seen is that even if you are creating a large variation in the final video, having a little bit of vis with a small control weight allows you to get realism. That moves the simulated look you sometimes get from Transfer toward a more realistic look. If you have a high vis weight, the result will look more like the original video. So if I were to show you that: in this scenario I'm trying to do a background augmentation, changing the background while preserving the subject. Having a high vis weight will make elements of the original video reappear, while with a low vis weight we are able to get a more realistic new background.

Now the seg control modality. All the other control modalities so far are used to preserve things from the original video: edge and vis allowed you to preserve the structure and the colors. With seg, we are able to create structural and semantic changes. We use this to remove, replace, or change objects, people, or backgrounds. Having a high control weight for seg may create hallucinations, depending on what you're trying to do, and we usually apply seg only in the regions where we want the change.

The next is depth. Depth is also used to create structural and semantic changes, while respecting distance and perspective. We use it alongside edge and seg. This is mostly used in AV scenarios, where perspective is more important.

Next are masks and reverse masks. Masks are essentially binary masks: black and white videos where the white pixels mark the areas where the control modalities can be applied, while the black pixels don't apply the control modality. We usually use them with seg, where with a mask we can say: this is the region of the video where we want the change to happen, while we want to preserve the rest of it.

Next is the prompt. Now, honestly, there are infinite ways to write a prompt, and it's hard to screw this one up, but just to give you a quick way of defining a good prompt: step one is to describe your current video. This could be describing your scene along with the lighting, for example "a brightly lit office space" or "a classroom with afternoon sunlight coming through the window." Then you describe your main action and the actor that's doing it, like "a person wearing a blue shirt is waving his hand" or "a bimanual robot with a gripper is picking up an apple." Then add any additional details, like "there's snow on the roads" or "the camera is panning from left to right" — if there's a special extra thing you want to add, this would be the place to do it. Then, in step two, take the description you currently have and just make the desired augmentation that you need to see. For example, if in the original video the person was wearing a blue shirt and you want to change it into a red t-shirt, just change the word from blue to red. Or if you want to change the apple to a different fruit, like a banana, just change "apple" to "banana." The reason I'm saying this is that one of the common mistakes I see people make is that they think Transfer is like an LLM and they talk to it like a chatbot, using phrases like "please maintain the structure of the video," and sometimes they describe the augmentation with phrases like "change black t-shirt to blue t-shirt." We should not do this. Instead, we should just provide the end result we want to see in the prompt directly. The next thing I see is that sometimes people provide too many details. They use phrases like "on the shelf right next to the window there's an apple" — essentially they describe everything happening in the scene and try to list all the different elements. Instead of doing that, use the prompt for describing the augmentation, and use the control modalities to preserve structure.
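As a concrete illustration of that two-step recipe (the prompts below are made-up examples, not ones taken from the cookbook):

```python
# Step 1: describe the current video (scene + lighting, actor + action, extra details).
base_prompt = (
    "A brightly lit office space. A person wearing a blue shirt is waving his hand. "
    "The camera is static."
)

# Step 2: state the desired end result directly by editing the description,
# rather than issuing chatbot-style instructions like "change blue shirt to red".
augmented_prompts = [
    base_prompt.replace("blue shirt", "red t-shirt"),
    base_prompt.replace("brightly lit office space", "dim classroom at dusk"),
]
```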

So next, Aiden will go through a notebook explaining all the different things I just covered and showing some quick recipes to achieve common augmentations.

>> Thank you. That was great. Super helpful. I love those prompt tips at the end — I could see how those would trip our developers up. Okay, Aiden, you now have the helm.

>> Awesome. Yeah, thanks for the great introduction, Akul. So I'm quickly going to go through our Cosmos Cookbook. This is where we gather recipes of our internal and external work using Cosmos in specific domains.

specific domain. Um so and let me know actually so cool we so we have some good questions and comments. We could go through and take a

comments. We could go through and take a few now that might have be uh you know closer in line to what what the viewers just saw or we can or we can do them at the end after aid you you do your presentation. Uh why don't why don't we

presentation. Uh why don't why don't we answer some of the questions uh since most of these questions are from Nicole.

I'm sure we're gonna have more questions after we talk about the cookbook.

>> All right, sounds great. Okay, so I'm going to look through — our producer Amelia here is working hard in the background, going through some of these comments and flagging some with asterisks. I think she's also pasting some nice links, so check the links in the comments for some important information from Amelia, including the Cosmos link. All right, so one question we got, coming in from LinkedIn — I'll put it on the screen: "What about robotic systems data modeling capabilities for calibrating these systems?" I'm not sure if there's enough information there for you guys to answer.

We might need some more context here. Yeah, I think we need some more context. Jesse, do me a favor — you're on LinkedIn, add some more details to that so we can try to help, and I'll circle back to it. Oh, he did actually follow up: this involves post neural architecture search, featured in the recent Jet-Nemotron paper. Maybe — does that help?

>> I don't know.

Okay. [laughter]

>> Okay, still a little bit more context needed, Jesse, but you're getting closer. So thank you for that. Okay, this question relates to OpenUSD: is Cosmos based on the OpenUSD system?

>> So, Cosmos can work with simulated video as well as real video. If you're using our simulation platform Omniverse, or any program built on top of it like Isaac Sim or Isaac Lab, then you will be able to define your world in OpenUSD, render out videos from it, and then use Cosmos to make them photorealistic.

>> Okay, great. That's super helpful. I see a couple of comments here from William, also on LinkedIn — thanks for watching, William: "The way you're balancing influence across control modalities almost looks like a behavioral rating system for robotics. Curious whether you've explored stability curves as guidance increases — similar patterns show up in LLM interaction coherence."

>> I think the question generally asks what's the best way of balancing these control modalities. At this point we have scripts we have written that go through different permutations, but first, just using the explanations I gave for the different control modalities, we get close to a good generation, and from there we balance it out by permuting over different small changes for each control modality. That's how we get good results.

>> Okay, great. Super helpful. Okay, let me see. Here's a question coming in from YouTube — thanks for watching: "Does Cosmos support consuming a video extracted from ROS bag camera topics, or is there a recommended way to stream robotics camera data into the model?"

>> So yeah, Cosmos takes in an MP4 video, or any type of video in general. It's not really dependent on ROS or the mechanism you use to take the video off the robot.

>> Okay, great. Super helpful. And finally, before we move on to the next presentation, a question from YouTube again: "Is there a way to generate all the augmented prompts automatically? Like, given a video, transcribe it and then change the words in that to generate augmented prompts and use them with Transfer."

>> Yes, you can. There is a way to do that. You can use Cosmos Reason, the VLM, to go through your video and create a caption for it — that can be a base prompt — and then use another LLM to make variations of it, depending on the augmentation that you want to see.

>> Amazing. Okay, that's great. Super helpful. And before we go on to Aiden, I want to put this comment on the screen. Great job — you're getting some nice comments there from YouTube. Okay, now you have the mic. Do you want me to share your screen?

>> Yes, that would be great. Thank you.

>> Okay, here we go.

>> Awesome. So, just a little bit of an introduction to the Cosmos Cookbook. You might hear Akul and be like, oh, where do I start? Where can I use Cosmos? What are some downstream tasks I can use it for? Is it only for AV and robotics? This is a collection of recipes of what people have done internally and externally at NVIDIA. We have examples of how to go from sim to real, AV use cases, robotics use cases, and I think most recently we posted a recipe of someone augmenting moth data, which I think is really cool and is supposed to be really important for something in the nature ecosystem that I don't know too much about. The cookbook will walk you through their input video, how they weighted these control modality generations, and how they produced the final output. So today I'm going to talk about one specific recipe: the multi-control recipes for Cosmos Transfer 2.5.

Akul and I made this page in the Cosmos Cookbook, and we think these are some of the most common workflows in the physical AI space for how people use Cosmos Transfer specifically to augment their physical data in order to train their models. Very quickly: if you want to follow along, you can actually launch our Brev instance. This will automatically set up the environment and allocate the compute for you. Of course, you would need to request some compute, but if you press the Deploy Launchable button over here, it should spin up a server in the back end with a GPU and set up the entire environment with Cosmos Transfer and the examples I'm about to walk through. So feel free to follow along. And then I will share my next screen, where I already have this notebook deployed, so I can walk everyone through what we've been doing.

>> Very cool. It's so easy. So quick.

>> So quick and so easy. Takes around 15 minutes.

>> Wow. And I think we're going to check — our producer is working on this right now — to see if we can give away any Brev credits towards the end of the live stream. So, another reason to hang out. She's working on that, so no promises, but we're working on it.

>> Awesome. Yeah. So if you get some Brev credits at the end, or you request some credits yourself, you can go to the page — I think we just linked it in the chat — and then you can just press that Brev launch button and it should pull up this exact same environment that I have up over here. The first thing we want to make sure is that we're on the correct environment. We've set this environment up so that it downloaded all the models and pulled the scripts; just make sure you're in the correct Python environment, and then we can start. The first thing you want to do is load in your Hugging Face token. For those of you not familiar with Hugging Face, it's an open source platform that hosts open models, and you're able to download these models quickly and run them on your machine. I've already pre-populated my Hugging Face token, so I'm going to skip this step. The first few concepts are what Akul talked about earlier, just in text: the guidance scale — what it means to have a strong prompt or a weak prompt — and control weight normalization, so the values we're about to enter over here, what it means to have, say, a 4.0 segmentation weight versus a 1.0 edge weight. The first thing I'm going to do is talk about how to generate each control modality, and then I'm going to go into our recipes for how we did different augmentations like lighting changes, color changes, and background changes. So this is the video we're going to start off with.

It's a person waving. This is just an example of what we might see if we want to train a human-robot interaction model or do some gesture recognition. So the first thing we're going to do is edge control, and as Akul mentioned, edge control is there to maintain structure. Cosmos makes it very easy to generate this final video: you can run this edge control command over here, or you can just let Cosmos automatically generate the edge control modality. This option is here for people who might have specific use cases where the Canny edge detector might not work very well. For example, if your background and the subject waving their hand have similar colors, the Canny edge detector might have a hard time, because it's trying to find contrast between colors. In that case, you can raise the brightness and the contrast in order to generate a few more edges.
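As a rough illustration of that pre-processing trick, here is a minimal OpenCV sketch. Cosmos Transfer can also generate the edge control video for you, so treat this only as a sketch for the low-contrast corner case described above; the thresholds and gain values are illustrative assumptions.

```python
import cv2

def edge_control_frame(frame_bgr, alpha=1.5, beta=40, low=100, high=200):
    """Boost contrast/brightness before Canny so low-contrast subjects still produce edges.

    alpha scales contrast, beta shifts brightness; thresholds are illustrative defaults.
    """
    boosted = cv2.convertScaleAbs(frame_bgr, alpha=alpha, beta=beta)
    gray = cv2.cvtColor(boosted, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, low, high)   # white edge pixels on a black background

cap = cv2.VideoCapture("input.mp4")
ok, frame = cap.read()
if ok:
    cv2.imwrite("edge_frame0.png", edge_control_frame(frame))
cap.release()
```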

The next thing I'm going to go through is segmentation control. As Akul mentioned, when you want to change something — modify it or generate new objects in the scene — you want to use segmentation as a control modality. Segmentation is a little bit trickier, since we need to identify the different objects in the background, the different semantics, and we're going to have to segment them. So we use SAM 2. I know SAM 3 just dropped, and I think we're going to integrate SAM 3 at some point, but for the purpose of this demo I'll just show you a little bit of how this works. We use a combination of Grounding DINO and SAM 2. What Grounding DINO does is that when we specify some objects — like floor, ceiling, wall, staircase, railing, bench, light, person — it is able to identify where these objects are in the scene. Then we put the outputs of Grounding DINO into SAM 2, and SAM 2 is able to track these objects over time.

I will show you an example of what this looks like, but for now we'll run this and start our video segmentation, and we end up with something like this, where the entire video is split up with each color representing its own semantics. So for example, hand and arm and t-shirt and pants each mean different things, right? That's why we see different colors in this scene over here. SAM 2 is open source — you're able to download it — and it's also built into Cosmos Transfer 2.5. So if you just want to try this out using Cosmos Transfer 2.5, it will automatically figure out how to run Grounding DINO and then SAM, as long as you specify these prompts with the objects separated by periods, like over here. So we'll wait for this to finish running and we will show you the result. Shouldn't take too long, and then we can go into a specific example. Yeah, and feel free to interrupt me if there are any questions.

>> I was going to say, we got a couple of questions here we can take while we're waiting. So Amelia has posted a link — I'll put it on screen here so everybody can see it really quickly, but it's in the chat, both on LinkedIn and on YouTube. Thank you, Amelia, for doing that. Here's a question coming in from LinkedIn, though: "In practice, how do you decide where to apply seg versus edge, when seg improves standardization but also increases hallucination risk?"

>> Yeah, so that's a great question. As Akul mentioned earlier, edge is mostly used to maintain structure, and seg is used to generate new things. So for example, one use case we saw is training an AV model: if you want to change the lighting conditions or the weather but maintain everything else — like the cars driving in the same direction — that's when you want to use edge, because you want to maintain exactly what's going on, just with a little bit of a different appearance. Versus seg: you would use it for something like generating humans walking on the sidewalks, or adding debris onto the road. These are changing the structure fundamentally, right? You're adding new things in there. And these are both use cases and augmentations that you need in order to train these models. So that's the difference between when you use edge and when you use seg. And then obviously you want to use them in combination, because if you only use seg, you would generate a completely new scene. You want to maintain some structure while also generating new objects.

Awesome. So our segmentation is done. When we actually use Cosmos Transfer 2.5 to generate these augmentations, it's going to take a bit, so that might be a good place for me to stop and answer some more questions as we go. We don't need to generate any vis control — this is the blurring effect Akul talked about earlier — because this is automatically done by Cosmos Transfer when you use vis control as one of your modalities.

So I'm going to talk a little bit about generating the binary mask. We use a similar workflow with SAM 2: we first select the objects we want to mask out, we use SAM 2 to propagate them throughout the video, and then we turn them into a black and white mask at the end. I'll show you how that happens. The first thing we want to do is load all of our frames — a video is split up into many, many frames of images — and we just want to get these frames prepared so the model is able to ingest them. The next thing is that we want to select our object. You can do this in many ways. You can use a natural language query to specify what your object is. You can use a box prompt, meaning a bounding box over the specific object — that would be the use case if you have an object detection model beforehand; it could be YOLOv9, it could be Grounding DINO, or your own custom object detector. We use a point prompt over here: at the coordinate (620, 300) — you can see it as the green star — we specify the object we want to track over time. In this case it's the person waving their hand. From that, you can see that on frame zero SAM 2 is automatically able to identify what this object is — a human waving their hand. The next thing is that we want to propagate — we want to track this object through time. This person is going to be waving their hand, right? We want to be able to fluidly track their motion of waving their hand. And that's what it's doing right now. You can see, as we go down the frames, as the person is waving their hand, the SAM 2 model is able to track them over time.

And then the last thing is that we want to put all of this together and generate a mask at the end — change it into a black and white mask. Just as a refresher: white means we're going to apply the control modality generation to those specific pixels, and black means we're going to completely ignore them. So we end up with something like this, where we tracked the person waving their hand and the background is all black. We would use this if we want to apply our control modalities directly to the subject waving their hand over here.

But we also want to be able to generate the reverse mask. Now, you might be wondering: why would we need the reverse mask? We use the reverse mask in one of our classic examples, changing the background. If you think about it, you're using the segmentation control over here to change things, right? But you want to maintain the main subject, the person waving their hand — you just want to change the background. That's why the white pixels are in the background and the black pixels are on the main subject waving the hand. This means that when we apply our segmentation control, the background will change but the person will remain the same.
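Turning those per-frame masks into the black-and-white mask video, and its reverse, is then just array manipulation. A minimal NumPy/OpenCV sketch, assuming the `masks` dictionary from the SAM 2 step above; file names and the writer settings are illustrative:

```python
import cv2
import numpy as np

def write_mask_videos(masks, size, fps=30,
                      mask_path="mask.mp4", reverse_path="reverse_mask.mp4"):
    """masks: dict frame_idx -> boolean array; size: (width, height) of the video.

    White = apply the control modalities there, black = leave those pixels untouched.
    """
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    out = cv2.VideoWriter(mask_path, fourcc, fps, size, isColor=False)
    out_rev = cv2.VideoWriter(reverse_path, fourcc, fps, size, isColor=False)
    for idx in sorted(masks):
        m = masks[idx].astype(np.uint8).squeeze() * 255  # subject = white
        out.write(m)
        out_rev.write(255 - m)  # reverse mask: background = white, subject = black
    out.release()
    out_rev.release()
```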

So that's a quick introduction to the control modalities, and next we're going to go on to our hands-on multi-control recipes. I think this might be a good time to pause and answer some questions.

>> Okay, this is great. Well, first of all, how about this comment from NVLearner — I love that nickname.

>> Thank you.

>> "Amazing session." So, thanks — a lot of great information here. Okay, we've got this comment coming in. Just more of a comment, so I don't think there's a question here, but: "Multi-control with Cosmos Transfer is a clear signal. The era of isolated models is ending. The future belongs to systems that understand and orchestrate every digital modality in real time." Do you guys agree with that statement?

>> Yes. [laughter]

>> All right, I won't read the whole thing, but thanks for the comment. Really appreciate you watching and tuning in.

Okay, we've got this coming in from LinkedIn, from Christopher: "What are the limits to how many objects you can add to segmentation and mask? Meaning, how many individual objects can you account for? Is there a disadvantage to adding too many distinct objects to detect?"

That's a great question. Yeah, that is a great question — thanks for asking. So there are two parts: how many objects can you add in the segmentation, and how many objects can you add in the mask. Adding objects in the segmentation is a segmentation model question: how fine-grained do you want to segment this out? You can add as many objects as you want in the segmentation; it will just take longer to propagate through the video. The way SAM 2 works is that it first identifies where the object is, and then it has a memory bank and a neural network tracking it, propagating through time. If you do this for maybe one object, it should go pretty fast, but I would imagine that for some of these fine-grained objects, if you have a hundred of them and you have to track all of them through time, it will take a lot longer. For the masking strategy — how many objects can you mask — you can mask the same number of objects as you segment. It really depends on your use case and what the trade-offs are for masking a lot of objects. For example, in the human interaction video I just showed you, where someone is waving their hand, there is no reason to mask more than one object, right? Because that's the main subject you're focusing on. But maybe in some other manipulation task, like when a robot is trying to pick up an object and then hand it to a person, you would want to mask out the robot, the object it picks up, and the human, so that those remain similar throughout the video. You can do that; it just takes a little bit longer. And then the trade-off would be: why would you want to mask out that many objects for your specific use case? So I think the harder part is automatically figuring out which objects you should be masking out.

Great. Okay, well, thank you so much for that. Here's another question we've got. This also came in via LinkedIn: "If the human leaves the video frame and then enters again, would the model detect him again?"

>> Yeah, that's a great question. So, the way the model works is that it has a memory bank in the background and it's able to remember the fact that the object has left the frame. So when it re-enters the frame, because of that memory bank, it's still able to identify that it's the same object coming back in.

>> Okay, that's great news. And this is coming in from Vanessa on LinkedIn: "Are there any limitations on masking if there's any transparency — glass, plastic sheets, etc.?"

>> Yeah, so that's a great question. I'm assuming this means: is it harder to identify where these objects are in that case? I don't think so, since I believe the training data set had some pretty good examples of things like glass and being able to segment all of these out. So I don't believe so. If the question was whether it's hard to track a person behind, say, transparent glass or something like that, then maybe — I'm not sure, depending on how transparent it is. Sunglasses, for example, are a little bit harder to see through.

>> You know what — I would actually love it if any users who are using this let us know, because that would be great information for us to be aware of. So if you've got interesting use cases you're trying out, or you're noticing things that work really great or are unexpected, Amelia is going to post a link in the chat to the live stream discussion on our Discord server for this specific live stream. You can update us there. We'd love to hear back from you on this. Okay, we've got another question. This one's coming in — oh, actually, just thanks. We got some thank-yous happening for the answer.

>> Awesome. Yeah, very welcome.

>> Great questions.

>> Yeah, that's what we're here for. So feel free to keep the questions and comments coming in. I think, Aiden, that brings us pretty much up to date.

>> Okay, awesome. So I'm going to go through the multi-control recipes — actually using the control modalities we just generated and doing the augmentation. For timing purposes, I'm not going to run all of these recipes; I'm going to run the main ones I want to show.

>> And actually, let me interrupt you one second, because there was a note in the chat and I forgot to put it up on the screen. We got a good question that we want to make sure we hit before moving on further, from Tech Lover. Tech Lover is asking, "Is this notebook public?" Well, Tech Lover, we've got good news, I think.

>> Yes, this notebook is absolutely public. You can find the Cosmos Cookbook page on our GitHub, and you can also find this notebook in the Brev link that I showed at the start of the stream.

>> Amazing.

Okay, great. Actually, maybe one more question coming in: "What level of familiarity with CUDA or GPU programming is expected to effectively extend the recipes in the Cosmos Cookbook? Does a basic Python background suffice, or should developers have GPU-level debugging skills?"

>> Yeah, that's a great question. You don't need to know any CUDA to be able to do this. That's why we use libraries like PyTorch or TensorFlow, which communicate with the CUDA back end — all you need to do is interact with the high-level API. But in order to actually recreate these recipes, you don't even need to do that, because all the instructions and scripts for how to run this are provided. All you need to do is go to the Cosmos Transfer page; there's a little inference section, and it will clearly walk you through how to set this up and run it.

>> Amazing. It looks like Tech Lover has his weekend planned out now. So, awesome.

>> Keep us updated on your progress. [laughter]

>> All right, we have a couple more questions, but maybe we'll circle back to these after this next segment.

>> Yeah, that sounds good. Okay, so the first thing I'm going to walk through is color and texture change. This could be the example Akul talked about earlier: if you want to change the person's shirt color, or if you want to change the texture — say I'm wearing cloth and I want to make it look like silk — then this is the recipe you want to use. This one's the simplest, since we're trying to preserve the structure and all we want to do is change the color. So all we need is the edge modality, and we'll just put that at 1.0, as it's the only one we're using. This is the result: you can see that the person on the right is wearing a red shirt over here. So this is probably the simplest one; I would always try this one first. If you want to check out the prompt we use, it's over here, and if you actually want to run Cosmos Transfer, you can uncomment this command and run it, and you should get results something like this. The next thing I want to talk about is lighting change.

This would be modifying the scene lighting — going from day to night or night to day, or if you want different kinds of lighting, maybe a sunny morning or a dim evening. The goal of this is to augment in different conditions so that when we train our model, it understands what the gesture is in different conditions, right? We don't want the model to just work at a specific time of day; we want it to work in bright conditions, dark conditions, foggy conditions. This is just an example I'm showing over here, but you can try it with your own prompt — you don't have to change it to afternoon lighting; you can try what this would look like at night, or early in the morning, or even in foggy conditions. This recipe uses vis and edge, and we use a very low vis weight in order to keep a sense of the original colors. You can see that we have different lighting over here, but the main structures — like the bridge in the back or the benches — remain the same color, and it's more the background lighting that changes. Again, this is the prompt; feel free to uncomment this and run it, and you should get some results resembling this.

The most complex recipe that I'm going to walk through today is the background change. This is when you would go from a video like this to some other background. This recipe is a little bit more complex: you're going to use vis, seg, a filtered edge, and a reverse mask. So what does all of this mean? The reverse mask, which I talked about a little earlier, is for when you're trying to apply the controls only in a specific area. In this case, we want to keep the person waving their hand and change the background. So we have black pixels, where our model is not going to apply the control modality, and white pixels, where our model is going to apply the control modality. Because segmentation generates new objects, we pair our reverse mask with our segmentation modality, which essentially tells our model: generate new objects based on our prompt everywhere except this person waving. We're also using vis, and finally we have our filtered edge. I talked about the edge modality, but I haven't talked about what a filtered edge is. Essentially it says that we're maintaining the structure of the person over here — that's why you see the Canny edge control applied on the person but not in the background. So I'll talk a little bit about how we generate the filtered edge. We have the selected object mask, which we generated earlier — this is the subject we're going to maintain. We have the Canny edge that was generated, which we also showed earlier. If you combine these two, you can see we end up with a filtered edge. Cosmos Transfer does have commands to run this; you can run it yourself and you'll end up with something like this. The purpose of this — remember, edge is there to maintain structure — is that we're trying to maintain only the structure of the person. So once we have all four of these ingredients, you can generate a realistic background.
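One plausible way to build that filtered edge offline is to keep Canny edges only where the subject mask is white. The cookbook provides Cosmos Transfer commands for this, so the OpenCV sketch below is only an illustration of the idea, not the exact script:

```python
import cv2

def filtered_edge_frame(frame_bgr, subject_mask, low=100, high=200):
    """Canny edges restricted to the subject: edges inside the mask, black elsewhere.

    subject_mask is the white-on-black (uint8, 0/255) mask of the waving person
    from the SAM 2 step; its reverse (255 - subject_mask) then pairs with seg
    for the background-change recipe.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)                        # full-frame Canny edge map
    return cv2.bitwise_and(edges, edges, mask=subject_mask)   # keep edges only on the subject
```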

You can actually play with vis as a hyperparameter, because that controls how much of the original background you want to maintain, right? Akul showed you an example earlier with the original video. With high vis, if we want to change it to something completely different, like an ocean, it won't work that well — you can see the bridge structure lingering in the background, because vis is trying to maintain consistency with what was happening. With the ocean prompt and without vis, you can see it's changed very nicely. With vis, if you're trying to change it to something similar, you can see that our background changed into a street and it looks very realistic, but it still has some fragments behind — you can see the side bridges; the colors are changed, but the structures are sort of maintained in this area. This is the kind of result you would expect using vis as a control modality with a high weight. And this I will actually demonstrate by running Cosmos Transfer. I am using eight H100 GPUs, so this will take around, I'd say, maybe two minutes. This will be a lot faster than if you use Brev, because I think Brev automatically allocates a single GPU, and with less compute it's going to take a bit more time. So feel free to try this on your own time. If we get some Brev credits allocated, feel free to use those, or you can use your own Brev credits, and you should be able to generate some results like this. But we'll wait for this to run so that I can show you a full generation, and maybe I can answer some more questions while it's generating.

>> And I have some good news: we got permission to share some Brev credits, like the ones we gave away a few weeks ago in a digital twin live stream — awesome leveraging of Brev. So I'm going to put that up on the screen, and I would recommend that anyone watching this live try to snag these, because the people watching the replay are also going to have the ability to grab them. So we'll make sure that everyone watching live definitely has a chance. Okay, so let me see. I apologize for the quality here — I had to take a screenshot from our live stream to get this over to everybody, but I think it's going to be viewable. Let me just make sure I got the right thing here. Is this going to show the window? I think this will show. All right, let me see if I can make this a little bigger.

Let me not block the QR code. Oh, yeah. Let me put the text first. Can I move my webcam here? Let me go to this. There you go. Okay, so there are the instructions for grabbing some credits. I forget how much time this is, but I think it's a good chunk — a nice chunk to get started. So take a look, screenshot that, and I'm going to go ahead and look at some other questions and keep this on the screen for everybody.

All right, we've got some good questions that were starred by Amelia. Let me go from the latest backwards. Here's one from Tech Lover again — quick question, because we've got to make sure his weekend is okay: "To generate synthetic data of different activities within a warehouse, like forklifts, is it better to generate them in Isaac Sim or Cosmos? Could Cosmos do it faster from a first video?"

>> Yeah, so this is an amazing question, and I'm really glad you put this up. There are two main approaches we use at NVIDIA: one is to run a full simulation, and one is to use Cosmos. An actual simulation has a physics back end, and we're able to import 3D assets and simulate what's going to happen if we put them in, say, a warehouse, with physics, and we can record a video and learn from there. Cosmos tries to embed these physical properties in its neural layers, so it's going to be a lot faster, but it won't be as physically accurate as a full 3D simulation. We have a lot of interplay between them — there are a lot of workflows where you use Omniverse and Cosmos at the same time; they complement each other. Omniverse might not look as realistic, but Cosmos can help make it realistic, right? [snorts] I actually have an example of how to use Cosmos with Omniverse at the end. But hopefully that answers your question. I guess the conclusion would be: it would be a lot faster if you do it with Cosmos, but there would be a better chance that it's physically accurate — and you can use it in different scenarios — if you run it in Isaac Sim.

>> Okay, cool. Super helpful. And I apologize, but my producer Amelia is telling me that I posted the wrong coupon there. My head's spinning here because I'm trying to find the right chat. All right, so I will grab this — oh, here. Wait. Here it is. Let me see. This is it. Okay, yeah, this is it. Okay, so hold on. So forget what you just saw a minute ago, and we'll try this one now, and then we'll go ahead and take your questions while we're sharing this on the screen. These are the credits that are available for you to use. Amelia, please yell at me again if this is the wrong one. Okay. All right. So that is the correct Brev credit to use. So far I don't see any messages coming in telling me that's wrong, so this should be good. Okay, so I apologize for that. All right, we got another question.

This is coming from YouTube: "For robotics workflows using Cosmos Transfer multi-control, can we stream generated frames — RGB and depth plus seg — live into a sim-to-real pipeline?" And then he follows up: "For example, Isaac Sim, to continually augment training data during development." Is that enough information for you guys?

>> Maybe I can try to take that. From a technical perspective, you are able to stream in data from Isaac Sim, and you can use Cosmos after that to make modifications to that data. Obviously, you would need a lot of compute to run Cosmos in real time; that would be one consideration. The second thing is that if the goal at the end of the day is to use the data you have collected from a robot to train a policy model or something like that, I believe a more scalable way to do this would be to collect a small set of demonstrations first, and then use Cosmos with a good pipeline where you're changing the prompts to define different types of augmentations, and then running it at scale to generate large amounts of data.
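As a rough illustration of that "small set of demonstrations, then scale with prompts" idea, here is a hedged Python sketch. The run_cosmos_transfer helper, the spec keys, and the directory layout are all hypothetical placeholders; wire them up to whatever inference entry point (CLI script, container, or hosted endpoint) your Cosmos Transfer setup actually provides.

```python
import json
from pathlib import Path

def run_cosmos_transfer(spec_path: Path, out_dir: Path) -> None:
    """Hypothetical wrapper: replace the print with a call to your real
    Cosmos Transfer inference command (CLI, container, or endpoint)."""
    print(f"[would run] cosmos-transfer --controlnet_specs {spec_path} --output {out_dir}")

# A small set of collected demonstrations, augmented at scale by sweeping
# prompts that describe different target conditions.
demos = sorted(Path("demos").glob("*.mp4"))
prompt_variants = [
    "the same scene at dusk with warm overhead lighting",
    "the same scene with wet, reflective concrete floors",
    "the same scene with shelves fully stocked with cardboard boxes",
]

for demo in demos:
    for i, prompt in enumerate(prompt_variants):
        spec = {
            "prompt": prompt,
            "input_video_path": str(demo),
            "edge": {"control_weight": 0.4},   # keep structure
            "seg": {"control_weight": 0.6},    # keep object identities
        }
        spec_path = Path("specs") / f"{demo.stem}_{i}.json"
        spec_path.parent.mkdir(parents=True, exist_ok=True)
        spec_path.write_text(json.dumps(spec, indent=2))
        run_cosmos_transfer(spec_path, Path("outputs") / f"{demo.stem}_{i}")
```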

>> Okay, great. Very helpful. Okay, I'm going to move this on the screen now. Hopefully everybody got it. All right, I think Okay, Aiden, you're ready to go. I'm going to switch back to your screen.

Okay, here we go.

>> Awesome. Yeah. So, you can see the end result over here. It's the person waving with an ocean background. You can see cars going by, people, and the lighting looking nice and real.

So, one last workflow I want to talk about is generating realistic data from Omniverse. I'm sure a lot of you had some great questions about this, and this should answer some of them. I'll share two recipes: one uses photoreal generation, and one creates a realistic background. After that, I think we can focus on answering questions.

>> So, in Omniverse, which is our 3D simulation platform, it's very easy to generate the different control modalities that we talked about using the Omniverse writers. Since we have the ground-truth objects in the scene, we know exactly where things are, and we're able to generate these very easily. So, like what we showed earlier, we have the original video, the edge control modality, the segmentation, and the binary mask.

Photorealistic generation produces something that looks more real. Omniverse is a 3D simulation platform, right? So if you look right over here, this may look a little bit simulated; it might not look like a real warehouse, and there's no lighting. Using Cosmos Transfer, we're able to augment this and make it look a little more realistic: we have lighting coming in, and the boxes look realistic. This uses depth, seg with a mask, and the filtered edge.

The second workflow is generating a realistic background, similar to what I showed earlier with the person waving against an ocean background. This one uses the inverted mask, the segmentation, and a filtered edge. All of this will generate a video of this robot operating on this box with a realistic background based on the prompt that you use.

So I walked through two ways to generate these different augmentations: one is to use Cosmos directly with real video, and one is to use Cosmos with a simulated platform, with Omniverse. Once this finishes and we're able to show the video, I think that is it on my end.
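To summarize the two Omniverse workflows just described, here is an illustrative pair of control combinations written as Python dicts. The key names, mask handling, and weights are assumptions for the sketch; map them onto the controlnet spec format of the Cosmos Transfer version you are running, and note that the control videos (depth, segmentation, masks) come straight from the simulator's ground truth.

```python
# 1) Photoreal generation: keep the full simulated layout, add realism.
photoreal_spec = {
    "prompt": "A photorealistic warehouse with natural light and worn cardboard boxes",
    "input_video_path": "omniverse_rgb.mp4",            # RGB render from the sim
    "depth": {"control_weight": 0.5, "input_control": "omniverse_depth.mp4"},
    "seg":   {"control_weight": 0.5, "input_control": "omniverse_seg.mp4"},
    "edge":  {"control_weight": 0.3},                    # filtered edge, derived from RGB
}

# 2) Realistic background: lock the robot/box region, regenerate everything else.
background_spec = {
    "prompt": "The same robot arm handling a box on a sunny city street",
    "input_video_path": "omniverse_rgb.mp4",
    # Hypothetical mask handling: an inverted robot mask marks which pixels to
    # preserve and which to repaint with the prompted background.
    "seg":  {"control_weight": 0.7, "mask_path": "robot_mask_inverted.mp4"},
    "edge": {"control_weight": 0.3},
}
```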

So while this is running, I can answer some questions. Okay.

>> Yeah.

>> Sounds good. Here's one coming in from LinkedIn from a little earlier in the broadcast: what are the wider implications you see happening with Cosmos in the future across different fields, and how long, or what achievements, will it take to get there? So this is a kind of broad, high-level question, but an interesting one. What do you guys think?

>> Yeah. So I think the biggest thing with Cosmos is that we're able to scale up data. Not only in physical AI or AV situations, it's very hard to get data from real-world environments, and especially in use cases like physical AI, where you actually need a robot to go out, perform actions, record them, and send them back to your computer. And because we're training these generative models, we need tons and tons of data, right? ChatGPT was trained on roughly the entire scale of internet data. To scale this up for physical AI, or any other example where data is hard to get, we need a cheaper way to augment these videos, generate new videos, and reason over them, so we get to the point where we have enough data to produce high-quality results for our downstream models. That, I think, is the purpose of Cosmos. Akul, do you have any thoughts on this?

>> Maybe one thing to add to that: Cosmos, at the end of the day, represents a world foundation model. So it has an understanding of the world, the physics in it, and the different interactions that are possible in the real world. We are doing a lot of work internally to leverage that ability and bring it directly to manipulation and other tasks in robotics. One cool paper that you can go read now is something called Cosmos Policy, where we fine-tuned a Cosmos Predict model to generate trajectories, or generate actions, that control the robot. If you look at the paper, you can see that using very little training data, somewhere between five and 20 demonstrations, we were able to teach the Predict model how to control a robot and do different tasks. That is the kind of capability you get from a world foundation model, and I think over the next couple of months and years you'll see more and more progress in that direction.

>> Amazing. Got this question regarding Spark. Spark is all the rage lately. Can we run these notebooks on DGX Spark? Can we load the launchable onto the Spark?

>> So, a launchable makes an instance for you on the cloud. If you have a DGX Spark, you don't need to allocate more GPUs on the cloud; you can just run this directly on the DGX Spark. So the answer is yes, you can run this on a DGX Spark.

>> Great, very cool. Another question coming in from LinkedIn, from Rishav: is Cosmos Transfer closer to domain randomization or domain adaptation, or something fundamentally different?

>> You could say both. We are randomizing the domain, obviously, if we're doing lighting changes and color changes. But you could also use this to adapt to new domains. If you want to change, for example, the background like I showed you, if you want to try this in a completely new area, you can do that. So I would say the answer to that question is yes. [laughter]
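One way to picture the difference is in the prompts you would sweep. The lists below are only illustrative examples, not output from any tool: the first set randomizes appearance within one deployment domain, while the second adapts the same footage to entirely different domains.

```python
# Domain randomization: many appearance variations of the same deployment domain.
randomization_prompts = [
    "the same warehouse with cold fluorescent lighting",
    "the same warehouse with dim evening light and long shadows",
    "the same warehouse with red plastic bins instead of cardboard boxes",
]

# Domain adaptation: move the whole scene into a different target domain.
adaptation_prompts = [
    "the same robot and box on a loading dock by the ocean",
    "the same robot and box inside a hospital supply room",
]
```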

>> Okay.

>> Yeah, we got our results over here. You can see it's the robot operating in a street environment, with the same box. So I think that wraps up my hands-on notebook session. As a reminder, this is available, and you guys can try it out yourself. And yeah, I think that's it from my end.

>> Okay, great. All right, so let me see, let me add us back to the stage like this. That's not correct. Here we go. That's better. Okay. You guys have time for a couple more questions; I know we're almost out of time. Just a couple more quick ones. This one's coming in from Omniverse Ambassador Dylan Tobin, aka Tom Bombattles; that's his nickname, don't ask me why. Are there any models that combine segmentation, like SAM 3, with point clouds and then marching cubes or something to render meshes?

>> I think Akul's got this one.

>> Yeah, I think there are some models out there like that, that do 3D SAM, where you're able to segment point clouds directly. I don't think we have that directly integrated into Cosmos, but you can definitely try it out.

>> Amazing. Very good. Thanks for that question, Dylan. Okay, another question coming in from YouTube, from Melvin Tang Games: I've been working on an Isaac Sim to Cosmos Transfer pipeline. Do you think we can run Replicator's scatter_2d to randomize objects at runtime without having to register a custom randomizer?

Do you need more context on this one?

>> Yeah, maybe. "Do you think you can run scatter 2D Replicator?" You can use Replicator to randomize objects' positions and other properties at runtime, and that will allow you to generate different renders that you can use with Cosmos as a downstream pipeline. I don't know what you meant specifically with scatter 2D, but I hope that answers your question.

>> Okay, feel free to clarify if you want in a follow-up comment. This one is also coming in from YouTube, from the machine Ericcom: can Cosmos Transfer help define a consistent control style across different agents, so motion and behavior feel unified even when data comes from mixed sources?

>> Do you want to take it, or should I try?

>> I'm not sure what this question means, so you might want to take this one.

>> Yeah, I think it's not fully clear. "Motion and behavior feel unified when the data comes from mixed sources": I'm not exactly sure what that means.

>> Okay. So feel free to follow up and give us more clarity there; that would be fantastic. And you know what I'm really excited about, in addition to the amazing demonstrations and overview provided by our guests? That the viewers out there have so many interesting questions and comments on this, and we would love to work directly with you. So, if you're one of the viewers asking these questions, working with it, fooling around with it, I would personally love to follow up with you so we can learn more about the work you're doing and your experiences, and see if we can share any of those stories, too. I have found that the most inspiring things actually come from developers in the community. So we love sharing those stories, celebrating them, and helping promote your great work as well. If that sounds like something you're interested in, I would love to work with you; our team would love to work with you. The easiest way to get a hold of me is through Discord, if you're on our Omniverse Discord server, obviously. I'm Edmar on there, very easy. Or you can send an email to me at NVIDIA; my email address is edmar@nvidia.com, very easy, just my name at nvidia.com. You can also go to edmar.net, and that brings you right to my LinkedIn page, so you can shoot me a message there.

So, if you're working with Cosmos, bottom line: get in touch with me, because I want to chat with you, and there might be some things we can do to help each other. Okay.

So, we're getting a lot of good thanks here from the community. Oh, from Frank Shang, one of the longest-standing community members we've got. Really good to have Frank watching today.

Here's a good question coming in from YouTube: what is the maximum number of features that can be given at a time for multiple tasks? I'm not reading this exactly right, but I guess they're asking about limitations here. Sounds like

>> So, the maximum features

>> I guess what affects performance; I'm assuming it's hardware, and what kind of caps there are. I'm guessing, but feel free to follow up with a bit more detail. Is there any guidance we can give on some limitations that we know of?

>> Maybe I think the question is related to the number of things that you can have changed in a single generation.

>> Okay. Yeah. So, because you're playing with these different control modalities, the answer depends on the task. For example, if you want to change the color and the lighting condition, it's a similar recipe, right? Because you're trying to maintain the structure; you're not generating any new objects. So you're able to do a lot of these augmentations at once. But once you start to mix in more complex ones, for example, say I'm holding a water bottle and you want to change it into a banana or a watermelon, that's not physically realistic, and it's hard to add on different tasks like that while using Cosmos. So the answer is: it depends on what you want to do.

>> Okay, they clarified here. I don't know how this person did that, but he's got two pictures here with the same name, so it's the same person. Number of tokens, vertices, edges, or polygons; that's what he's asking about.

>> Gotcha. So, the number of seconds Cosmos Transfer specifically supports is, I think, correct me if I'm wrong, Akul, 30 seconds.

>> I believe the best generations are around 10 to 15 seconds.

>> 10 to 15 seconds. Yeah.

>> Cool. Nice compliments here to both of you from LinkedIn: insightful session, thanks, guys. All right, let me see if we have any more pressing questions. Okay, this is a clarification of an earlier comment from the machine Ericcom: I'm asking whether Cosmos Transfer can standardize behavior if we train multiple agents with mixed datasets. Can it make their motion patterns feel consistent instead of each one behaving differently?

>> Is that so? Cosmos Transfer would help train these different agents, right? It would help augment these different datasets. So if you're asking whether, when the datasets are very distinct, you could use Cosmos Transfer to try to augment them into a similar domain, yes. Whether the downstream result is that their motion patterns feel consistent instead of each one behaving differently will depend entirely on the model you're using, what dataset you're using, and what augmentations you want to do. But if your question is whether this will help you converge onto the domain you want to be training on, then yes, these augmentations can definitely help you out.

>> Amazing. Okay. And I think this might be the last one, coming from YouTube: I understand the world is simulated via Cosmos foundation models. How is the actual robot simulated, like the digital twin?

>> Yeah, this seems like an Isaac Sim versus Cosmos question. Akul, do you want to talk about this a little more deeply? I think having those separated would be nice.

>> Yeah, so I think the larger point here is that Cosmos Transfer allows you to create visual variations. The simulation part has to happen either in Omniverse, or, if you're using a real robot, you would have to teleoperate it. That's how you would get the initial data, and once you have that initial data in the form of a video, you can use Cosmos Transfer to create these visual variations.

>> Okay, that's very helpful. And here's a great update I just got a couple of minutes ago from our ambassador who's hosting the Isaac Sim and Isaac Lab study group, which happens every Wednesday. We have a great group of developers who hang out on the Omniverse Discord server, Wednesdays at 12:30 p.m. Pacific. If you're watching live, it's basically 20 minutes from now; otherwise, it's every Wednesday. So today, Amelia had asked what they're going to be covering, and Dylan, or Tom, is going to be there. He's going to be sharing more information on the MCP server project he released to the public to help developers with their questions; it ties into reading information from our blogs and tech resources. So, a great resource; go there for that. Also, if you follow me on LinkedIn, I've posted about this and shared it there. They're also going to maybe dive a bit deeper into the Cosmos cookbook, which is awesome. So, if you want to take the conversation further with some members of the community, now's your chance, 20 minutes from now if you're watching live. I think it would also be beneficial to talk about a few of the new features or updates to CUDA, so that's super exciting.

Hey, the study group is awesome, come see us. This is from Papa Chuck, also one of our core OGs in the community; Rean is his real name. And he just actually did something really cool. Here's a clarification on where the study group happens: yes, it's in the community room. So go to the Omniverse Discord server, and again, I'm going to put this huge banner up on the screen so people know how to get there. Look, [laughter] there it is: discord.gg/nvidiaomniverse. Sorry, Aiden, for covering your face there. But that is the address; that's how you get there specifically. Again, if anyone ever has any problems, reach me by email, I'm sorry, edmar@nvidia.com, or edmar.net will bring you to my LinkedIn page, or find me on the Discord server.

All right. I want to see if there are any last-minute comments or questions. I think we're good. So, I'm going to go over a couple of quick news slides, and Aiden and Akul, I don't want to keep you here any longer; you have been fantastic. Thank you so much for putting this information together. I'm just going to share a couple of quick announcements with the community, but I know you guys have a very busy day, so you can bid adieu if you like, or you can just watch a couple of these slides. It's up to you.

>> Yeah, thanks for having us.

>> Yes, thank you guys.

>> So, really exciting content today. Again, I really want to encourage people to come and share your Cosmos stories with us. I think it would be really fantastic for us to learn more about the work you're doing, and like I said, we also want to celebrate it with others. Let me share my screen here so you can see it. Where I've got to remove this first and share this.

All right, here we go.

All right. So, this is super exciting, because we got a lot of good comments online from our promos of this. People were really excited to see the earlier content, because this is basically a series; they've done a couple of them so far, and these are all going to be available on NVIDIA On-Demand. You can still catch the remaining ones. They're basically live webinars, but the content is being shared afterwards in a couple of different ways: on NVIDIA On-Demand, like I mentioned, but they're also going to be added to an NVIDIA YouTube playlist for robotics development. So hit that QR code; it'll bring you more details on the sessions in this series. I think it's an amazing resource. Again, if you're following me on LinkedIn, I posted about this. You can also join the NVIDIA Omniverse LinkedIn group, where this has also been shared, but that QR code will give you all the great details. And if you're watching in the chat, look at this: Amelia is so good, she's posting the links there for everybody, so you don't have to worry about that QR code. And yes, she's reminding us that registration is required, so don't just show up at the last second.

Okay. So, then we have this is super exciting. You see, I got my oh, you can't see it, but I got my Santa Claus over here. Make your robot's holiday wish come true. This is really cool, because there are some nice sales happening. So if you're looking for some hardware, hit that QR code and it'll bring you right to the discount page. I know there are a lot of questions about hardware, so if you didn't get that piece of hardware you've been dreaming about, now is a chance to snag it at a nice discount.

The other thing that's really cool and discounted, and we had a good question earlier about OpenUSD, is our OpenUSD certification exam. I'm seeing so many great stories online from people who just passed their exam and now have that OpenUSD certification; they're really excited, and it gives them that nice recognition on their LinkedIn page, to their community and to companies looking to hire people with that kind of experience. The certification exam is 50% off, half off. We're going to try to produce a nice video highlighting this later today, so wish me luck and hopefully it comes out okay. You might see that on social media soon, but there's no need to wait for it; you can get the discount right now. So go ahead and let us know in the chat if you need that information. So: a 50% discount on the OpenUSD certification, and good hardware discounts at that QR code on the screen.

If you're a member of the community, you're very familiar with these resources. But if you're a new member, or maybe you're just watching the live stream for the first time, we have a couple of really great links for you. We have our events page, which shows you all the upcoming live streams, the study groups, and the office hours; today was a robotics office hours. All of these are listed on our AddEvent page, and that QR code will bring you right there. The cool thing about AddEvent is that you can look at any of those events, and if you say, "Hey, yep, that's what I want to go to," you can add it to your calendar literally within a couple of seconds. You can also subscribe to the calendar so you see all the events on your own calendar, which is also pretty cool.

If you have any technical issues and you need support, there are two places to go depending on the technology you need help with. Most things fall under our official forums, and that's the QR code that brings you there. For some things, like Isaac Lab, we want people to go to the issues area on the GitHub page. So depending on what it is, you'll go to one or the other, but this QR code will bring you to the official forums. That's the best place to post your bugs. Don't post them on Discord, because we can't track them there; use Discord for getting help and engaging with other users, but for official support, use the forums or the issues page on GitHub.

And then we've been talking a lot about our Discord server and the study group that's happening a few minutes from now if you're watching live. That QR code will also bring you there.

Again, it's discord.gg/nvidiaomniverse: an amazing resource with channels by topic, including Isaac Sim, Isaac Lab, and Cosmos. I've actually just requested some developers from the Cosmos team to engage in the Cosmos channel on our Discord server, so look for some more official guidance there if you have any questions or comments.

And then we have our community page. It says "ambassadors" on the screen, but that QR code will bring you to our community page, where we do list all of our ambassadors. I just gave a shout-out earlier to one of our ambassadors who's watching right now and is going to be in the study group, Dylan Tobin. A really exciting group of people; you can learn more about them and their work, get inspired by them, and maybe reach out and collaborate on something. You can find all that and more on our community page. I think let me see; I'm making sure I'm not missing anything.

Oh, we have a couple. Thank you, Amelia. She's reminding me that we have a couple of great things happening this week in addition. You do not want to see that; this is what I want to share. Let me make it bigger. We have a couple of great live streams happening this week. Let me just make this bigger for everybody. There we go. Okay. So, today we can cross this one off; we just did this. If you caught oh, there, Amelia's here. This is perfect.

>> I just realized that the one on Friday is not on the calendar.

>> Oh, okay. Perfect.

>> I wanted to point that one out, but you could touch on the one tomorrow.

>> Well, we can both do it now. You're here; now you're in it with me. They're making sure it's on the calendar. We have a great one tomorrow, which is going to be on robotics. It's part of the holiday promotion, so it's talking about how you utilize Jetsons, and it's going to go through a quick workflow. It's at 9:00 tomorrow on the NVIDIA Developer YouTube channel. And then we have one for those who are interested in DGX Spark; we get those questions all the time. It's Getting Started with DGX Spark with Isaac Sim. That one will be on Friday at 11. It's not on the calendar yet, so subscribe to the AddEvent calendar, and once we have that up, the reminder should pop up in your calendar. So, yeah, Edmar and I hopefully, I think we will be at both of them. So, yeah.

>> I will actually put the link I'm sure this was put earlier in the chat, but I'll put it again. The AddEvent link is in the chat now. I think that's great advice, to subscribe to the calendar, or you can go to any individual event here. Here's the study group we were just talking about, happening in 10 minutes, I think. This one is specific to Isaac Sim and Isaac Lab, and there's a link that can bring you right to our Discord server. So anyway, this calendar is a great resource for everybody. All right.

So, Amelia, we also mentioned that we got that holiday discount for hardware, which is amazing; we put that up on the screen earlier. We also have the 50% off of the OpenUSD certification. I'm seeing some great stories from people who have just passed the certification exam. Let me tell everybody, we often say this: it is not an easy exam. That certification is hard-earned, right? That means something. For the best way to do well on the exam, I think there are two pieces of advice we can give. One is to take the learning path for OpenUSD. That learning path has been open-sourced, so people who may be watching, or who teach that kind of content, can actually take that content, make it their own, and leverage it for their own curriculum or courses. However you like to use it, add to it; we would love that. It's open source; that's the whole idea. You can add to it and improve it.

The other thing we've seen that really helps people is to actually be part of the OpenUSD study groups. We have two study groups meeting in the community every week now. One is hosted by our great Omniverse ambassador Nandu Valal, and the other one is hosted by another great Omniverse ambassador, Michael Wagner. His is based in the German language, and Nandu's is in English. Michael was actually thinking about doing his in English as well, but in a time zone that's more friendly to Europeans, because he's based in Germany. But right now, we have two OpenUSD study groups. I highly recommend that if you want that extra edge, take the learning path for OpenUSD and also join the study groups and talk about OpenUSD. Lots of different scenarios are discussed: challenges, problem-solving. So learning with others, I think, is a great way to do it.

Okay. And yes, there is Rean. I don't know if I have a link for this, actually; let me see. I'll put this on screen. So this is from Rean, aka Papa Chuck. They have a nice little conference happening. Rean, can you put the link in the chat so I can flash it up on the screen for everybody? If we don't get the link in a few minutes, everyone go to my LinkedIn page and I'll post it there later today. But there's a really cool event they're calling the GOAT conference, where they're showing a digital-twin racetrack with F1TENTH. They've been leveraging some really cool tech from NVIDIA Omniverse. There it is: goatconf.us. Let me see; he sent it to me directly, but I meant, if you can, put it in the chat. I'll have to say it out loud: goatconf.us. Rean, why do you have to make it so hard for me? [laughter]

Hold on, let me bring this up; we're in the last minute here. Let me just bring this up. Goat... conf... us. This is a live stream, so everybody knows that we are just doing things. Why am I Is it Goat Con or Goat Conf? Let me see. I'm not kidding.

All right. Rean, I tried. I tried, man. You've got to send it.

>> I have it. I have it. I have it.

>> Oh, you have it. Okay. Okay. What is it?

>> It's goconf.us.

>> Oh, there it is. goconf.us. All right, I'm going to put this on the screen, so everybody take a look. Yeah, there you go. All right, let me stop this screen; I will add this one. I've got to stop my screen first and then share this one. All right, here we go.

So, this is an event happening tomorrow. You can RSVP now; it is totally free. Super exciting. And if you are in the area, you can also go there in person. It's December 11th and 12th, and you can be a spectator or a participant. There's a nice video here in the background. Again, the website is goconf.us, and I will put it in the chat here.

So there we go. Okay, I asked him several times, "Put the address," and this is what he put in. What website is that, Papa Chuck? How does that help us? I'm just joking; he's awesome. Okay, so here we go. What can you expect? Live robot racing, an Omniverse lab, hands-on labs. I mean, this looks like a pretty amazing event, so I think if you're in the area, check it out. I don't know if they're going to be live-streaming, so maybe that's something they can clarify, but look at that schedule. Really cool; I'm super excited to see stuff like this happen. If you are doing anything like this... Andy Boy is one of the other leads of the project, along with Robert Mureer; you should follow them on LinkedIn, and there's Papa Chuck here, that's his real name and picture if you're ever wondering. So that's the team organizing it. But we also love hearing about these types of events leveraging any kind of NVIDIA Omniverse or OpenUSD tech. So if you're doing something, let us know so we can help promote it, just like we're promoting this, and get more people aware of your event. All right, I think that's everything for today. We covered a lot of ground. Amelia, are we leaving anything out?

>> If you're interested in OpenUSD, we have a great stream next week, so keep an eye on the calendar; we'll update the AddEvent for it. Pomi, our PMM for OpenUSD, is working on a great stream. Maddie is going to make a special appearance as well. For those who have been around, Maddie is our OpenUSD expert. He'll be here and share some updates on OpenUSD.

>> So, is Pomi going to be on the stream herself?

>> I do not know.

>> That would be great. We can ask, but Maddie for sure will be there.

>> All right, that's great. Well, that's going to be a great live stream. So that's next Wednesday, same time and place. In the meantime, I want to thank everybody for joining us today; such great comments and questions. But again, Amelia and I would absolutely love to learn more about your work with Cosmos.

We are looking for those stories to celebrate, and looking to support the people who might need some extra help working with these recipes. If you are following NVIDIA Omniverse on LinkedIn, and you all should be, you saw some great announcements shared last week, and also on the Discord server in our announcements channel regarding recipes; I think we posted a new recipe every day. So please let us know if you're working with those recipes. We have a Cosmos channel on the Discord server; say hello there. I would love it if all the people who joined today's live stream and work with Cosmos got in touch with us. I'd like to learn more about your work and what your plans are, and again, see if we can either provide some guidance or potentially promote and celebrate your project to a wider audience to get it that great visibility.

Okay, we got a great question here. I don't know, let me see: this is asking for the link to join tomorrow's LLM robotics event.

>> It's on

>> I think I have it here. So here, I'll put it in the chat, but here is the link right there. And I will put it on the screen, even though it's not something that Amelia, are you actually able to post that to LinkedIn? Because I can't post links to LinkedIn.

>> I can post on LinkedIn.

>> Okay, cool. I'll put it on the screen if you want to screenshot it, but we're going to try to put the link on LinkedIn as well. Unfortunately, StreamYard does not allow us to post links directly from StreamYard to LinkedIn; that's why Amelia is always doing double duty here.

Thank you, everybody, for joining. What a great hour and a half we had with the community today and with our special guests. Really fantastic. I hope to see you all on Discord; that study group kicks off in just a few minutes for Isaac Sim and Isaac Lab if you're watching live. If you caught this live stream late, the replay is available right now on YouTube: just go to our YouTube channel, go to the live section, and you'll find today's live stream along with all of our other previous live streams. Until next time, I'll see you on Discord. Thanks, everybody.

Amelia, say goodbye.

>> Hi. Sorry, [laughter] I was answering a question. Okay. Hi, everybody. [laughter]
