More Speed & Simplicity: Practical Data-Oriented Design in C++ - Vittorio Romeo - CppCon 2025
By CppCon
Summary
## Key takeaways

- **Data-Oriented Design for Performance**: Prioritizing data layout over traditional OOP can yield significant performance gains by improving cache efficiency, making code faster and simpler to reason about. [01:13], [04:09]
- **Cache Lines: The Smallest Memory Unit**: CPUs fetch data in cache lines (typically 64 bytes). Accessing related data together maximizes cache hits, as the CPU is often idle waiting for data from slower RAM. [20:37], [21:15]
- **OOP vs. Data-Oriented Mindset**: OOP models autonomous objects and encapsulation, while DoD focuses on data transformation pipelines. DoD prioritizes data layout and access patterns for efficient processing. [30:00], [31:57]
- **Structure of Arrays (SoA) Benefits**: SoA layouts, where data fields are in separate contiguous arrays, are ideal for vectorization and GPU processing, offering performance benefits over Array of Structures (AoS) by minimizing cache pollution. [43:52], [58:40]
- **Simplicity Through Data Grouping**: Grouping data by usage (e.g., separate containers for different particle types) makes code simpler and easier to understand than using flags and branching within a single container. [44:44], [49:25]
- **Hybrid Approach: OOP Shell, DoD Engine**: A hybrid approach, using OOP for high-level abstractions and DoD for performance-critical inner loops, offers the best of both worlds, combining flexibility with efficiency. [57:57], [01:21:48]
Topics Covered
- Why traditional OOP designs can be a performance trap.
- Shift your mindset: Data is transformed, not encapsulated.
- Data-driven designs unlock simpler serialization and tooling.
- SoA: Surgical field loading for maximal cache efficiency.
- Combine OOP and DoD: Shell for API, engine for speed.
Full Transcript
Okay, thank you everybody. Thank you so
much. Jason, as you said, it's my first
keynote. I'm equally excited and
anxious. You know, I usually give niche
talks. I know people are going to come
in the room knowing what to expect. So,
speaking to a large audience is uh
different for me, but I hope you're
going to enjoy the keynote. Today we're
going to talk about data-oriented
design, practical data-oriented design. And I
want to start with a photo. Here it is.
This is a photo of Mike
Acton giving a keynote at CppCon 2014.
That was my first ever conference. And
at the time, Mike was at Insomniac Games
and he came on stage and he challenged
some of the core beliefs of C++
developers. And this was my personal
start of my own data oriented design
journey. I don't know if you recognize
this individual. That is me a
long time ago. I don't look like that
anymore, right? I look a bit different.
But yeah, um I was asking Michael a
question here and for me it was a career
changing moment because I hadn't
realized at the time that there was
another entire way of thinking about
code about software that was different
from OOP. So even if nowadays I don't
agree with all the conclusions that were
in that keynote, I would still like to
thank Mike for opening our eyes and I
would still recommend you to check out
his keynote. It's very very good. Okay,
a little bit about myself before we
begin. Uh right now I am an independent
C++ consultant, trainer and mentor. My
business focuses on delivering bespoke
C++ training and one-to-one mentorship.
Um I am available for tech training and
guest speaking. So if you're interested,
feel free to reach out afterwards or
check out my website. I spent the last
10 years at Bloomberg. I've been working
on high performance C++ backend
development mostly. I worked on our own
microservice infrastructure team and on our
market data analytics team, and I've also
been doing technical training in terms
of both modern C++ and topics such as
meta programming and multi-threading for
Bloomberg engineering. I am involved in
standardization. If you've heard of
std::function_ref, I am partially responsible
for that. So you can love me or hate me
for it. And that's going to be in C++26.
You might also know me for the epoch
proposals, which I feel like partially
live on with profiles nowadays. So
that's also something else you might
have heard before. I am quite passionate
about games and game development and
open source in general. I'm part of the
SFML team. I was the person that led the
modernization of the library to target
C++ 17. So I strongly advise you to
check it out. We're going to use it
today for our demos. But I also
contribute to other libraries including
SDL and I have two commercial games
released on Steam which are also open
source. So you can check how the sausage
is made. And I also like virtual
reality. I contribute to several mods
for games like Half-Life 2 and Quake.
Last but not least, I am the co-author
of this book, Embracing Modern C++
Safely. I wrote this book with my
friends and colleagues, John Lakos,
Rostislav Khlebnikov, and Alisdair Meredith.
It's a reference of all modern C++ features
that covers all the pros and cons in
detail. So if you're interested in that,
again, feel free to ask me afterwards.
But enough about me. Let's take a look
at today's goals, today's keynote, and
what we're going to do. So the goals for
today are I want you to discover a new
mindset. I don't know how many of you
here are familiar with data-oriented
programming. Can we have a show of
hands? Okay, so quite a bit of you and
quite a bit of you are unfamiliar with
it, which is good. So this this might be
something new for you. You're going to
learn how thinking in a data first way
changes the way you design a system. At
the same time, we're going to have a bit
of a refresher on how hardware works,
especially memory, and how it
communicates with the CPU and why that
matters for the design of your software.
We're going to do this somewhat
interactively. We're going to, you know,
build this little demo together. I'm
going to show the demo first and we're
going to gradually build up to it. And
we're going to see how changing the way
that the data is laid out without
actually changing the operations we're
doing on the data can really make a
difference in performance. At the same
time, I don't really want to just focus
on the performance aspect of DoD, but I
also want to make a point that it can
make your code much simpler and much
more maintainable than OOP, which might
sound counterintuitive, but I'm going to
show you some examples where I found
that to be true. And also, this is maybe
where I feel a bit differently from the
rest of the gamedev crowd. I think that
there still are places where OOP makes
sense. We're going to discuss that
later. And I think that the use of high
level C++ abstractions can actually help
achieve your goals and there are places
where when used judiciously um can help
you get better performance and more
maintainability of your code. Okay, so
I'm just going to dive straight into the
demo. I'm going to show you uh the thing
that we're going to implement and I'm
going to give you an idea of what we're
going to do. So for this keynote I chose
something which is quite simple and you
know somewhat visually interesting. So
we're going to have this demo where we
have these rockets flying around. Let me
just yeah zoom it in. And every single
one of these rockets and smoke particles
and fire particles you see on the screen
everything is its own entity. It's
simulated. It has its own physical
quantities and you can refer to every
particular single thing that you see on
the screen through some sort of
identity. So to make the requirements a
bit more explicit, we want everything to
follow laws of physics. So we have basic
motion model with position, velocity and
acceleration.
We should be able to customize these
effects, particles and emitters of
different types with different opacity,
scale and rotation that change over
time.
Everything is an actor. So if I care
about a particular rocket or a
particular particle, I should be
able to have a handle
to it and do something with it. As an
example here, every rocket entity has
some attached emitters that are kept in
sync and spawn the particles. And
finally,
I want to design this in a way that it
is extensible. It should be easy to add
new effects, should be easy to add new
actors, new particle types, and so on.
So it is somewhat of a toy program, but I
tried to make it a bit more realistic by
having this requirement of being able to
attach entities to other entities, and I
feel like it isolates the performance
principles quite well and the lessons
you learn even with this toy program are
applicable to real world code. Now if I
jump here and I open up this menu you
will see that we have some metrics
regarding update and draw performance.
And if I actually start spawning some
new rockets where the rendering is out
of sync for some reason, you will see
that as the rocket spawn, the frame rate
kind of tanks and it doesn't really
become playable anymore. I can zoom out
as well. I can show you that we have a
lot of them. And here you can select the
data layout that we're going to use.
This is the OOP implementation. But if we
switch to something like AoS over here,
you will see that we have a dramatically
uh nicer frame rate and it just becomes
much more playable. And you can play
around with this stuff. You can try
other memory layouts and so on. But the
point is we have this demo. We can play
around with it. The operations we are
doing are the same. What changes is just
the memory layout that we use. And as
you can see, it makes a dramatic
difference. And now we're going to see
uh how that actually works.
Cool. So, excuse me. That said, let's
just get to work and let's start
implementing this. So, we're going to
start designing this according to OOP
principles. So we have our own let's say
real world model interpretation of the
problem and we're going to try to
transfer that into code. We know that
we're going to have multiple entities.
So we can start with an entity base
class. We're going to have an emitter
class that produces particles. We're
going to have a rocket class that flies
around and has some emitters associated
with it. And we're going to have a
particle class, excuse me.
The emitter can either be a smoke
emitter or a fire emitter in this case.
And the particle in the same way can be
a smoke particle or fire particle. What
you see in light blue is a base class.
What you see in dark blue is a concrete
class. Now I like this. You know it's a
nice and simple uh hierarchy. If I want
to add something new, I can just derive
from entity. If I want to add a new kind
of emitter or particle, I just derive
from those things. It fulfills all
requirements. Seems easily extensible.
So I don't see any particular problem
with this. Let's try to implement it and
let's see how the code actually looks
when we transfer this into C++.
So we're going to have a struct Entity
here. I'm going to use struct just to
keep the slides concise but you know in
reality you would use class and private
and public and so on. I'm just going to
try to focus on the performance and
memory layout aspect of things at this
point in time because this is going to
be a polymorphic base class. We are
going to have a virtual destructor. In
this case we can default it. And then
we're going to have a virtual update
member function that takes the delta
time. So the amount of time that passed
between frames and uses that to advance
the state of the entity forward. We're
also going to have a draw member
function which is virtual. It takes this
thing called the render target which is
from the SFML library and it's just an
abstraction that allows you to draw
stuff either on a texture or a window.
In our case, it's going to be the window
of the application. So not very
important in this case. Now, because
everything needs to follow the laws of
physics, we're going to have a bunch of
vector 2fs here, which are just 2D
vectors of floats for position,
velocity, and acceleration. And if you
want to see how that actually looks in
practice is just two floats x and y with
some nice operator overloads that allow
you to basically multiply, add, and so
on. And again, this is from SFML. It's a
nice and convenient abstraction over a
2D vector.
And that is pretty much it. If we want
to implement the update part of the
entity here, we can move the position by
the velocity scaled by the delta time,
and we can move the velocity by the
acceleration scaled by delta time. So,
very simple physical integration, not the
most accurate, but again, we want to focus
mostly on the memory layout, not on
the accuracy of the simulation. We have
a problem though right? Entities might
need to know the state of other
entities. They might need to create
other entities on demand. So we need
some sort of way of making that
possible. So one of the things I came up
with and I've seen uh being used in many
other places is something like this. The
entity knows the world it belongs to and
through that world reference it can
query the state of any other entity or
it can create new entities. It can
interact with the state of the world.
So, we're going to do something like
this for our program. And we're also
going to have this little boolean called
alive, which is going to be responsible
for telling the world, hey, this entity
is done. Please get rid of it, recycle
the memory, and use that memory for
something else. So, it's just going to
be used to communicate to the world
that the entity can be
cleaned up. The reason why we use a
boolean is that we want to be able to
communicate that through the update
member function. There might be some
conditions in the update that make the
entity eligible for cleanup.
Okay, so that's the entity. Just to show
you some more code, we have the
particle. It's going to have its own
scale, opacity, and rotation.
They can change over time. So, we also
have the, you know, rates of change as
data members over here. We can override
the update member function, reuse what we
wrote in the entity (again, the DRY principle:
we don't repeat ourselves), and then do
the same sort of idea of physical
integration for scale, opacity, and
rotation. The only interesting thing
here is that we set the boolean alive
to be true only if the opacity is
greater than zero which basically means
our condition for getting rid of
particles is when they fade out because
every single particle is going to
eventually fade out in our demo. This is
going to be good enough to clean them up
when we're done with them.
And just to show you what the smoke
particle looks like, we just derive from
particle. We get everything for free
basically, which is quite nice. And the
only thing that we have to change is we
want to override the draw member
function and specify that we want to use
that data to draw something with a smoke
texture. And you can imagine this can be
done for the fire particle and any other
one that you want to add in the future.
Okay, the world is probably the most
interesting part. We have three kinds of
entities here already on the screen. We
want to store them in such a way that
it's convenient for us to work with
them. So, we're going to do this. We're
going to have a std::vector of
std::unique_ptr<Entity> that contains
all the entities. So, those of you here
that are somewhat familiar with
data-oriented design might already be thinking,
hey, nobody writes code like this.
There's going to be an allocation for
every entity. It's not going to be very
cache friendly. But uh trust me, I've
seen this in production many many times.
If you look this up on GitHub, even with
shared pointers sometimes instead of
unique pointer, you have hundreds and
hundreds of hits and you can read
stories about this happening in AAA
studios. People have successfully
shipped very good software that was
highly successful with this design. So
I'm not saying you cannot do this and
you cannot be successful. But even if you
know the drawbacks of the design,
you should also know that this has
actually been done in practice. I just
want to know like how many of you have
written code like this or seen code like
this in practice. Okay, so pretty much
everybody, you know, this is not uh toy
code. This actually happens. And I also
want to say very popular libraries and
engines do stuff very similar to this
including the SFML library. So it's
something that is used all over
the place.
The update and draw member functions are
quite straightforward. We just iterate
over all the entities and we call update
and draw. You might see the nice aspect
of this. We don't care what update does.
We don't care what draw does. We just
delegate that to the entity. So it's
quite nice. And finally, we're going to
do uh a little bit of cleanup at the
end. We have this call to std::erase_if, which
is a quite recent function in the standard
library. It takes a container and a
predicate, and it efficiently rearranges
the items in the container in such a way
that they can be removed in constant
time.
Okay, what else? The emitter uh we're
going to do something like this. We're
going to have a timer that keeps track
of how often we want to spawn a
particle. We might have different spawn
rates. And then we're going to do
something interesting. We're going to
define this pure virtual function called
spawn particle that is going to be
overridden by the actual emitter types.
So the update function of the emitter
base class will periodically call spawn
particle, but then it is going to be the
derived emitter that actually overrides
that and decides what it means to spawn
a particle. The smoke emitter will
internally allocate a new smoke particle
using make unique and then it will push
that back into the world. So you can
also see here how we are referring to
the world from within the update of an
entity in order to affect its state and
create new things on the fly.
Okay, last thing that I want to show you
which is somewhat interesting is this. I
mentioned that the rocket needs to have
some associated emitters. So what we're
going to do here is we're going to use
raw pointers. We're going to say okay
I'm the rocket. I need to know what
smoke emitter is associated with me and
what fire emitter is associated
with me. Now because we have this
implicit knowledge that the world is the
owner of the entities and as long as the
rocket is alive those emitters will be
alive, using a raw pointer is fine here.
We thought about our lifetimes. So
this works in our model and this is
something that you also commonly see
even in large scale programs. The way
you refer between components of the same
program is by using pointers. You're
relying on the address stability of
these objects. The reason why you can do
that in this case is because we are
allocating them on the heap. So even if
the vector gets reallocated with all the
entities, even if we move around the
world, the address of the emitters is
going to be stable.
When we create the rocket, we create the
emitters and we wire them up. And when
we update the rocket, we're going to
move them with the rocket. So pretty
straightforward. And this is pretty much
all the code we need for that demo. So
that said, let's see how fast this is.
So let's do a little bit of a round of
benchmarks.
So the machine I'm going to use is not
the tablet I'm presenting on. This is my
desktop machine. It's fairly beefy. It
has an Intel Core i9, which was a
top-of-the-line processor a few years
ago, pretty fast DDR5 RAM, and I compiled
using Clang++ with -O3. I disabled
rendering for these metrics, but I'm
also going to talk about rendering here
and there as changing the data layout
can also be beneficial for rendering and
not just for updates.
So with 200,000 entities, we get 2.3
milliseconds; 400k, 6.6;
600k, 12 milliseconds; 800k, 16.6; and
1M, 21.6
milliseconds. Now you might think, this
is just milliseconds, this is fast, right?
Who cares? But if you're thinking about
targeting real time applications, the
budget in a single frame that you have
to reach 60 fps, which is pretty much
the bare minimum for a nice interactive
experience, is 16.67 milliseconds. And
you can see already at 600K, we're
already blowing up most of our budget.
And if you think about having to put in
rendering on top of that, if you think
having to put in more complicated logic
or algorithms, you don't have a lot of
room to work with. At the same time,
remember, this is pretty good hardware.
So on a mobile device, this will
probably be unacceptable. And at the
same time, I feel like 60 fps in 2025
is pretty pathetic. Honestly, even
mobile phones nowadays, they have 120 Hz
refresh rate displays. So the bare
minimum I think for a very nice and
interactive experience that feels smooth
should be 120 to 144 fps, and if you want to
target that particular frame rate, your
budget for a single frame is less than 7
milliseconds. So we are pretty limited
here. We have almost no budget left
at only 400k particles, and we're talking
about C++ we're talking about fast
hardware. We should be able to do better
than this. Now we've seen in the demo
that changing the memory layout of
things speeds up the program
significantly. So the problem is not the
algorithm. The problem is not the
operation that we are doing. The problem
is how the data is laid out in memory.
And the CPU is actually mostly idle
here. It's not doing any work. It's just
waiting for that data to arrive and
wasting time doing nothing which is
again unacceptable if we care about
performance.
So let's take a detour. I'm going to
give you a little bit of a refresher on
memory and tell you why it is important
that we understand how it works
internally to design our software to be
efficient and make good use of our CPU.
So at a very very very high level of
abstraction you can think of the CPU and
RAM as being connected by some sort of
bus and they communicate and everything
is fine. If we peel one layer of the
onion and we go one step lower again at
a high level of abstraction, you can
think of a CPU as something like this.
We have some core where the operations
are actually being done. Some cache
which is some very small but very fast
buffer of memory and then the
communication has to go through a
hierarchy. The data that's in RAM first
has to go through the cache and then
from the cache to the core and vice
versa. So every operation you need to do
has to follow this path through memory.
In reality (I have a slide later)
we have multiple layers of cache, and we
have multiple cores. So it's more
complicated than this. But the main
principle still applies. The memory has
to follow this hierarchy and move
forward. Now the cache, you can think of
it like this: a very small but fast
collection of rows, and these rows are
called cache lines. And this is a very
important concept. A cache line is the
smallest transferable unit of memory. So
even if you care about a single byte,
you still have to transfer an entire
cache line from the RAM to the CPU, even
if you care only about that single byte.
At the same time, you can imagine that
the RAM is a large collection of cache
lines, very very big, but it's slow. So
for example, if you want to read the
data at address 18, it's not in cache.
Our cache is empty. So we can identify
the data being there in RAM, and even if
we only care about the data at address 18, we
still have to take the entire cache line
from RAM, which is a cache miss, which
means we take the whole row and we copy
that into cache. And if you imagine
that we don't care about what's adjacent
to this data, what's next to it, then we wasted
a lot of time moving data that we don't
care about, which is quite unfortunate
and uses our cache ineffectively. We can
keep doing this. Maybe we want to read
some data at address 26. It is not in cache. Again,
we identify it there in RAM, and we have
to take the entire cache line and move
it into the cache, which again is a cache
miss, and quite unfortunate.
In the best case scenario, for example, if
you want to read the data at address 19,
we identify that it is already in
cache. You can see it over here. Then in
this case we talk about a cache hit. We
don't need to do a round trip to
RAM and back. So we are going to be able
to read this data way more effectively.
Now why am I telling you this? What is
the impact? How much does it matter if
we have to go to RAM and how much does
it matter if we can stay in cache? So I
like to explain things visually. So I
made a little animated race
between L1 cache, which is the fastest
but smallest cache in a CPU, and the
core. And on the bottom we have the RAM
and the CPU. So on the top we're going
to see the best case scenario, when
our data is already in cache, and on
the bottom we're going to see the
worst case scenario, where every single
time that we need data, we're going to go
back to the RAM. Now, this slide is going
to be slowed down significantly, because
we are talking about operations on the
order of nanoseconds, but it's going to be
to scale. So keep that in mind. 3, 2, 1,
go.
So by the time that the data in cache
has done multiple round trips, you can
see the RAM still, you know, getting
there, almost at the middle, and
we've already done a lot of operations.
So this is actually to scale. As I
mentioned, this is actually what's
happening in your program. So if you
have all your data coming from RAM,
you're wasting a lot of time just
fetching that data. In the worst case
scenario, it can be up to 100 times
slower. So doing a round trip to RAM,
compared to L1 cache, can be up to 100
times slower on modern CPUs. And just to
give you some numbers, the amount of time
that the average blink of a human takes
is 100 milliseconds. The amount of time
that an L1 cache reference takes is 0.5
nanoseconds. So by the time you blink, you
can have 200 million L1 cache
references. With RAM, it's going to be 100
times fewer than that. So it's fairly
significant; so significant that
sometimes the choice of where your data
is located is much more important than
the algorithm or data structure that you
use which can be pretty surprising
especially if you are deeply into
theoretical computer science. Sometimes
having an algorithm with worse complexity
can perform better on real hardware.
Okay. So what have we learned?
Very important. The smallest unit of
transferable memory is a cache line. So no
matter how many bytes
you want, you're going to have to take
an entire cache line. Generally, this is
64 bytes on modern CPUs. Thankfully,
we have a very intuitive way of asking
for the cache line size in C++, which is
std::hardware_destructive_interference_size.
So, you know, you will always
remember that, but you can think about
64 bytes.
All data must always traverse the memory
hierarchy. So if you need something from
RAM, it has to go through all the levels
of cache and then back if it needs to be
flushed back into RAM. Which means that
the spatial locality of data, so where
the data is actually laid out in memory
greatly affects performance and as I
mentioned sometimes even more so than
the choice in algorithm or the choice in
data structure. So we can already start
thinking about some tips, right? If I
have data that is related to each
other, and I want to access it relatively
close in time uh you know after I access
the first one I want the second one and
the third one and so on then it is
better to store it together in memory
close to each other physically speaking.
So preferring flat and contiguous
storage leverages the cache better and
maximizes the chance that, as you
are taking those cache lines in, the data
that you want is already going to be
there, because it's part of the same cache
line. At the same time, there's also
something I didn't mention which is
prefetching. CPUs have this speculative
mechanism that basically figures out the
pattern in which you're accessing
memory. For example, if you're in a loop
and you're going forwards or backwards
or you're jumping every nth element in a
strided access, the CPU can figure that
out and it will already start giving you
the cache lines that you might need in
the future even before you request them.
So also doing very predictable
operations can greatly improve the
performance of your program.
And as I mentioned to you before, I
lied to you. You know, the situation is
more complicated. So if you want to be a
bit more realistic, this is kind of what
it looks like in a more modern CPU.
You might have an L3 cache shared
between multiple cores which is larger
but slower. Each core might
have an L2 cache which is a bit bigger
than L1 but a bit slower. And then you
might have an L1 cache for data and L1
cache for instructions. Now I find this
interesting because you know code is
data. When you compile your program into
a binary, the actual code you generated
has to be loaded into memory. So
sometimes if you optimize your code for
size and not for speed, it might
actually be faster. And the reason is
that it might use the instruction cache
better. You might have more of your hot
loops fit into the code cache itself.
So, we're not going to cover that in
this talk, but if you go deep into this
topic, sometimes the way that the code
is aligned also really matters for your
performance. Now, this is all I'm going
to tell you about um you know hardware
and CPUs and memory and so on. I want to
refer you to this nice talk from Scott
Meyers. It's from 2014, but it's still very
relevant today: "CPU Caches and Why You
Care", which goes deep into detail. And also
Jonathan Müller gave a talk here at
CppCon called "Cache-friendly C++", which
covers the same topics and goes quite
deep as well. So I think if you
watch these two talks you're going to
have a very nice understanding and
appreciation of modern hardware, and
you're going to be able to get a nice
intuition of what can be fast or slow. I
greatly recommend watching both talks.
Just taking a little break.
Cool. So, given that, why is our
implementation slow? We can actually
figure that out quite easily, just by
looking at the world implementation. We
have this entities vector that
is a vector of unique pointers, which
means every single entity is going to be
allocated somewhere in memory, likely not
close to other entities. So it's going
to be scattered around, which means if we
iterate in our update, if we iterate in
our draw, and also in our erase_if, in the
worst case scenario every single
iteration would be a cache miss, and we
have seen how slow that is: it can be 100
times slower than an L1 reference. So this is
probably the worst case scenario for the
CPU. It's just going to sit there
waiting for data to arrive. At the same
time we have other sources of overhead.
We're using virtual dispatch, using
vtables and polymorphism. So whenever
you access the update or draw member
functions, there's going to be a vtable
lookup, which has some overhead. Probably
not as important as the cache miss, but
it's something else that we have to
care about. And finally, if you remember
the way that we spawn particles, we
actually call make_unique over here, and
erase_if will actually destroy those
unique pointers. So we have a frequent
churn of dynamic allocation and
deallocation. So you can already figure
out this is not going to be very
efficient. We're doing a lot of extra
work, a lot of waiting on memory, and a
lot of overhead due to virtual dispatch
and allocations. So my
question is: why do we write our code
this way? Why did we jump onto the OOP
hierarchy with the virtual interface
and so on? And I think this has to do
with the object-oriented mindset. So most
people's introduction to programming
actually is OOP: university, books, whatever.
You learn about classes, you learn about
inheritance, you learn about, you know, the
usual shape base class, which can be a
circle or a rectangle, or the animal base
class, and so on. But also, it's
a natural choice. Like, for humans, it
aligns quite well with our view of the
world. We think in terms of individual
objects, individual things, and we have
these is-a relationships in our head. It
just works well. So this mindset sort of
works like this. I identified four
things that I think are important. We
try to model a world of autonomous
objects. We think of self-contained
agents with their own identities and
responsibilities. We have a particle.
The particle has its own data. It knows how to update itself. It knows how to draw itself. These entities communicate between each other and with the rest of the program through messages. The main loop, the world, doesn't know what the particle is actually doing. It's just asking: could you please update yourself? Could you please draw yourself? We don't care about the internals. We're just asking for those operations to be done.
The data is hidden. We don't expose the internals of these classes. We hide them, we encapsulate them, but we expose the behavior. We don't care how the particle gets updated, which might be nice sometimes because it allows us to change the internal representation without changing the behavior, but we're losing information about something really important: the data layout of our particles.
And also, I feel like OOP tends to encourage people to plan for the unknown. You're going to try to figure out some sort of abstraction or interface that not only works for today's problems, but for any problems you might have in the future. And I chose the word bet here very carefully, because in my opinion it is a bet. It's really hard to predict what kind of requirements you're going to have in the future. And if you get your prediction wrong, getting out of the wrong abstraction sometimes is more expensive than not having done it in the first place. So if you're lucky, you might save some time, but more often than not, it's really hard to predict what might be needed in the future.
In contrast to this, if we set this aside, let's see how the data-oriented mindset thinks about the same problems. We don't want to model a world of autonomous objects; we want to model a world of data transformation. We look at code as a pipeline that transforms data from one state to another. We don't really care about this notion of an object, of identity, of encapsulation. It's just data. We don't have messages. We operate directly on batches of data. So the entity itself is not in control anymore. It's not the individual that matters. We have operations in bulk on the data, done from the parent object. The world is going to be responsible for the update and the drawing of all the entities. It is in control. And I'm saying in bulk here because the most common case for this sort of application is not adding a single particle or a single entity. Having many of them is the common case. So why are we designing it with the individual in mind when the actual common case is having things in bulk? At the same time, we don't want to hide data. Data is the most important thing. We want to make it transparent. We want to lay it out for efficient processing, and we want the behavior to be centralized at a higher level that sees all the data and is able to figure out: oh, this is the best way of actually processing the data that I have. This is also something that I feel is more subjective, but I have the feeling that this mindset tends to encourage developers to plan for today. You want to design for the problem that you have at hand. You want to prioritize performance and simplicity to solve that problem. And you don't want to solve any problem that you don't have, or that you might have in the future. And sometimes this might actually pay off. Sometimes it's easier to change your code to adapt to a new problem if you didn't start with the wrong abstraction to begin with. So sometimes this might pay off even for future extensibility.
So how do we shift our mindset? I think we have to internalize that, no matter what, the only purpose of code is to transform data. The focus should not be on modeling an abstract world of objects that makes sense in our head, but on the data's journey. We want to get from point A to point B. How do we do that efficiently and in the simplest way? Data is the centerpiece. It's not something that we want to hide. Why would we hide the most important part of the program, the part that actually makes it work efficiently? We actually want to make it visible: understand its shape, its size, and its access patterns. Those are the things that are going to drive the design of the application. As for the operations that we do on the data: modern computers, even the best supercomputers that you have, all thrive on simple and predictable work. So as long as you design your program to feed computers long, straight runs of contiguous data, you are likely going to get good performance.
And finally, this is more of a, you know, philosophical thing. You want to design for the machine that you have. You want to be familiar with the platform you're targeting, with the capabilities of your hardware, because an effective solution is not aligned with the metaphor that you have in your head of the problem you're trying to solve, but with the physical reality of the hardware. So this is sort of the shift that you would have in your mindset if you want to approach data-oriented design instead of OOP.
So that said, how can we start optimizing our code to move towards this mindset? Maybe not fully, but moving towards this idea. For our first optimization pass, we're going to do a few things. We are firstly going to get rid of individual heap allocations, which I think is the main bottleneck of our program. We are going to get rid of inheritance, which is going to flatten our hierarchy, and we are going to decouple the data from the logic. The entity classes will now just be the data, and the logic is going to be one level up. The world is going to be the one that deals with all the behavior, so that we can see a full picture of what we have and operate on data in bulk.
At the same time we have a problem, right? Before, we had this nice vector of entities where we could store things homogeneously. But now what we're going to do is have multiple containers, one per type of entity that we want to store. It might seem more cumbersome, but you will see it actually makes sense, as we want to process these things differently. They have different properties and different behaviors.
Okay, so we're going to have our Emitter struct. We're going to have our Particle struct and our Rocket struct. They're all going to have their physical quantities. So we have a bit of repetition, but it's minimal. Who cares? The emitter is going to have its own, you know, floating point for the timer and the spawn rate. And the particle is going to have the same quantities as before. But now we have a problem, right? Before, we could differentiate between fire and smoke particles because we had this nice hierarchy. So what do we do now? At the moment I'm going to take the easy way out and I'm going to do this: I'm going to say we have an enum class called ParticleType. It's going to be either Smoke or Fire, and I'm going to store this information in both the emitter and the particle. And depending on that, we will do different things. This is not ideal. We're going to see later how we change this. But so far this is going to be fine. The other problem that we have is that in the rocket we need to refer to two emitters. We want them to be linked together. Before, we had this nice property that we could use the address of an emitter. It was stable, and we could use that to communicate between these things. But now, if we are removing heap allocations, there is no guarantee that the addresses of the emitters will be stable. The vector might reallocate. We might move things around in memory. So what do we do? There are many solutions, but a common solution for this is using indices. You're not going to rely on the address stability of the actual object in memory. You're going to rely on the index stability, on the position of this object in the vector it belongs to. You might need to change the vector a little bit, and we're going to see how, but it is a common way of dealing with this. Other ways might be using some sort of hash table where you store the key and then you look up the object. You might have special data structures that are designed with DoD in mind that help you achieve this. But generally speaking, the point I want to make here is that now the relationship is data. It's just a number. So it's not something tied to a specific memory address anymore, it's just an index.
Okay. So how do we change our world to fit this new design? We're going to have a vector for particles. We're going to have a vector for rockets and an associated addRocket function that ends up doing any wiring that is necessary with the emitters. And then we're going to do this, which I think is quite interesting: we're going to have a std::vector of std::optional<Emitter>. The reason why I'm using an optional here is to guarantee index stability. You can think of this as being slots where an emitter might be in or might not be in. And by looking at the index, we can guarantee that the emitter at index 4 is always going to be the same emitter. If we want to get rid of an emitter and destroy it, we just make the slot empty, but we don't have to shuffle anything in memory, so the index stability is retained. And again, there are other ways you can deal with this, but this is a very simple solution that works for our use case.
To go along with this design, we're also going to have an addEmitter function that, given an emitter, will put it in the first free slot that is available, and then it will return the index of that slot. And all these things in conjunction work together to replace the relationship that was based on address stability with something that is data-driven. We are working with just numbers, indices, and we get the same sort of relationship behavior as before. We're going to have the usual update, draw, and cleanup, so we're going to see how they change.
The update function is going to be quite interesting, in my opinion, because now we can operate on all the particles in bulk. We're not telling each particle, please update yourself. We have the full view of the particles, and we just loop over them and perform the operations. And you can already start seeing how the compiler has much more information here to optimize and vectorize and do cool things with this code.
For the emitters, we're going to loop over all the optionals. We are going to skip the slots that have nothing inside them. We're going to do our updates, and then we're going to create new particles. Now, here we are actually going to branch on the type of the emitter, and depending on whether it's smoke or fire, we're going to push back something else into the vector. Again, we're going to see an improvement over this later, but so far this is fine.
And finally, for our rockets: we move them, that's fine. And the interesting bit is this one. When we want to get the associated emitters, we just look into the emitter vector with the index that was stored in the rocket. We check if that optional is valid. I think it should always be, so maybe this should be an assertion, but you know, I used an if here. And then we are going to set the position of the emitter to be the same position as the rocket, with some offset that makes it look a little bit better, so that the particles are actually spawning out of the rear of the rocket. Right? And we do the same for the fire emitter.
The last part I want to show you is the addEmitter, which is the function responsible for, you know, creating a new emitter. What we do is loop over all the slots. If we find an empty slot, one that maybe used to have an emitter but now is empty, we can place our emitter directly there and return that index. So we are reusing an existing slot. If the vector is completely full, then we just push back, and we have a new slot available. This might end up reallocating the memory under the hood, but we don't care, because we are not relying on that. We are relying on the indices. Now, this algorithm is linear. But you can imagine that if you want to optimize this, you could have a list of indices that are free, which you keep track of as you create and remove emitters. You just pop an index whenever you want to create one and use that, and you push it back whenever you're done. So you can make this constant time quite easily. I just wanted to keep it simple, as the number of emitters in this program is very small compared to particles, and it's not really significant that we have this O(N) algorithm over here.
Okay, the last part, which is the cleanup. We're going to have an erase_if for the particles where we do the same thing as before: we remove them if they have faded away. We're going to have an erase_if for the rockets where we remove the rockets if they reach the right-hand side of the screen. And at the same time, inside the predicate, we're going to take this opportunity to also destroy the associated emitters. So if we know that we reached the end of the screen, we're also going to reset those optionals that we had in the emitters vector, so that those slots can be reused, and then we're going to return true to tell the algorithm: yeah, feel free to get rid of this stuff.
Cool. So, how does this change our performance? Let's do another round of benchmarks. Same hardware, same program, same conditions, and we have a very significant improvement for 200K, even more for 400K. And as you can see, this trend keeps going on. On average we have a 70% decrease in update time just by changing the way that we store the data. We haven't changed any operation. We are doing the exact same calculations on the data. We just changed the way that we store it and the way that we process the data. I'm also not showing it here, but I also got an 8x boost in rendering performance, because now that the data is laid out in groups, I can easily take all the particles and send them to the GPU as one thing, in bulk. Easily take all the rockets and send them at once. So you're going to see over time, if you do this, especially with graphics development, that the data-oriented layout is actually very, very friendly to GPUs. So the more you do this, the more effectively you're going to be able to send data to the GPU, as a side benefit.
Okay. So I don't know if this is surprising to everybody; probably you expected this. But at the beginning of the talk I also mentioned I want to make this not just about performance but also about simplicity. So let's see if this is actually true. Let's see if it actually makes things simpler. The first thing I want to show you: let's say we have a new requirement. For example, we want to keep track of the number of rockets specifically. And this is actually something I tried to do for the demo. I wanted to have a counter for the number of entities in total, but I also wanted to know how many of them are rockets. And I tried, but I couldn't, because in the OOP approach it is deceptively difficult to do this efficiently. So the first thing I tried in the OOP approach was something like this: I'm going to loop over all the entities, I'm going to use dynamic_cast, and if it's a rocket, I'm going to increase the number of rockets. Now, I don't like dynamic_cast. I hate it as much as everybody else, but it seems like a good use case for it. It's a statistical metric that I just want to have as a tangential thing, and this is like an edge case. I just want to know how many rockets I have. This worked, but it actually showed up in my profiler. I was losing milliseconds because of dynamic_cast. So it's unacceptable. I would have made this OOP solution even slower than it is just to count the number of rockets, which I was surprised about. I thought it would have some overhead, but not very significant overhead.
So then I realized, okay, maybe I can do something like this: I can have my entity have a getType virtual member function that returns an entity type enum, and then I avoid the overhead that I get from the dynamic_cast. But this defeats the purpose, right? I don't want the entity to know which entity it is through an enum. The point of OOP is that I want to think about entities in the abstract sense, and I don't want an entity to tell me what type it is. Otherwise, why am I using OOP in the first place? It didn't seem right to do this. So then I thought: maybe the rocket itself, on construction, could inform the world, hey, there's a new rocket in town, and then on destruction it could tell it that it's going out of scope. But this also didn't feel right. You have more mutation of state hidden within the internals of a derived class, so it's harder to see the flow of the code. And also, I feel like this is an SRP violation. Why is the rocket responsible for metrics? It didn't seem right.
Maybe the world could do this, then. Maybe I can have an addRocket function that I can call, and I will create rockets only through that function, and it will keep track of the rockets, and in the cleanup I would do some bookkeeping to decrement this. But now I'm adding a specialized function for rockets. Again, I want to think in terms of entities. I want to give the world an entity, not a rocket. It defeats the purpose, right? So I just didn't like this, and there's also more complexity because of the bookkeeping. I'm not saying this is impossible; you can make this work. It's just overly hard for what I wanted to do. So I gave up in the end, and you've seen in the demo that I just have the number of entities; that's the best I can do. So what about a data-oriented approach? That's it. I know how many rockets I have because I'm storing rockets separately.
So I just have a simple function that returns the size of the vector, and I can get that number for free. This is just an example, right, and you might think it's artificial, but over time, as I've moved towards data-oriented design (and I'm not saying I'm going to go full data-oriented every single time), I feel like I get these small wins more and more often. So it does make things simpler. So it is also about simplicity. Another example I want to give you: imagine you are a new member of the team and you have to work on this demo. You have to extend the code, you have to understand it. So you're going to go into the codebase and start doing a little bit of code review. You're going to try to understand how all the moving parts interact, how it works, and what you need to do to change it. So you see the entity here, and you're like, okay, this seems simple, and then you see, oh, but we have a reference to a world and also this extra state for the alive boolean. And then you start thinking: now every single entity might end up doing something that changes the state of other entities. And to know that, I have to cross-reference all the files in the codebase to see what's happening.
You have to jump around the source code to get the full picture. At the same time, this doesn't really sit right with me. This is really annoying, because we are putting the draw member function in the virtual inheritance API. We have very tight coupling with the rendering system. So if we want to change from SFML to SDL or another library, we're not going to do that, because we're going to have to change 20 classes, and you know, they're very tightly coupled together, which is quite unfortunate.
At the same time, you know, if you look at the world, you remember: yeah, this seems simple on the surface, but actually it's hiding everything. This update could end up destroying entities, could end up creating new entities. So how do I know what's happening? I have to look at every single entity. So I think in practice this makes it hard to understand the full state of the system. You need to keep in your head all the possible derived classes that are there and what they can do at any point in time. And yeah, if you remember, this ends up affecting the outside world. So I think you get my point.
For the data-oriented design, I find it nice that the entities themselves are just data. There is a clear separation between logic and data, and there's nothing hidden there. It's completely decoupled, and what you see is what you get. There's no special side effect in the constructor of any member or anything like that. It's just plain data, which is simple. At the same time, if I look at the world, I can immediately see what things are in the world. I know I'm going to have emitters, particles, and rockets, and nothing else. There's nothing that could be added in at runtime from some other derived class or something like that. You can easily see all the types. So the data is not hidden. And at the same time, in the single update function, I know exactly every mutation, every operation that's happening. I can see the stage-based approach where everything is changed on a, you know, pipeline basis. All the logic is there, and I can see the relationships between all the entities in the same place. So I know if a rocket is going to affect an emitter, if an emitter is going to spawn a particle; everything is going to start in this update function.
Now, for the coupling issue: you still have some coupling here with the rendering system. You still have that render target appearing in the world, but it's much looser, right? If I want to change the rendering system, I don't have to change N classes that end up dealing with drawing. I can just change it in one place, you know, do the right transformations, and it's going to be a bit easier to adapt this to some other system. So I feel like this is much easier for somebody, especially somebody new to the program, to get an idea of what's going on. And I feel like this also opens up new opportunities, so I'm going to get there.
I'm going to start by adding this new requirement. Let's say that we want to be able to serialize the data. Now, this could be useful for many things: saving and loading, networking, and so on. So, how do we do it in the object-oriented approach? You could do something like this: you could have your entity be extended with serialize and deserialize virtual member functions. The serialize takes an ostream, the deserialize an istream.
So, probably the second one shouldn't be const, but you know, slideware. This might work, but I feel like the coupling is too tight. I am locked into a specific format, because at this point I need to decide whether it's going to be XML, JSON, binary, or whatever. And also, I'm one of those people that cares about compilation times. I have a few talks about the subject here and there. And by doing this, I'm going to make every user of entity virally include iostreams, which is a pretty beefy header, and it's going to slow down compilation. I don't want that. I want these things to be separated. I want the serialization module to be the only place where I include whatever dependency I need. So one of the common answers to this kind of problem in OOP is design patterns. We can use the visitor pattern. We can have a nice visitor base class that deals with all the derived types. Entity can accept a visitor in this virtual member function, and then I can create a serialization visitor and a deserialization visitor, maybe one for JSON, one for XML, and so on. Nice and decoupled, until you realize: I'm listing all the types again. So I'm losing that benefit of OOP. I have to think about every possible derived type in the same place. We lose that individual responsibility principle, which is quite nice, to be honest. We revert back to putting all of these in the same place. We also have more overhead. There's actually double dispatch: we're going to have two virtual calls every time you call this function. Now, for serialization this might not be significant, but if you use this pattern for other things, it might actually be significant.
And also, you have to keep everything in sync with all the other visitors. And one thing that I don't like is that something that should conceptually be a function parameter (the stream should only be available as a function parameter whenever I want to serialize or deserialize) now becomes state. It becomes a member of the class. But it doesn't have to persist. I only need it for the duration of the serialization function call. So why do I have to store it there? And even worse, then you realize this, right? You have your rocket, and in your rocket you're using pointers. So what do you do? It's not impossible. Again, you can make this work, but you have to figure out a mapping from pointer to something that can be serialized. And then you would have to figure out a reverse mapping when you deserialize it. You can make this work, but does it have to be so complicated? I'm just trying to serialize some state, and I have to do all this stuff.
Okay. And the DoD approach is just a single function. You have a function called serialize. You take your world by const reference, just reading it, and then your stream, and then internally you do your serialization. Of course, you could split this into separate functions; you could have a serialize for rockets, particles, and emitters. But the point is, you can isolate this in its own translation unit. Any expensive dependency can be isolated in that TU. So you don't have to worry about compilation times, and it's quite simple. Everything is there.
You see it, and you don't have to jump around in your source code. So I think it's a nice win even in this case. At the beginning I mentioned opportunities. Why am I talking about opportunities? Because once you do this, you realize that the entire state of the program is just data. It's just bytes that are meaningful. There is no reliance on pointers or addresses. So saving and loading, which is something that people dread implementing, becomes trivial. You literally just serialize and deserialize in a row, and you're done. This leads to better testability, debuggability, and tooling. If you have an interesting edge case that you want to test, you just save the state and you write a unit test to repeat it. I know of gaming companies that do this. For example, the people behind Factorio. They store interesting edge cases with their game, and then they have these unit tests where they load a state, run it, and check that the result is exactly what you expect.
Debuggability: you have a weird bug that is hard to reproduce. You save the state and you replay it as many times as you want. You can check the state at will. Tooling: now everything is data. You can write a nice UI, so during runtime you can change relationships, you can change the values of your classes. It becomes kind of a little in-game editor for your own application. So you get a lot of benefits out of this. Networking becomes easier. You can send a snapshot of the entire state over the wire, and the other player or user will get the exact same view, and then anything on top of that can be a delta that's sent on top of the last snapshot. So having everything just be data helps a lot and creates new opportunities for things that are useful.
Okay, what about multithreading? I'm going to go a bit quicker; I don't have that much time left. In the OOP approach, let's say we just want to multithread the particles. We care about the particles; they're the bottleneck. We don't care about other things. How do we make it multithreaded?
Well, it's a bit difficult, right? This could do anything. It could create new particles; it could be an update on something that is not a particle. So it's not really easy. I would say it's not truly parallelizable, because we also have writes to linked entities from the update of other things. So you cannot just trivially say: I'm going to take the loop and split it into chunks. That's going to be data race hell. Yeah, you could make this work. Maybe you could filter all the particles in advance, store them in a separate vector, and then do that in chunks. But the overhead of doing that will likely defeat the purpose of multithreading in the first place.
With DoD, you see your update as a bulk operation in your update function in the world. And now this is probably the most purely parallelizable loop you've ever seen. You can do this in horizontal chunks, vertical chunks. You could even get away with just OpenMP: you stick a #pragma omp parallel for there, and it's going to do it for you. So I think this is like the easiest thing to parallelize. Again, this is a simple example, but I want to show that something that is conceptually simple becomes complicated if you apply principles that are meant to help you, but actually end up hindering you.
So, my point is that data-oriented design architectures provide many side benefits. It's not just performance; you're going to get better simplicity and better flexibility out of it. But can we go even further? Can we make this even more optimized, even faster? We're going to do a small optimization pass here just to get rid of branching. What we're going to do is try to avoid branching in hot loops. And the way we do this is by grouping the data beforehand.
We're also going to take the opportunity at this point to try and reduce the size of common types. One of the things that you learn very quickly is that small is fast. The more things can fit in cache, the better performance you will have. So if you have types that are unnecessarily big, trying to reduce the size of the types, or the padding that you have inside your types, will end up giving better performance in the long run. But we're going to still stop here; no other major changes. We're going to stick to this layout, which is called array of structures. Later we're going to migrate to SoA, structure of arrays, and we're going to see how that actually affects the performance.
So this is what we had. We had this particle type. What we're going to do now is get rid of that. We're not going to have that enum anymore. We're not going to have those extra fields. And I'm going to show you that the world is going to be responsible for grouping the particles depending on their type. We make that branching implicit, if you will. At the same time, I realized we're never going to have more than a few thousand emitters. So why are we using a std::size_t? We don't need that. So I can change this to a std::uint16_t. It is more limited; you have to be a bit more careful with these values now. But because I know the nature of my program, I'm going to save some space in the rocket. Maybe by doing so, I'm going to be able to load more rockets into the same cache line as I'm iterating over them. Now, this is not going to be very impactful, because rockets are not a bottleneck here. I just want to show you the idea, the principle. If you have something which is in a hot loop, and you can figure out, let's say, constraints about the data of the fields of the objects, then changing the data types, reducing padding, reducing size can actually improve performance significantly in some use cases.
Okay, what else?
So if we look at the world here, we have
to do a change, right? we cannot just
store particles anymore because you
don't have a way of knowing which kind
of particles they are. So we're going to
have more containers and is another
common transformation in DoD. You end up
having a lot of containers that
implicitly provide information about
some properties of those objects. So the
first container tells you everything in
here is a smoke particle. The second one
tells you everything in here is a fire
particle. And we do the same for the
emitters. And we tweak the API a little
bit. And this is how we actually
represent the various types. Of course,
if you have a lot of types, then you can
think about meta programming, generating
these things at compile time or using
even a runtime generation mechanism if
you're loading these things on demand.
But for now, because we only have two types, it's fine to hardcode them.
In the update member function, we're going to use a little lambda here as a local function to avoid repetition.
Again, C++ abstractions, modern
features, and data design can coexist in
my opinion. They can be synergetic. And
I'm going to avoid repetition here and
just have two loops, one for smoke
particles and one for fire particles.
For the emitters, it's the same thing. And I'm going to use a little higher-order function here. So this lambda is going to take another lambda called spawn, which tells the emitter what kind of particles should be spawned. And then
when I actually call this, I'm going to specify as a callback that the smoke emitters should spawn smoke particles and the fire emitters should spawn fire particles. Again, I feel like it's a little bit of abstraction, a little bit of modern feature usage, but it's enough to avoid repetition and still keep the code very linear and simple without hiding what's happening behind the scenes. So, I think this is a reasonable amount of abstraction.
That is it. Once we do this, we can do a
little bit more benchmarking and we're
going to see that it's not that big. There is a consistent improvement of around 5.73% in terms of decreased update time, which is small, but we'll take it. And as I mentioned, this particular optimization here wasn't very impactful
because most of the bottleneck in our program is updating the particles themselves, while the branching was not in the hot loop. But depending on your workloads, doing something like this might give you a much better performance increase. Not pictured here: after doing this change I also got a 12% rendering performance improvement. The reason is that now the data is again laid out more nicely for the GPU. I know that all the fire particles are going to be in one container and they all use the same texture, so I can just send it to the GPU and the GPU is going to be happy, and you can do the same with the smoke particles. So DOD really aligns very well with what the GPU wants, and it's good to keep in mind if you're going to do graphics development or use GPUs to accelerate your programs.
Okay, the last thing that we're going to do is change the data layout. You probably always hear about this idea of SoA, structure of arrays. Is it actually going to be beneficial? Is it actually going to make a big difference? So why don't we just try it and see how it performs?
The idea behind SoA is quite simple. Instead of having a single struct that represents an individual particle, you have a struct that represents a collection of particles, a struct of arrays. So each individual field of your particle is no longer an individual field, but a collection of fields. You lose the explicit object, the explicit entity. Now a vertical slice over all these vectors, just like a table, becomes your entity. So we're going to have this particle SoA, also a struct, that contains positions, velocities, accelerations, and so on. And the implicit invariant here is that all vectors are going to have the same size, and they're going to be growing and shrinking in sync. So all of this stuff should be in sync.
Visually speaking, it looks like this. In the AoS layout, every index is a complete object with all its fields. The SoA layout, as I mentioned, is like a table: every index contains a vertical slice which conceptually represents a particle, but you don't really have that in your code anymore. So if you want to do a little bit of a comparison: the AoS layout has explicit entities, every object is a real entity in the code, while the SoA has implicit entities, a conceptual slice across the arrays.
AoS might suffer from internal padding. If you have alignment requirements for your fields, the compiler might end up adding some extra bytes in between them that still take up space in the cache. So you might be wasting cache space because of padding. With SoA, because every container contains only a single type, unless you're using some weird type with special alignment requirements, every single byte is going to be useful. There's not going to be any waste of cache space.
This is very important. If you only care about a single field, for example, we only care about the position of particles because we just want to find all the particles that are out of bounds, right? With AoS, it doesn't matter: you still have to load things you don't need, because position is close in memory to velocity, acceleration, and other things. So even if you only care about the position of the fourth particle, you might get some data from particle three loaded into cache, and maybe the velocity and acceleration loaded into cache as well. So if you need a subset of fields, you're going to waste a lot of space accessing other things. With SoA, we have what I like to call surgical field loading: you only get what you need. If you only care about positions, you're not going to pollute your cache by putting in things that are not positions, which is quite nice.
AoS is naturally resistant to vectorization. The data is scattered. SIMD instructions like it when the data is homogeneously close together. It is not impossible, we're going to see that it actually happens, but it makes it harder and a bit more expensive. SoA is ideal for vectorization. It's exactly the way that SIMD instructions like the data to be laid out. It makes it trivial for the compiler to perform auto-vectorization, assuming the operations you're doing on the data are quite simple and straightforward. And it's not on the slide, but again, I would also say SoA is very nice for GPUs. GPUs like that layout a lot compared to AoS.
So, yeah, let's implement this. Let's see how it actually behaves. One thing I'm going to do before the implementation is add a little helper function here, because I hate repeating things nine times. So I'm going to have this function called for all vectors. It takes some callback f and applies that function f to all the vectors at once. You can imagine that if you want to resize or reserve all the vectors at once, instead of copy-pasting the same dot-resize nine times, we can just call this function once and provide the right operation. Just a little helper that avoids repeating the same code nine times. That's going to be our particle SoA.
Then we take our world. We change our vectors of particles to be particle SoAs. We could have done this for rockets as well, or emitters. Not very significant. The bottleneck is particles, so I'm not going to bother.
And in the update member function, again, I'm going to use a little lambda here that updates a specific particle SoA. I'm going to get the number of particles. Here I'm just using positions, but you could use any vector; again, the implicit invariant is that they're all the same size. And if you've seen Barry's excellent talk on Monday, he showed you how to use reflection to create SoA automatically and store just a single size and capacity to avoid that waste. I just wanted to keep it a bit simpler, but I recommend seeing that talk if you are interested in that. And I'm just going to loop over all the particles via index and perform these operations. Now you might be thinking: does it make sense to have a single loop? Is it better to have five loops, where each one individually loads just the thing it needs? I've tried it. In all scenarios, having a fused loop was faster. So again, it depends on your cache size and platform-specific constraints and requirements. In my situation, having a single fused loop was always faster.
The last thing that I want to show you is the emitter update, because this is not nice. You're going to end up doing this. Whenever you want to spawn, for example, a smoke particle, this is what it looks like: you're going to have nine calls to push back, where each one pushes a single field. And honestly, I cannot defend this. It's horrible. But that's what we get.
Yeah. So, it's pretty much the same. I'm not going to bother showing the other things. The cleanup is the other interesting part, because you're going to find some friction with standard library algorithms. They're not designed with SoA in mind. So what I did here is I defined this has-negative-opacity helper predicate, which doesn't just take the index but also the SoA it belongs to, and it tells me whether the nth particle in that SoA has faded away. And then I wrote my own erase_if for SoA, and I'm going to pass in my particle SoA and my predicate to remove those particles.
Now, what does that look like? Just to give you the idea: it's kind of like a partition algorithm. I'm going to start from the left and the right. I'm going to skip over all the particles that don't need to be removed. And then I'm going to move everything that needs to be removed towards the right-hand side, using the for-all-vectors helper to perform the swaps simultaneously on all nine vectors at once. And then at the very end, I'm going to resize all the vectors at once. And you can probably see the value of that small helper function here. Otherwise, you would have to write nine calls to resize and so on, which is not great and makes it very easy to make a mistake.
So how much does it matter? How big is the improvement here? It's quite big, actually. We have around a 32% decrease in update time. And at the same time, I promised I would tell you about rendering performance: two times faster. And the reason why it's two times faster is that I can simply map the fields to GPU buffers and send everything contiguously at once, which is very efficient for the GPU. But if you think about it, why are we getting this nice big boost in performance? This is probably the worst-case scenario for SoA. I mentioned before that SoA is very nice when you want to get a subset of fields, like if you only care about position or only care about scale. But here, for every single particle, we're using all the fields. So it should be comparable to AoS; we're still loading everything into cache. Why are we getting this huge boost even though we're not leveraging some of the nice properties of SoA?
Of course, to figure this out I went to the assembly and took a look at AoS versus SoA. You can see that in both cases you get SIMD instructions. We get a fused multiply-add for the integration steps, which is probably the best thing you can do here with floating point to perform that integration of velocity and so on. But for AoS you have a few extra operations here. You have something called gather and scatter. What gather does is take data that is laid out in a strided way and load it into a single SIMD register so that everything is contiguous, and scatter does the opposite: it takes that data and scatters it back to the place it came from. So there is some overhead when applying SIMD due to the fact that the data is not laid out contiguously. It first needs to be shuffled around a bit for SIMD to be applied on top of it, and then it needs to be put back where it came from.
So this is probably the biggest overhead that we get here: it's because of vectorization. There might be other factors affecting the performance difference. Because the particle data in AoS is all contiguous per particle, you might have some cases where the same particle lies on the boundary between two cache lines, so to read one particle, you need two cache lines. The prefetcher might have a bit of a harder time with strided access; I don't know, just a theory. And this is something that might actually be very impactful: when we're doing the gather and scatter, we also need to use some registers to store those values after they've been gathered. Then you might have a bit of extra register pressure; you might have fewer registers you can use for meaningful operations as you're doing the SIMD stuff in here. So this is how I explain the performance difference. If you have any other insightful ideas, you can let me know afterwards.
What about simplicity? Did we sacrifice simplicity? My answer is: with this current design, absolutely yes. I don't want to write nine push backs. It's not simple. It's ugly. The problem is that C++ has no real native support for the SoA layout. The language and the standard library are all built around AoS. I would have liked to use std::erase_if, but I could not have done that. Maybe you can write some sort of proxy iterator that, when you dereference it and so on, scatters through all the vectors. I don't want to do that. It doesn't sound simple. So it would have been painful. We've seen how some functional programming patterns help. Just simple things like a local lambda or passing a function to another function can help reduce repetition significantly with very small effort. So that's good. But can we do any better? And I think, you know, it's time to reflect on that.
So we have this, right? I like this: simple, understandable. I don't like this: this is a manual transformation that I don't really want to do. I would like to work with a particle as it is on the left-hand side and get all the benefits of SoA. So it would be cool if I had this magical SoA type that is a template. I can give it a particle, and internally it builds up that structure that we mentioned before, all the vectors for every single field. It would be really cool. And then if I had this, I could call push back and provide all my fields as a particle, and internally the SoA wrapper would scatter them to their targets. That would be so much nicer to write compared to nine push back calls.
Maybe I could have a with-all interface that I could use to loop over the data. This would expose all the fields in order, as if they were an actual object, but they are still going to be loaded from separate vectors. So it gives me that nice AoS-like way of thinking, but the layout is still SoA. But more importantly, maybe I could have something like this. I could say: look, I only care about the scale and the scale change, the rate at which the particle changes its size. So please give me those two fields only, load only that memory, and then execute this lambda over here. So maybe I could have this with function: I specify the fields that I want with some pointers to members, and then I provide the lambda expression to actually work on those fields. That would be cool, right?
This is valid C++17. You don't need to wait until C++26. You don't need reflection to do this. You can do this in C++17. And I've done it using Boost.PFR, which is a very nice, underutilized library that allows you to do reflection on aggregates. There are limitations; it's not full-fledged reflection. I'm still extremely excited and happy that we're getting reflection in 26. But I think that people underestimate how much you can do even before that. C++17 gives you tools that you can abuse to get reflection for aggregates, and this kind of transformation is something you can actually implement yourself, which is quite nice. And I will link you the code at the very end. At the same time, can we do better? If we had C++26 reflection, we could do this. You don't have to specify those things twice. You don't have to say that you want the address of the scale field and the address of the scale change. The reflection system could just look at the names and types of the parameters you specify in your lambda, and from those names it could figure out which fields to load from the SoA. I don't know how I feel about this. The initial impact of having names of things be so significant is a bit scary, but it is the future that we're going to have. Reflection is going to enable APIs and interfaces that heavily rely on names and things like that. So
you'd better get used to it. Now, I'm not going to implement this, because Barry again had a fantastic talk on Monday, really good in its own right, that I strongly recommend you go and see, and it deals with this problem specifically. It shows you how to use reflection in C++26 to transform an AoS data layout into SoA automatically. I've shown you that you can do that even in 17, but with 26 reflection I think it's way more straightforward, it allows you to do more things, and I'm excited to see what people come up with in the future. So go check out that talk and you're going to get all the information you want about reflection.
Cool. Almost done. Just a few more things. SoA is not always the answer. I think there is a misconception: people think data-oriented design means SoA. Not really. Data-oriented design means something more important than that. It means that you design your architecture with a major focus on the layout of your data. The access pattern that you're going to use to operate on your data chooses the layout, not the other way around. So you have to ask yourself questions. Which fields are frequently accessed together? Maybe those things should be together in memory. Are there any cold fields that I might use only once in a while? Maybe you can store them separately from the rest of the fields of the same object, because they're just going to waste cache space.
What is the SIMD lane width on this platform? Your architectural choices might depend on what you're targeting. Targeting a PlayStation and targeting a laptop might be different, right? But this is where I would probably disagree with the rest of the gamedev community. I like abstractions.
I like being able to say SoA of particle. And if I have that, and if I have maybe AoSoA, which is a hybrid layout that does the same thing, I can experiment. I can see which one is better on this platform. And just by switching my SoA to AoS or to some other template that does something similar, just by doing that compile-time switch, I can keep the rest of the code exactly the same and see how the memory layout affects the performance. I like being able to do that. So switching between layouts at compile time is something you can do, and 26 will make it easier. With reflection, you can implement any sort of transformation you want, which allows you to experiment, try out new layouts, and see how they change your performance.
And also, DOD is not always the answer. I don't hate OOP. I think that OOP is problematic when it's used at the wrong granularity. If you use OOP at a higher, coarser level of granularity, I think it's a great tool. If you have an OOP particle manager with a nice API that does encapsulation, gives you private and public access, has nice invariants and so on, and internally you use DOD for speed, by all means do that. And if you do so, you can also swap strategies at runtime. You can have a particle manager virtual interface that you then implement for an AoS implementation, an SoA implementation, and so on. And because the virtual calls are going to be just a few compared to a few million, it doesn't matter. It's nice to have that possibility and that sort of flexibility thanks to the OOP principles. So OOP is not bad; it's just sometimes used at the wrong level of granularity.
Also, if you're working in teams with other people, having clear boundaries and the SRP, the single responsibility principle, can boost collaboration. If you have large teams with novice developers, having invariants and access control can help people understand how to modify the code. Also, I've worked in places where we could agree on an interface beforehand, then implement that interface in parallel for two different things, and merge them later on. Having that sort of layered structure is really useful for collaboration. You can use polymorphism for plug-in systems. But what I recommend is a hybrid approach: OOP is the shell and DOD is the engine. If you manage to find a nice, reasonable way of combining those at the right levels of granularity, you're going to get the best of both worlds. Almost done. Some final takeaways.
Design for performance from the start. If it is a requirement, it must be an architectural priority. It's not premature optimization. If speed is a requirement, like you need to target a certain frame rate or something like that, you must take account of it in your design. After you do that, then you can profile and see what the bottlenecks are in that particular design. But don't write it off as premature optimization; it's something important for the success of your program.
Flatten and shrink your data. Prefer vector of T to vector of unique_ptr of T. You're going to avoid all that extra indirection and have your data more contiguous. Minimize the size of the structures used in hot loops, and steer away from polymorphic hierarchies for things at a small granularity. For high-granularity stuff, it might be a good idea to use them; it depends on your use case. Group data by usage. Instead of having a single container and a lot of flags and booleans that you branch upon, use more containers. Make those flags and booleans implicit by pre-arranging your data. It's not just good for performance; it also helps you see everything that you have already laid out. It helps with simplicity and understanding of your code. And if you have bulk processing which might benefit from SoA, consider it. But don't think that DoD always means SoA. As we mentioned before, it's a mindset. It's a way of thinking.
Data drives design. Target the machine. It's a spectrum, not dogma. Embrace it: modern C++ helps, and pragmatism always wins. So I just want to leave you with this. Think data first. Your code will be faster and simpler. And thank you so much. I will take questions.
>> Okay. I have a question.
>> Yes.
>> So you asked the first question in my talk, and I appreciated that. I think this talk was perfect. You've actually done by hand what allocators can do, in the sense that you've laid everything out using vectors. Vectors don't have the same problem as other kinds of containers because, as you delete and put things back, since there's no internal structure to them, you don't get the kind of diffusion you would have had you designed in a composite world. If you were not designing at this level, but at an intermediate level, could you see that allocators could recover much of the loss? Could you talk about that?
>> Yes. So, it's a good question. I considered talking about allocators for this talk. I wanted to keep it a bit simpler and a bit more philosophical, in a sense. But I do think that if you end up in a situation where the design has been OOP and it's really hard to change because of, you know, change friction and stuff like that, allocators are a pragmatic way of recovering performance. They can definitely help you. I would be surprised if I didn't get a huge speedup by applying an allocator to those entities that I had before. So it would be something interesting to try out, something interesting to benchmark. But I do suspect allocators are a good, pragmatic, intermediate solution to this problem.
Sorry.
>> Um, I think an object-oriented programmer would say you've lost some of the relationship between some of those objects. Like, there was a very clear relationship between, say, the rocket and the emitter that you sort of lose in that data layout. So I see there are benefits, but what would you say to that?
>> So I would say, looking specifically at the data itself, I can see the point. One comment I have to make is that we just use a plain integer for the index, but you could use handles, you could use strong typedefs that have more semantic meaning in terms of the relation they express. On the other hand, I feel like that relationship becomes much more visible when you are operating on the data. If you look at the update loop, you actually see what is being done with that emitter, with that rocket. You don't only see that there is a relationship; you actually see what the relationship is for. So I think you gain more than you lose in that particular aspect.
>> Oh, sorry. Yeah, I have to get used to
this.
>> Hi, thank you for the talk. I really like this kind of design. I was wondering, how would you compare the testability between an OOP design and a DOD design? Is it easier?
>> What kind of...?
>> The testability?
>> Testability. Yeah.
>> So I think from a DoD point of view, you get some nice benefits. For example, as I mentioned before, because your problem is just data, right, you can literally store an interesting test case as data, load it, and try different things on it. I mentioned there are gaming companies that do this with complicated scenarios they want to test: they just store a save file and load it, and so on. With OOP that becomes a bit harder, because you don't have that easy serialization aspect that I mentioned before. OOP might be nicer in some aspects. For example, mocking is one of the things I was thinking of, or dependency injection. So I think as long as you apply OOP at the right level, like the higher-level components, you should use OOP, right? I'm not saying never use OOP; I think it's just that using it at the wrong level causes problems. So if you are doing some business logic where you have a database abstraction, that could definitely be an OOP abstraction, and then you could mock that in your own system. But then the processing you might be doing on the data you get from the database, if you represent every row as a heap allocation, that's where you lose the performance benefit. So I think they both have their own strengths and cons, and if you apply them at the right level, you can use them in testing as well. Let's put it this way. Makes sense?
>> Thanks, really evocative and tight example. Are there any tools or techniques to find where I have that memory pressure and where I should be thinking about applying these types of transformations?
>> Yeah, so I had quite a good experience with the VTune profiler from Intel. It was able to tell me what the bottlenecks were, how often my CPU was idle, and whether I was using the cache properly. I'm sure other profilers can do similar stuff. In the past, I've used, I think it was part of the Valgrind suite, there is a Callgrind tool as well that does something similar. So definitely profilers can identify how much you're using. Even perf tells you what percentage is being spent on actual work versus cache misses and stuff like that. So that might be an initial step, and then something like VTune, which is more precise, might be something I would recommend. Yeah.
>> Thank you for the nice talk. For the structure-of-arrays approach, what are your experiences with views? For example, if you have a particle view that just has references to the data members, does this impact performance, or can the optimizer optimize this away?
>> So, my experience is limited. I've tried some stuff out with ranges and things like that. One of the things I've noticed is that, for example, when using ranges, the zip view and things like that, without optimizations enabled you get very significant overhead, and also quite significant compile-time overhead, which I wasn't a fan of. So I think that if you are able to always compile with optimizations, and you can ensure that the compiler is inlining those things, it might be as efficient as a handwritten version. I find that a more functional approach, where you just provide the callback, leads to less pressure on the optimizer, and the code stays efficient even if you are not able to optimize, for example for ease of debugging and stuff like that. So I would like to see a world where I can recommend, hey, use this stuff all the time, but there are practical concerns with debug performance and compile times that make me a bit wary of doing that. Maybe with reflection, if you unrolled a very flat view that doesn't have many layers of abstraction and templates, then it might be significantly less problematic even when you don't have optimizations enabled. So again, I would think that C++26 will enable this to be much more effective with reflection. Makes sense?
>> Thanks.
>> Okay, thank you for the talk, it was really illuminating. The question I have is: I kind of got the feeling that having batch processing is really important to reap all the benefits that a DoD architecture would give you.
What are your thoughts on non-batch accesses, like if it's random access, or you're processing only a few entities that are spread out and not aligned with the caches?
>> Yeah, so in general, again, it depends on your requirements, right? As I mentioned before, there have been very successful games and applications that don't care about any of this. They just allocate all over the place; it works; they don't have to reach high frame rates and stuff like that, so they can get away with it. If you are in a situation where your data is naturally graph-shaped or something like that, it becomes a bit trickier to do this, right? Because the pattern in which you're accessing the data is not as intuitive or simple as we've seen in the demo. But there are layouts. For example, if you know that you're going to do a level-order traversal over a tree, or a depth-first traversal, you can still lay out that data in a flat array in such a way that, as you're performing the traversal, the nodes you're going to access after the node you're currently at come right after it in memory. So again, my recommendation would be: if performance is a requirement and you want to make sure that you use your architecture efficiently, think about the access pattern, and lay out the data in a way that follows that access pattern, or at least maximizes the chance that it's going to be in cache as you're loading it. Does that make sense?
>> Hi, nice talk. I just had a question: when you flattened all the data, you actually duplicated the data, versus just deriving from the struct and using the leaf struct. Is there a downside to doing that? Because then you have no duplication, and it seems like the memory layout would be identical.
>> Yes. So I I consider doing this. Um
there is no drawback in performance. You
are correct in saying that the memory is
going to be identical. So it's going to
be performing the same way. It's a bit
of a choice in how explicit you want to
be. Like if I look at this strct, I want
to be able to see, hey, this thing has
exactly this fields. If you derive from
that, you also have to look one layer
layer up when you're reading the code.
It's not a big deal, right? But it is
one extra step you have to take. The
other thing I'm concerned about is once you get that inheritance, even if it's not virtual, if it's not polymorphic, somebody is eventually going to start relying on that. Somebody's going to see, okay, I can use an entity reference here to refer to something. And then if you want to change, perhaps, the way that you're storing physical information in the emitter, you realize, hey, I don't actually need the velocity in the emitter, it's always going to be synced to the rocket. Then you won't be able to do that. So it creates a bit of extra coupling in that situation. So
>> the upside is if you add one, you just
add it to one place, right?
>> Yes. Yes. So I think there are pros and cons to both approaches, definitely something you have to consider. I don't think it's a bad idea to do so. If the repetition is significant, I would very likely do this, but for just three fields, you know, it's small, right?
>> Also, it still would work with the struct of arrays, right? Because you end up
>> Yeah, you get the same layout anyway.
Yeah. So non-polymorphic inheritance is a good tool to reduce repetition, for sure. I would apply the same logic that I would for functions: a small piece of code used just a few times, maybe not worth it. A big piece of code used a lot of times, definitely extract it. That's what I think.
>> Thank you.
>> You're welcome.
>> Hello. Thanks for the talk. Um I'm
imagining your example as more of an
engine uh than just a standalone thing.
Uh, say a user wants to add an update where the rockets move around, change color, rotate, whatever. How do you bake that in so that you are still only passing over the data once per frame, per update?
>> So, are you referring particularly to
changing the data layout to support more
requirements or is it about the
rendering? I'm not sure I get the
question.
Um, more requirements, I guess. Um
>> Yeah, more requirements. Uh, yeah.
>> Yeah. So I think, in the end, if the requirements change, you're going to have to change the code one way or another, right? With OOP, if you had a nice abstraction that managed to predict correctly what you needed to do, you might need to do less work to apply those things. But it depends on your initial abstraction. In this case, I don't see a major problem. Right? If you want those requirements to be hardcoded in the binary, like you say, okay, version two now is going to support differently colored rockets, then you might start adding a color field to the rocket and see if that is good enough, or you could consider splitting rockets by groups like we've seen before, having a container per color or something like that. If you want something more dynamic, like saying users can dynamically, at runtime, change properties of things, then I think that's where we'll get into scripting languages. So I would probably provide the basis of the entities in the code base, I would add support for all the major sort of fields, like color, size, and so on, and then I would have a scripting engine that runs after the update that would end up allowing the user to do stuff like that. So I think that's probably what I would go for if you need full flexibility. Make sense? Thank you.
>> First off, great presentation. Love the
way you presented all that material.
Very easy to take in, very informative.
Uh, so one of the things I take a lot of
flack for in my merge requests is using
callbacks quite a bit.
>> Um, and the way that I use them though
is because I prefer the machine to know
what task it has to do as opposed to
going through like a function pointer
and then having to determine if this is
here, if this is here, do this. I prefer
to skip all that and say either do your
task or do these two things first. The classic example being: let's say you enable logging on your service. Then the function pointer approach would be: execute this function, check if the logging queue exists, because it's a pointer to it. If it doesn't exist, then move on. If it does exist, log first, then do your task. Whereas I prefer to say, hey, on program startup I want you to take in a lambda that just captures all these things and then sets the callback to either log or not log, off a jump. So that way, whenever you hit your callback section, you just go, okay, do my task, and not check if-this, if-this, then do task.
>> Is there really a problem with doing it in that way when it comes to the speed of your code, and like avoiding lambda captures and whatnot? Because I know lambda captures tend to get a bit of a bad rep if they're done poorly, especially with lifetime issues, but for performance, is that really much of a concern these days compared to everything
else. So it's an interesting question. I think I would have to get a bit more context on the application's performance requirements and stuff like that. So when you use this lambda, is it stored as a std::function, like do you have the polymorphism? Yeah. So std::function has some overhead associated with invocation, right? Because you have the virtual dispatch and all that stuff. So if you're calling a std::function in a very hot loop, it might be a problem, right? But if you're calling it in a few places, it might not be a problem. It depends on, you know, whether you're in a hot place or not in your codebase. I don't know about your use case, I would have to see that, but it doesn't sound unreasonable. If you are in a situation where you can afford the overhead of std::function and it solves the problem cleanly, I think it's fine.
You've seen in my demo I did use a lot of callbacks, also with auto to avoid that overhead. I tend to prefer callbacks that are, for lack of a better word, local, in the sense that you define a callback but the place where you invoke it is close, like you can see where it is. I've worked in code bases where callbacks can call each other from very different places, and it becomes pretty hard to figure out where that comes from. So I think, again, there's a little bit of art involved. There's not a hard rule. Depending on the requirements in terms of performance, std::function might make sense or not; depending on your architecture and how far away the callback is in terms of cognitive overhead, it might be a good idea to use it or it might be a good idea to use another technique. So if you want to discuss this further, I'd be happy to later on; maybe you can show me some more context and maybe I can give you better advice.
>> Yeah, good answer. Thank you.
>> You're welcome.
>> Cool, I think we are done if there are no more questions. Thank you so much. Cheers.