More Speed & Simplicity: Practical Data-Oriented Design in C++ - Vittorio Romeo - CppCon 2025
By CppCon
Summary
## Key takeaways

- **Data-Oriented Design for Performance**: Prioritizing data layout over traditional OOP can yield significant performance gains by improving cache efficiency, making code faster and simpler to reason about. [01:13], [04:09]
- **Cache Lines: The Smallest Memory Unit**: CPUs fetch data in cache lines (typically 64 bytes). Accessing related data together maximizes cache hits, as the CPU is often idle waiting for data from slower RAM. [20:37], [21:15]
- **OOP vs. Data-Oriented Mindset**: OOP models autonomous objects and encapsulation, while DoD focuses on data transformation pipelines. DoD prioritizes data layout and access patterns for efficient processing. [30:00], [31:57]
- **Structure of Arrays (SoA) Benefits**: SoA layouts, where data fields are in separate contiguous arrays, are ideal for vectorization and GPU processing, offering performance benefits over Array of Structures (AoS) by minimizing cache pollution. [43:52], [58:40]
- **Simplicity Through Data Grouping**: Grouping data by usage (e.g., separate containers for different particle types) makes code simpler and easier to understand than using flags and branching within a single container. [44:44], [49:25]
- **Hybrid Approach: OOP Shell, DoD Engine**: A hybrid approach, using OOP for high-level abstractions and DoD for performance-critical inner loops, offers the best of both worlds, combining flexibility with efficiency. [57:57], [01:21:48]
Topics Covered
- Why traditional OOP designs can be a performance trap.
- Shift your mindset: Data is transformed, not encapsulated.
- Data-driven designs unlock simpler serialization and tooling.
- SoA: Surgical field loading for maximal cache efficiency.
- Combine OOP and DoD: Shell for API, engine for speed.
Full Transcript
Okay, thank you everybody. Thank you so
much. Jason, as you said, it's my first
keynote. I'm equally excited and
anxious. You know, I usually give niche
talks. I know people are going to come
in the room knowing what to expect. So,
speaking to a large audience is uh
different for me, but I hope you're
going to enjoy the keynote. Today we're
going to talk about data-oriented
design, practical data-oriented design. And I
want to start with a photo. Here it is.
This is a photo of Mike
Acton giving a keynote at CppCon 2014.
That was my first ever conference. And
at the time, Mike was at Insomniac Games
and he came on stage and he challenged
some of the core beliefs of C++
developers. And this was my personal
start of my own data oriented design
journey. I don't know if you recognize
this individual. That is me a
long time ago. I don't look like that
anymore, right? I look a bit different.
But yeah, um I was asking Michael a
question here and for me it was a career
changing moment because I hadn't
realized at the time that there was
another entire way of thinking about
code about software that was different
from OOP. So even if nowadays I don't
agree with all the conclusions that were
in that keynote, I would still like to
thank Mike for opening our eyes and I
would still recommend you to check out
his keynote. It's very very good. Okay,
a little bit about myself before we
begin. Uh right now I am an independent
C++ consultant, trainer and mentor. My
business focuses on delivering bespoke
C++ training and one-to-one mentorship.
Um I am available for tech training and
guest speaking. So if you're interested,
feel free to reach out afterwards or
check out my website. I spent the last
10 years at Bloomberg. I've been working
on high performance C++ backend
development mostly. I worked on our own
microservice infrastructure team and on our
market data analytics team, and I've also
been doing technical training in terms
of both modern C++ and topics such as
meta programming and multi-threading for
Bloomberg engineering. I am involved in
standardization. If you've heard of
std::function_ref, I am partially responsible
for that. So you can love me or hate me
for it. And that's going to be in C++26.
You might also know me for the epoch
proposals, which I feel like partially
live on with profiles nowadays. So
that's also something else you might
have heard before. I am quite passionate
about games and game development and
open source in general. I'm part of the
SFML team. I was the person that led the
modernization of the library to target
C++ 17. So I strongly advise you to
check it out. We're going to use it
today for our demos. But I also
contribute to other libraries including
SDL and I have two commercial games
released on Steam which are also open
source. So you can check how the sausage
is made. And I also like virtual
reality. I contribute to several mods
for games like Half-Life 2 and Quake.
Last but not least, I am the co-author
of this book, Embracing Modern C++
Safely. I wrote this book with my
friends and colleagues, John Lakos,
Rostislav Khlebnikov, and Alisdair Meredith.
It's a reference of all modern C++ features
that covers all the pros and cons in
detail. So if you're interested in that,
again, feel free to ask me afterwards.
But enough about me. Let's take a look
at today's goals, today's keynote, and
what we're going to do. So the goals for
today are I want you to discover a new
mindset. I don't know how many of you
here are familiar with data-oriented
programming. Can we have a show of
hands? Okay, so quite a bit of you and
quite a bit of you are unfamiliar with
it, which is good. So this this might be
something new for you. You're going to
learn how thinking in a data first way
changes the way you design a system. At
the same time, we're going to have a bit
of a refresher on how hardware works,
especially memory, and how it
communicates with the CPU and why that
matters for the design of your software.
We're going to do this somewhat
interactively. We're going to, you know,
build this little demo together. I'm
going to show the demo first and we're
going to gradually build up to it. And
we're going to see how changing the way
that the data is laid out without
actually changing the operations we're
doing on the data can really make a
difference in performance. At the same
time, I don't really want to just focus
on the performance aspect of DoD, but I
also want to make a point that it can
make your code much simpler and much
more maintainable than OOP, which might
sound counterintuitive, but I'm going to
show you some examples where I found
that to be true. And also, this is maybe
where I feel a bit differently from the
rest of the gamedev crowd. I think that
there still are places where OOP makes
sense. We're going to discuss that
later. And I think that the use of high
level C++ abstractions can actually help
achieve your goals and there are places
where when used judiciously um can help
you get better performance and more
maintainability of your code. Okay, so
I'm just going to dive straight into the
demo. I'm going to show you uh the thing
that we're going to implement and I'm
going to give you an idea of what we're
going to do. So for this keynote I chose
something which is quite simple and you
know somewhat visually interesting. So
we're going to have this demo where we
have these rockets flying around. Let me
just yeah zoom it in. And every single
one of these rockets and smoke particles
and fire particles you see on the screen
everything is its own entity. It's
simulated. It has its own physical
quantities and you can refer to every
particular single thing that you see on
the screen through some sort of
identity. So to make the requirements a
bit more explicit, we want everything to
follow laws of physics. So we have basic
motion model with position, velocity and
acceleration.
We should be able to customize these
effects, particles and emitters of
different types with different opacity,
scale and rotation that change over
time.
Everything is an actor. So if I care
about a particular rocket or a
particular particle, I should be
able to have a handle
to it and do something with it. As an
example here, every rocket entity has
some attached emitters that are kept in
sync and spawn the particles. And
finally,
I want to design this in a way that it
is extensible. It should be easy to add
new effects, should be easy to add new
actors, new particle types, and so on.
So it is somewhat of a toy program, but I
tried to make it a bit more realistic by
having this requirement of being able to
attach entities to other entities, and I
feel like it isolates the performance
principles quite well and the lessons
you learn even with this toy program are
applicable to real world code. Now if I
jump here and I open up this menu you
will see that we have some metrics
regarding update and draw performance.
And if I actually start spawning some
new rockets where the rendering is out
of sync for some reason, you will see
that as the rocket spawn, the frame rate
kind of tanks and it doesn't really
become playable anymore. I can zoom out
as well. I can show you that we have a
lot of them. And here you can select the
data layout that we're going to use.
This is the OOP implementation. But if we
switch to something like AoS over here,
you will see that we have a dramatically
uh nicer frame rate and it just becomes
much more playable. And you can play
around with this stuff. You can try
other memory layouts and so on. But the
point is we have this demo. We can play
around with it. The operations we are
doing are the same. What changes is just
the memory layout that we use. And as
you can see, it makes a dramatic
difference. And now we're going to see
uh how that actually works.
Cool. So, excuse me. That said, let's
just get to work and let's start
implementing this. So, we're going to
start designing this according to OOP
principles. So we have our own let's say
real world model interpretation of the
problem and we're going to try to
transfer that into code. We know that
we're going to have multiple entities.
So we can start with an entity base
class. We're going to have an emitter
class that produces particles. We're
going to have a rocket class that flies
around and has some emitters associated
with it. And we're going to have a
particle class, excuse me.
The emitter can either be a smoke
emitter or a fire emitter in this case.
And the particle in the same way can be
a smoke particle or fire particle. What
you see in light blue is a base class.
What you see in dark blue is a concrete
class. Now I like this. You know it's a
nice and simple uh hierarchy. If I want
to add something new, I can just derive
from entity. If I want to add a new kind
of emitter or particle, I just derive
from those things. It fulfills all
requirements. Seems easily extensible.
So I don't see any particular problem
with this. Let's try to implement it and
let's see how the code actually looks
when we transfer this into C++.
So we're going to have a struct Entity
here. I'm going to use struct just to
keep the slides concise but you know in
reality you would use class and private
and public and so on. I'm just going to
try to focus on the performance and
memory layout aspect of things at this
point in time because this is going to
be a polymorphic base class. We are
going to have a virtual destructor. In
this case we can default it. And then
we're going to have a virtual update
member function that takes the delta
time. So the amount of time that passed
between frames and uses that to advance
the state of the entity forward. We're
also going to have a draw member
function which is virtual. It takes this
thing called the render target which is
from the SFML library and it's just an
abstraction that allows you to draw
stuff either on a texture or a window.
In our case, it's going to be the window
of the application. So not very
important in this case. Now, because
everything needs to follow the laws of
physics, we're going to have a bunch of
vector 2fs here, which are just 2D
vectors of floats for position,
velocity, and acceleration. And if you
want to see how that actually looks in
practice is just two floats x and y with
some nice operator overloads that allow
you to basically multiply, add, and so
on. And again, this is from SFML. It's a
nice and convenient abstraction over a
2D vector.
And that is pretty much it. If we want
to implement the update part of the
entity here, we can move the position by
the velocity scaled by the delta time,
and we can move the velocity by the
acceleration scaled by delta time. So,
very simple physical integration, not the
most accurate, but again, we want to focus
mostly on the memory layout, not on
the accuracy of the simulation. We have
a problem though right? Entities might
need to know the state of other
entities. They might need to create
other entities on demand. So we need
some sort of way of making that
possible. So one of the things I came up
with and I've seen uh being used in many
other places is something like this. The
entity knows the world it belongs to and
through that world reference it can
query the state of any other entity or
it can create new entities. It can
interact with the state of the world.
So, we're going to do something like
this for our program. And we're also
going to have this little boolean called
alive, which is going to be responsible
for telling the world, hey, this entity
is done. Please get rid of it, recycle
the memory, and use that memory for
something else. So, it's just going to
be used to communicate to the world
that the entity can be
cleaned up. The reason why we use a
boolean is that we want to be able to
communicate that through the update
member function. There might be some
conditions in the update that make the
entity eligible for cleanup.
Okay, so that's the entity. Just to show
you some more code, we have the
particle. It's going to have its own
scale, opacity, and rotation.
They can change over time. So, we also
have the, you know, rates of change as
data members over here. We can override
the update member function, reuse what we
wrote in the entity (again, the DRY principle:
we don't repeat ourselves), and then do
the same sort of idea of physical
integration for scale, opacity, and
rotation. The only interesting thing
here is that we set the boolean alive
to be true only if the opacity is
greater than zero which basically means
our condition for getting rid of
particles is when they fade out because
every single particle is going to
eventually fade out in our demo. This is
going to be good enough to clean them up
when we're done with them.
And just to show you what the smoke
particle looks like, we just derive from
particle. We get everything for free
basically, which is quite nice. And the
only thing that we have to change is we
want to override the draw member
function and specify that we want to use
that data to draw something with a smoke
texture. And you can imagine this can be
done for the fire particle and any other
one that you want to add in the future.
Okay, the world is probably the most
interesting part. We have three kinds of
entities here already on the screen. We
want to store them in such a way that
it's convenient for us to work with
them. So, we're going to do this. We're
going to have a std::vector of
std::unique_ptr<Entity> that contains
all the entities. So, those of you here
that are somewhat familiar with
data-oriented design might already be thinking,
hey, nobody writes code like this.
There's going to be an allocation for
every entity. It's not going to be very
cache friendly. But uh trust me, I've
seen this in production many many times.
If you look this up on GitHub, even with
shared pointers sometimes instead of
unique pointer, you have hundreds and
hundreds of hits and you can read
stories about this happening in AAA
studios. People have successfully
shipped very good software that was
highly successful with this design. So
I'm not saying you cannot do this and
you cannot be successful. But even if you
know the drawbacks of the design,
you should also know that this has
actually been done in practice. I just
want to know like how many of you have
written code like this or seen code like
this in practice. Okay, so pretty much
everybody, you know, this is not uh toy
code. This actually happens. And I also
want to say very popular libraries and
engines do stuff very similar to this
including the SFML library. So it's
something that is used all over
the place.
The update and draw member functions are
quite straightforward. We just iterate
over all the entities and we call update
and draw. You might see the nice aspect
of this. We don't care what update does.
We don't care what draw does. We just
delegate that to the entity. So it's
quite nice. And finally, we're going to
do uh a little bit of cleanup at the
end. We have this call to std::erase_if, which
is a quite recent function in the standard
library. It takes a container and a
predicate, and it efficiently rearranges
the items in the container in such a way
that they can be removed in constant
time.
Okay, what else? The emitter uh we're
going to do something like this. We're
going to have a timer that keeps track
of how often we want to spawn a
particle. We might have different spawn
rates. And then we're going to do
something interesting. We're going to
define this pure virtual function called
spawn particle that is going to be
overridden by the actual emitter types.
So the update function of the emitter
base class will periodically call spawn
particle, but then it is going to be the
derived emitter that actually overrides
that and decides what it means to spawn
a particle. The smoke emitter will
internally allocate a new smoke particle
using make unique and then it will push
that back into the world. So you can
also see here how we are referring to
the world from within the update of an
entity in order to affect its state and
create new things on the fly.
Okay, last thing that I want to show you
which is somewhat interesting is this. I
mentioned that the rocket needs to have
some associated emitters. So what we're
going to do here is we're going to use
raw pointers. We're going to say okay
I'm the rocket. I need to know what
smoke emitter is associated with me and
what fire emitter is associated
with me. Now because we have this
implicit knowledge that the world is the
owner of the entities and as long as the
rocket is alive those emitters will be
alive, using a raw pointer is fine here.
We thought about our lifetimes. So
this works in our model and this is
something that you also commonly see
even in large scale programs. The way
you refer between components of the same
program is by using pointers. You're
relying on the address stability of
these objects. The reason why you can do
that in this case is because we are
allocating them on the heap. So even if
the vector gets reallocated with all the
entities, even if we move around the
world, the address of the emitters is
going to be stable.
When we create the rocket, we create the
emitters and we wire them up. And when
we update the rocket, we're going to
move them with the rocket. So pretty
straightforward. And this is pretty much
all the code we need for that demo. So
that said, let's see how fast this is.
So let's do a little bit of a round of
benchmarks.
So the machine I'm going to use is not
the tablet I'm presenting on. This is my
desktop machine. It's fairly beefy. It
has an Intel Core i9, which was a
top-of-the-line processor a few years
ago, pretty fast DDR5 RAM, and I compiled
using Clang++ with -O3. I disabled
rendering for these metrics, but I'm
also going to talk about rendering here
and there as changing the data layout
can also be beneficial for rendering and
not just for updates.
So with 200,000 entities, we get 2.3
milliseconds; 400k, 6.6;
600k, 12 milliseconds; 800k, 16.6; and
1M, 21.6
milliseconds. Now you might think, this
is just milliseconds, this is fast, right?
Who cares? But if you're thinking about
targeting real time applications, the
budget in a single frame that you have
to reach 60 fps, which is pretty much
the bare minimum for a nice interactive
experience, is 16.67 milliseconds. And
you can see already at 600K, we're
already blowing up most of our budget.
And if you think about having to put in
rendering on top of that, if you think
having to put in more complicated logic
or algorithms, you don't have a lot of
room to work with. At the same time,
remember, this is pretty good hardware.
So on a mobile device, this will
probably be unacceptable. And at the
same time, I feel like 60 fps in 2025
is pretty pathetic. Honestly, even
mobile phones nowadays, they have 120 Hz
refresh rate displays. So the bare
minimum I think for a very nice and
interactive experience that feels smooth
should be 120 to 144 fps, and if you want to
target that particular frame rate, your
budget for a single frame is less than 7
milliseconds. So we are pretty limited
here. We have almost no budget left
at only 400k particles, and we're talking
about C++ we're talking about fast
hardware. We should be able to do better
than this. Now we've seen in the demo
that changing the memory layout of
things speeds up the program
significantly. So the problem is not the
algorithm. The problem is not the
operation that we are doing. The problem
is how the data is laid out in memory.
And the CPU is actually mostly idle
here. It's not doing any work. It's just
waiting for that data to arrive and
wasting time doing nothing which is
again unacceptable if we care about
performance.
So let's take a detour. I'm going to
give you a little bit of a refresher on
memory and tell you why it is important
that we understand how it works
internally to design our software to be
efficient and make good use of our CPU.
So at a very very very high level of
abstraction you can think of the CPU and
RAM as being connected by some sort of
bus and they communicate and everything
is fine. If we peel one layer of the
onion and we go one step lower again at
a high level of abstraction, you can
think of a CPU as something like this.
We have some core where the operations
are actually being done. Some cache
which is some very small but very fast
buffer of memory and then the
communication has to go through a
hierarchy. The data that's in RAM first
has to go through the cache and then
from the cache to the core and vice
versa. So every operation you need to do
has to follow this path through memory.
In reality (I have a slide later)
we have multiple layers of cache, and we
have multiple cores. So it's more
complicated than this. But the main
principle still applies. The memory has
to follow this hierarchy and move
forward. Now the cache, you can think of
it like this: a very small but fast
collection of rows, and these rows are
called cache lines. And this is a very
important concept. A cache line is the
smallest transferable unit of memory. So
even if you care about a single byte,
you still have to transfer an entire
cache line from the RAM to the CPU, even
if you care only about that single byte.
At the same time, you can imagine that
the RAM is a large collection of cache
lines, very very big, but it's slow. So
for example, if you want to read the
data at address 18, it's not in cache.
Our cache is empty. So we can identify
the data being there in RAM, and even if
we only care about the data at address 18, we
still have to take the entire cache line
from RAM, which is a cache miss, which
means we take the whole row and we copy
that into cache. And if you imagine
that we don't care about what's adjacent
to this data, what's next to it, then we wasted
a lot of time moving data that we don't
care about, which is quite unfortunate
and uses our cache ineffectively. We can
keep doing this. Maybe we want to read
some data at address 26. It is not in cache. Again,
we identify it there in RAM, and we have
to take the entire cache line and move
it into the cache, which again is a cache
miss, and quite unfortunate.
In the best case scenario, for example, if
you want to read the data at address 19,
we identify that it is already in
cache. You can see it over here. Then in
this case we talk about a cache hit. We
don't need to do a round trip to
RAM and back. So we are going to be able
to read this data way more effectively.
Now why am I telling you this? What is
the impact? How much does it matter if
we have to go to RAM and how much does
it matter if we can stay in cache? So I
like to explain things visually. So I
made a little animated race
between L1 cache, which is the fastest
but smallest cache in a CPU, and the
core. And on the bottom we have the RAM
and the CPU. So on the top we're going
to see the best case scenario, when
our data is already in cache, and on
the bottom we're going to see the
worst case scenario, where every single
time that we need data, we're going to go
back to the RAM. Now, this slide is going
to be slowed down significantly, because
we are talking about operations on the
order of nanoseconds, but it's going to be
to scale. So keep that in mind. 3, 2, 1,
go.
So by the time that the data in cache
has done multiple round trips, you can
see the RAM still, you know, getting
there, almost at the middle, and
we've already done a lot of operations.
So this is actually to scale. As I
mentioned, this is actually what's
happening in your program. So if you
have all your data coming from RAM,
you're wasting a lot of time just
fetching that data. In the worst case
scenario, it can be up to 100 times
slower. So doing a round trip to RAM,
compared to L1 cache, can be up to 100
times slower on modern CPUs. And just to
give you some numbers, the amount of time
that the average blink of a human takes
is 100 milliseconds. The amount of time
that an L1 cache reference takes is 0.5
nanoseconds. So by the time you blink, you
can have 200 million L1 cache
references. With RAM, it's going to be 100
times fewer than that. So it's fairly
significant; so significant that
sometimes the choice of where your data
is located is much more important than
the algorithm or data structure that you
use which can be pretty surprising
especially if you are deeply into
theoretical computer science. Sometimes
having an algorithm with worse complexity
can perform better on real hardware.
Okay. So what have we learned?
Very important. The smallest unit of
transferable memory is a cache line. So no
matter how many bytes
you want, you're going to have to take
an entire cache line. Generally, this is
64 bytes on modern CPUs. Thankfully,
we have a very intuitive way of asking
for the cache line size in C++, which is
std::hardware_destructive_interference_size.
So, you know, you will always
remember that, but you can think about
64 bytes.
All data must always traverse the memory
hierarchy. So if you need something from
RAM, it has to go through all the levels
of cache and then back if it needs to be
flushed back into RAM. Which means that
the spatial locality of data, so where
the data is actually laid out in memory
greatly affects performance and as I
mentioned sometimes even more so than
the choice in algorithm or the choice in
data structure. So we can already start
thinking about some tips, right? If I
have data that is related to each
other, and I want to access it relatively
close in time uh you know after I access
the first one I want the second one and
the third one and so on then it is
better to store it together in memory
close to each other physically speaking.
So preferring flat and contiguous
storage leverages the cache better and
maximizes the chance that, as you
are taking those cache lines in, the data
that you want is already going to be
there, because it's part of the same cache
line. At the same time, there's also
something I didn't mention which is
prefetching. CPUs have this speculative
mechanism that basically figures out the
pattern in which you're accessing
memory. For example, if you're in a loop
and you're going forwards or backwards
or you're jumping every nth element in a
strided access, the CPU can figure that
out and it will already start giving you
the cache lines that you might need in
the future even before you request them.
So also doing very predictable
operations can greatly improve the
performance of your program.
And as I mentioned to you before, I
lied to you. You know, the situation is
more complicated. So if you want to be a
bit more realistic, this is kind of what
it looks like in a more modern CPU.
You might have an L3 cache shared
between multiple cores which is larger
but slower. Each core might
have an L2 cache which is a bit bigger
than L1 but a bit slower. And then you
might have an L1 cache for data and L1
cache for instructions. Now I find this
interesting because you know code is
data. When you compile your program into
a binary, the actual code you generated
has to be loaded into memory. So
sometimes if you optimize your code for
size and not for speed, it might
actually be faster. And the reason is
that it might use the instruction cache
better. You might have more of your hot
loops fit into the code cache itself.
So, we're not going to cover that in
this talk, but if you go deep into this
topic, sometimes the way that the code
is aligned also really matters for your
performance. Now, this is all I'm going
to tell you about um you know hardware
and CPUs and memory and so on. I want to
refer you to this nice talk from Scott
Meyers. It's from 2014, but it's still very
relevant today: "CPU Caches and Why You
Care", which goes deep into detail. And also
Jonathan Müller gave a talk here at
CppCon called "Cache-friendly C++", which
covers the same topics and goes quite
deep as well. So I think if you
watch these two talks you're going to
have a very nice understanding and
appreciation of modern hardware, and
you're going to be able to get a nice
intuition of what can be fast or slow. I
greatly recommend watching both talks.
Just taking a little break.
Cool. So, given that, why is our
implementation slow? We can actually
figure that out quite easily, just by
looking at the world implementation. We
have this entities vector that
is a vector of unique pointers, which
means every single entity is going to be
allocated somewhere in memory, likely not
close to other entities. So it's going
to be scattered around, which means if we
iterate in our update, if we iterate in
our draw, and also in our erase_if, in the
worst case scenario every single
iteration would be a cache miss, and we
have seen how slow that is: it can be 100
times slower than an L1 reference. So this is
probably the worst case scenario for the
CPU. It's just going to sit there
waiting for data to arrive. At the same
time we have other sources of overhead.
We're using virtual dispatch, using
vtables and polymorphism. So whenever
you access the update or draw member
functions, there's going to be a vtable
lookup, which has some overhead. Probably
not as important as the cache miss, but
it's something else that we have to
care about. And finally, if you remember
the way that we spawn particles, we
actually call make_unique over here, and
erase_if will actually destroy those
unique pointers. So we have a frequent
churn of dynamic allocation and
deallocation. So you can already figure
out this is not going to be very
efficient. We're doing a lot of extra
work, a lot of waiting on memory, and a
lot of overhead due to virtual dispatch
and allocations. So my
question is: why do we write our code
this way? Why did we jump onto the OOP
hierarchy with the virtual interface
and so on? And I think this has to do
with the object-oriented mindset. So most
people's introduction to programming
actually is OOP: university, books, whatever.
You learn about classes, you learn about
inheritance, you learn about, you know, the
usual shape base class, which can be a
circle or a rectangle, or the animal base
class, and so on. But also, it's
a natural choice. Like, for humans, it
aligns quite well with our view of the
world. We think in terms of individual
objects, individual things, and we have
these is-a relationships in our head. It
just works well. So this mindset sort of
works like this. I identified four
things that I think are important. We
try to model a world of autonomous
objects. We think of self-contained
agents with their own identities and
responsibilities. We have a particle.
The particle has its own data. It knows how to update itself. It knows how to draw itself. These entities communicate between each other and with the rest of the program through messages. The main loop, the world, doesn't know what the particle is actually doing. It's just asking: could you please update yourself? Could you please draw yourself? We don't care about the internals. We're just asking for those operations to be done.
The data is hidden. We don't expose the internals of these classes. We hide them, we encapsulate them, but we expose the behavior. We don't care how the particle gets updated, which might be nice sometimes because it allows us to change the internal representation without changing the behavior, but we're losing information about something really important: the data layout of our particles.
And also, I feel like OOP tends to encourage people to plan for the unknown. You're going to try to figure out some sort of abstraction or interface that not only works for today's problems, but for any problems you might have in the future. And I chose the word bet here very carefully, because in my opinion it is a bet. It's really hard to predict what kind of requirements you're going to have in the future. And if you get your prediction wrong, getting out of the wrong abstraction sometimes is more expensive than not having done it in the first place. So if you're lucky, you might save some time, but more often than not, it's really hard to predict what might be needed in the future.
In contrast to this, if we set this aside, let's see how the data-oriented mindset thinks about the same problems. We don't want to model a world of autonomous objects; we want to model a world of data transformation. We look at code as a pipeline that transforms data from one state to another. We don't really care about this notion of an object, of identity, of encapsulation. It's just data. We don't have messages. We operate directly on batches of data. So the entity itself is not in control anymore. It's not the individual that matters. We have operations in bulk on the data, done from the parent object. The world is going to be responsible for the update and the drawing of all the entities. It is in control. And I'm saying in bulk here because the most common case for this sort of application is not adding a single particle or a single entity. Having many of them is the common case. So why are we designing it with the individual in mind when the actual common case is having things in bulk? At the same time, we don't want to hide data. Data is the most important thing. We want to make it transparent. We want to lay it out for efficient processing, and we want the behavior to be centralized at a higher level that sees all the data and is able to figure out: oh, this is the best way of actually processing the data that I have. This is also something that I feel is more subjective, but I have the feeling that this mindset tends to encourage developers to plan for today. You want to design for the problem that you have at hand. You want to prioritize performance and simplicity to solve that problem. And you don't want to solve any problem that you don't have, or that you might have in the future. And sometimes this might actually pay off. Sometimes it's easier to change your code to adapt to a new problem if you didn't start with the wrong abstraction to begin with. So sometimes this might pay off even for future extensibility.
So how do we shift our mindset? I think we have to internalize that, no matter what, the only purpose of code is to transform data. The focus should not be on modeling an abstract world of objects that makes sense in our head, but on the data's journey. We want to get from point A to point B. How do we do that efficiently and in the simplest way? Data is the centerpiece. It's not something that we want to hide. Why would we hide the most important part of the program, the part that actually makes it work efficiently? We actually want to make it visible: understand its shape, its size, and its access patterns. Those are the things that are going to drive the design of the application. As for the operations that we do on the data: modern computers, even the best supercomputers that you have, all thrive on simple and predictable work. So as long as you design your program to feed computers long, straight runs of contiguous data, you are likely going to get good performance.
And finally, this is more of a, you know, philosophical thing. You want to design for the machine that you have. You want to be familiar with the platform you're targeting, with the capabilities of your hardware, because an effective solution is not aligned with the metaphor that you have in your head of the problem you're trying to solve, but with the physical reality of the hardware. So this is sort of the shift that you would have in your mindset if you want to approach data-oriented design instead of OOP.
So that said, how can we start optimizing our code to move towards this mindset? Maybe not fully, but moving towards this idea. For our first optimization pass, we're going to do a few things. We are firstly going to get rid of individual heap allocations, which I think is the main bottleneck of our program. We are going to get rid of inheritance, which is going to flatten our hierarchy, and we are going to decouple the data from the logic. The entity classes will now just be the data, and the logic is going to be one level up. The world is going to be the one that deals with all the behavior, so that we can see a full picture of what we have and operate on data in bulk.
At the same time we have a problem, right? Before, we had this nice vector of entities where we could store things homogeneously. But now what we're going to do is have multiple containers, one per type of entity that we want to store. It might seem more cumbersome, but you will see it actually makes sense, as we want to process these things differently. They have different properties and different behaviors.
Okay, so we're going to have our Emitter struct. We're going to have our Particle struct and our Rocket struct. They're all going to have their physical quantities. So we have a bit of repetition, but it's minimal. Who cares? The emitter is going to have its own, you know, floating point for the timer and the spawn rate. And the particle is going to have the same quantities as before. But now we have a problem, right? Before, we could differentiate between fire and smoke particles because we had this nice hierarchy. So what do we do now? At the moment I'm going to take the easy way out and I'm going to do this: I'm going to say we have an enum class called ParticleType. It's going to be either Smoke or Fire, and I'm going to store this information in both the emitter and the particle. And depending on that, we will do different things. This is not ideal. We're going to see later how we change this. But so far this is going to be fine. The other problem that we have is that in the rocket we need to refer to two emitters. We want them to be linked together. Before, we had this nice property that we could use the address of an emitter. It was stable, and we could use that to communicate between these things. But now, if we are removing heap allocations, there is no guarantee that the addresses of the emitters will be stable. The vector might reallocate. We might move things around in memory. So what do we do? There are many solutions, but a common solution for this is using indices. You're not going to rely on the address stability of the actual object in memory. You're going to rely on the index stability, on the position of this object in the vector it belongs to. You might need to change the vector a little bit, and we're going to see how, but it is a common way of dealing with this. Other ways might be using some sort of hash table where you store the key and then you look up the object. You might have special data structures that are designed with DoD in mind that help you achieve this. But generally speaking, the point I want to make here is that now the relationship is data. It's just a number. So it's not something tied to a specific memory address anymore, it's just an index.
Okay. So how do we change our world to fit this new design? We're going to have a vector for particles. We're going to have a vector for rockets and an associated addRocket function that ends up doing any wiring that is necessary with the emitters. And then we're going to do this, which I think is quite interesting: we're going to have a std::vector of std::optional<Emitter>. The reason why I'm using an optional here is to guarantee index stability. You can think of this as being slots where an emitter might be in or might not be in. And by looking at the index, we can guarantee that the emitter at index 4 is always going to be the same emitter. If we want to get rid of an emitter and destroy it, we just make the slot empty, but we don't have to shuffle anything in memory, so the index stability is retained. And again, there are other ways you can deal with this, but this is a very simple solution that works for our use case.
To go along with this design, we're also going to have an addEmitter function that, given an emitter, will put it in the first free slot that is available, and then it will return the index of that slot. And all these things in conjunction work together to replace the relationship that was based on address stability with something that is data-driven. We are working with just numbers, indices, and we get the same sort of relationship behavior as before. We're going to have the usual update, draw, and cleanup, so we're going to see how they change.
The update function is going to be quite interesting, in my opinion, because now we can operate on all the particles in bulk. We're not telling each particle, please update yourself. We have the full view of the particles, and we just loop over them and perform the operations. And you can already start seeing how the compiler has much more information here to optimize and vectorize and do cool things with this code.
For the emitters, we're going to loop over all the optionals. We are going to skip the slots that have nothing inside them. We're going to do our updates, and then we're going to create new particles. Now, here we are actually going to branch on the type of the emitter, and depending on whether it's smoke or fire, we're going to push back something else into the vector. Again, we're going to see an improvement over this later, but so far this is fine.
And finally, for our rockets: we move them, that's fine. And the interesting bit is this one. When we want to get the associated emitters, we just look into the emitter vector with the index that was stored in the rocket. We check if that optional is valid. I think it should always be, so maybe this should be an assertion, but you know, I used an if here. And then we are going to set the position of the emitter to be the same position as the rocket, with some offset that makes it look a little bit better, so that the particles are actually spawning out of the rear of the rocket. Right? And we do the same for the fire emitter.
The last part I want to show you is the addEmitter, which is the function responsible for, you know, creating a new emitter. What we do is loop over all the slots. If we find an empty slot, one that maybe used to have an emitter but now is empty, we can place our emitter directly there and return that index. So we are reusing an existing slot. If the vector is completely full, then we just push back, and we have a new slot available. This might end up reallocating the memory under the hood, but we don't care, because we are not relying on that. We are relying on the indices. Now, this algorithm is linear. But you can imagine that if you want to optimize this, you could have a list of indices that are free, which you keep track of as you create and remove emitters. You just pop an index whenever you want to create one and use that, and you push it back whenever you're done. So you can make this constant time quite easily. I just wanted to keep it simple, as the number of emitters in this program is very small compared to particles, and it's not really significant that we have this O(N) algorithm over here.
Okay, the last part, which is the cleanup. We're going to have an erase_if for the particles where we do the same thing as before: we remove them if they have faded away. We're going to have an erase_if for the rockets where we remove the rockets if they reach the right-hand side of the screen. And at the same time, inside the predicate, we're going to take this opportunity to also destroy the associated emitters. So if we know that we reached the end of the screen, we're also going to reset those optionals that we had in the emitters vector, so that those slots can be reused, and then we're going to return true to tell the algorithm: yeah, feel free to get rid of this stuff.
Cool. So, how does this change our performance? Let's do another round of benchmarks. Same hardware, same program, same conditions, and we have a very significant improvement for 200K, even more for 400K. And as you can see, this trend keeps going on. On average we have a 70% decrease in update time just by changing the way that we store the data. We haven't changed any operation. We are doing the exact same calculations on the data. We just changed the way that we store it and the way that we process the data. I'm also not showing it here, but I also got an 8x boost in rendering performance, because now that the data is laid out in groups, I can easily take all the particles and send them to the GPU as one thing, in bulk. Easily take all the rockets and send them at once. So you're going to see over time, if you do this, especially with graphics development, that the data-oriented layout is actually very, very friendly to GPUs. So the more you do this, the more effectively you're going to be able to send data to the GPU, as a side benefit.
Okay. So I don't know if this is surprising to everybody; probably you expected this. But at the beginning of the talk I also mentioned I want to make this not just about performance but also about simplicity. So let's see if this is actually true. Let's see if it actually makes things simpler. The first thing I want to show you: let's say we have a new requirement. For example, we want to keep track of the number of rockets specifically. And this is actually something I tried to do for the demo. I wanted to have a counter for the number of entities in total, but I also wanted to know how many of them are rockets. And I tried, but I couldn't, because in the OOP approach it is deceptively difficult to do this efficiently. So the first thing I tried in the OOP approach was something like this: I'm going to loop over all the entities, I'm going to use dynamic_cast, and if it's a rocket, I'm going to increase the number of rockets. Now, I don't like dynamic_cast. I hate it as much as everybody else, but it seems like a good use case for it. It's a statistical metric that I just want to have as a tangential thing, and this is like an edge case. I just want to know how many rockets I have. This worked, but it actually showed up in my profiler. I was losing milliseconds because of dynamic_cast. So it's unacceptable. I would have made this OOP solution even slower than it is just to count the number of rockets, which I was surprised about. I thought it would have some overhead, but not very significant overhead.
So then I realized, okay, maybe I can do something like this: I can have my entity have a getType virtual member function that returns an entity type enum, and then I avoid the overhead that I get from the dynamic_cast. But this defeats the purpose, right? I don't want the entity to know which entity it is through an enum. The point of OOP is that I want to think about entities in the abstract sense, and I don't want an entity to tell me what type it is. Otherwise, why am I using OOP in the first place? It didn't seem right to do this. So then I thought: maybe the rocket itself, on construction, could inform the world, hey, there's a new rocket in town, and then on destruction it could tell it that it's going out of scope. But this also didn't feel right. You have more mutation of state hidden within the internals of a derived class, so it's harder to see the flow of the code. And also, I feel like this is an SRP violation. Why is the rocket responsible for metrics? It didn't seem right.
Maybe the world could do this, then. Maybe I can have an addRocket function that I can call, and I will create rockets only through that function, and it will keep track of the rockets, and in the cleanup I would do some bookkeeping to decrement this. But now I'm adding a specialized function for rockets. Again, I want to think in terms of entities. I want to give the world an entity, not a rocket. It defeats the purpose, right? So I just didn't like this, and there's also more complexity because of the bookkeeping. I'm not saying this is impossible; you can make this work. It's just overly hard for what I wanted to do. So I gave up in the end, and you've seen in the demo that I just have the number of entities; that's the best I can do. So what about a data-oriented approach? That's it. I know how many rockets I have because I'm storing rockets separately.
So I just have a simple function that returns the size of the vector, and I can get that number for free. This is just an example, right, and you might think it's artificial, but over time, as I've moved towards data-oriented design (and I'm not saying I'm going to go full data-oriented every single time), I feel like I get these small wins more and more often. So it does make things simpler. So it is also about simplicity. Another example I want to give you: imagine you are a new member of the team and you have to work on this demo. You have to extend the code, you have to understand it. So you're going to go into the codebase and start doing a little bit of code review. You're going to try to understand how all the moving parts interact, how it works, and what you need to do to change it. So you see the entity here, and you're like, okay, this seems simple, and then you see, oh, but we have a reference to a world and also this extra state for the alive boolean. And then you start thinking: now every single entity might end up doing something that changes the state of other entities. And to know that, I have to cross-reference all the files in the codebase to see what's happening.
You have to jump around the source code to get the full picture. At the same time, this doesn't really sit right with me. This is really annoying, because we are putting the draw member function in the virtual inheritance API. We have very tight coupling with the rendering system. So if we want to change from SFML to SDL or another library, we're not going to do that, because we're going to have to change 20 classes, and you know, they're very tightly coupled together, which is quite unfortunate.
At the same time, you know, if you look at the world, you remember: yeah, this seems simple on the surface, but actually it's hiding everything. This update could end up destroying entities, could end up creating new entities. So how do I know what's happening? I have to look at every single entity. So I think in practice this makes it hard to understand the full state of the system. You need to keep in your head all the possible derived classes that are there and what they can do at any point in time. And yeah, if you remember, this ends up affecting the outside world. So I think you get my point.
For the data-oriented design, I find it nice that the entities themselves are just data. There is a clear separation between logic and data, and there's nothing hidden there. It's completely decoupled, and what you see is what you get. There's no special side effect in the constructor of any member or anything like that. It's just plain data, which is simple. At the same time, if I look at the world, I can immediately see what things are in the world. I know I'm going to have emitters, particles, and rockets, and nothing else. There's nothing that could be added in at runtime from some other derived class or something like that. You can easily see all the types. So the data is not hidden. And at the same time, in the single update function, I know exactly every mutation, every operation that's happening. I can see the stage-based approach where everything is changed on a, you know, pipeline basis. All the logic is there, and I can see the relationships between all the entities in the same place. So I know if a rocket is going to affect an emitter, if an emitter is going to spawn a particle; everything is going to start in this update function.
Now, for the coupling issue: you still have some coupling here with the rendering system. You still have that render target appearing in the world, but it's much looser, right? If I want to change the rendering system, I don't have to change N classes that end up dealing with drawing. I can just change it in one place, you know, do the right transformations, and it's going to be a bit easier to adapt this to some other system. So I feel like this is much easier for somebody, especially somebody new to the program, to get an idea of what's going on. And I feel like this also opens up new opportunities, so I'm going to get there.
I'm going to start by adding this new requirement. Let's say that we want to be able to serialize the data. Now, this could be useful for many things: saving and loading, networking, and so on. So, how do we do it in the object-oriented approach? You could do something like this: you could have your entity be extended with serialize and deserialize virtual member functions. The serialize takes an ostream, the deserialize an istream.
So, probably the second one shouldn't be const, but you know, slideware. This might work, but I feel like the coupling is too tight. I am locked into a specific format, because at this point I need to decide whether it's going to be XML, JSON, binary, or whatever. And also, I'm one of those people that cares about compilation times. I have a few talks about the subject here and there. And by doing this, I'm going to make every user of entity virally include iostreams, which is a pretty beefy header, and it's going to slow down compilation. I don't want that. I want these things to be separated. I want the serialization module to be the only place where I include whatever dependency I need. So one of the common answers to this kind of problem in OOP is design patterns. We can use the visitor pattern. We can have a nice visitor base class that deals with all the derived types. Entity can accept a visitor in this virtual member function, and then I can create a serialization visitor and a deserialization visitor, maybe one for JSON, one for XML, and so on. Nice and decoupled, until you realize: I'm listing all the types again. So I'm losing that benefit of OOP. I have to think about every possible derived type in the same place. We lose that individual responsibility principle, which is quite nice, to be honest. We revert back to putting all of these in the same place. We also have more overhead. There's actually double dispatch: we're going to have two virtual calls every time you call this function. Now, for serialization this might not be significant, but if you use this pattern for other things, it might actually be significant.
And also, you have to keep everything in sync with all the other visitors. And one thing that I don't like is that something that should conceptually be a function parameter (the stream should only be available as a function parameter whenever I want to serialize or deserialize) now becomes state. It becomes a member of the class. But it doesn't have to persist. I only need it for the duration of the serialization function call. So why do I have to store it there? And even worse, then you realize this, right? You have your rocket, and in your rocket you're using pointers. So what do you do? It's not impossible. Again, you can make this work, but you have to figure out a mapping from pointer to something that can be serialized. And then you would have to figure out a reverse mapping when you deserialize it. You can make this work, but does it have to be so complicated? I'm just trying to serialize some state, and I have to do all this stuff.
Okay. And the DoD approach is just a single function. You have a function called serialize. You take your world by const reference, just reading it, and then your stream, and then internally you do your serialization. Of course, you could split this into separate functions; you could have a serialize for rockets, particles, and emitters. But the point is, you can isolate this in its own translation unit. Any expensive dependency can be isolated in that TU. So you don't have to worry about compilation times, and it's quite simple. Everything is there.
You see it, and you don't have to jump around in your source code. So I think it's a nice win even in this case. At the beginning I mentioned opportunities. Why am I talking about opportunities? Because once you do this, you realize that the entire state of the program is just data. It's just bytes that are meaningful. There is no reliance on pointers or addresses. So saving and loading, which is something that people dread implementing, becomes trivial. You literally just serialize and deserialize in a row, and you're done. This leads to better testability, debuggability, and tooling. If you have an interesting edge case that you want to test, you just save the state and you write a unit test to repeat it. I know of gaming companies that do this. For example, the people behind Factorio. They store interesting edge cases with their game, and then they have these unit tests where they load a state, run it, and check that the result is exactly what you expect.
Debuggability: you have a weird bug that is hard to reproduce. You save the state and you replay it as many times as you want. You can check the state at will. Tooling: now everything is data. You can write a nice UI, so during runtime you can change relationships, you can change the values of your classes. It becomes kind of a little in-game editor for your own application. So you get a lot of benefits out of this. Networking becomes easier. You can send a snapshot of the entire state over the wire, and the other player or user will get the exact same view, and then anything on top of that can be a delta that's sent on top of the last snapshot. So having everything just be data helps a lot and creates new opportunities for things that are useful.
Okay, what about multithreading? I'm going to go a bit quicker; I don't have that much time left. In the OOP approach, let's say we just want to multithread the particles. We care about the particles; they're the bottleneck. We don't care about other things. How do we make it multithreaded?
Well, it's a bit difficult, right? This could do anything. It could create new particles; it could be an update on something that is not a particle. So it's not really easy. I would say it's not truly parallelizable, because we also have writes to linked entities from the update of other things. So you cannot just trivially say: I'm going to take the loop and split it into chunks. That's going to be data race hell. Yeah, you could make this work. Maybe you could filter all the particles in advance, store them in a separate vector, and then do that in chunks. But the overhead of doing that will likely defeat the purpose of multithreading in the first place.
With DoD, you see your update as a bulk operation in your update function in the world. And now this is probably the most purely parallelizable loop you've ever seen. You can do this in horizontal chunks, vertical chunks. You could even get away with just OpenMP: you stick a #pragma omp parallel for there, and it's going to do it for you. So I think this is like the easiest thing to parallelize. Again, this is a simple example, but I want to show that something that is conceptually simple becomes complicated if you apply principles that are meant to help you, but actually end up hindering you.
So, my point is that data-oriented design architectures provide many side benefits. It's not just performance; you're going to get better simplicity and better flexibility out of it. But can we go even further? Can we make this even more optimized, even faster? We're going to do a small optimization pass here just to get rid of branching. What we're going to do is try to avoid branching in hot loops. And the way we do this is by grouping the data beforehand.
We're also going to take the opportunity at this point to try and reduce the size of common types. One of the things that you learn very quickly is that small is fast. The more things can fit in cache, the better performance you will have. So if you have types that are unnecessarily big, trying to reduce the size of the types, or the padding that you have inside your types, will end up giving better performance in the long run. But we're going to still stop here; no other major changes. We're going to stick to this layout, which is called array of structures. Later we're going to migrate to SoA, structure of arrays, and we're going to see how that actually affects the performance.
So this is what we had. We had this particle type. What we're going to do now is get rid of that. We're not going to have that enum anymore. We're not going to have those extra fields. And I'm going to show you that the world is going to be responsible for grouping the particles depending on their type. We make that branching implicit, if you will. At the same time, I realized we're never going to have more than a few thousand emitters. So why are we using a std::size_t? We don't need that. So I can change this to a std::uint16_t. It is more limited; you have to be a bit more careful with these values now. But because I know the nature of my program, I'm going to save some space in the rocket. Maybe by doing so, I'm going to be able to load more rockets into the same cache line as I'm iterating over them. Now, this is not going to be very impactful, because rockets are not a bottleneck here. I just want to show you the idea, the principle. If you have something which is in a hot loop, and you can figure out, let's say, constraints about the data of the fields of the objects, then changing the data types, reducing padding, reducing size can actually improve performance significantly in some use cases.
Okay, what else?
So if we look at the world here, we have
to do a change, right? we cannot just
store particles anymore because you
don't have a way of knowing which kind
of particles they are. So we're going to
have more containers and is another
common transformation in DoD. You end up
having a lot of containers that
implicitly provide information about
some properties of those objects. So the
first container tells you everything in
here is a smoke particle. The second one
tells you everything in here is a fire
particle. And we do the same for the
emitters. And we tweak the API a little
bit. And this is how we actually
represent the various types. Of course,
if you have a lot of types, then you can
think about meta programming, generating
these things at compile time or using
even a runtime generation mechanism if
you're loading these things on demand.
But for now, because we only have two types, it's fine to hardcode them.
In the update member function, we're going to use a little lambda here as a local function to avoid repetition.
Again, C++ abstractions, modern
features, and data design can coexist in
my opinion. They can be synergetic. And
I'm going to avoid repetition here and
just have two loops, one for smoke
particles and one for fire particles.
For the emitters, it's the same thing. And I'm going to use a little higher-order function here. So this lambda is going to take another lambda called spawn, which tells the emitter what kind of particles should be spawned. And then
when I actually call this, I'm going to specify as a callback that the smoke emitters should spawn smoke particles and the fire emitters should spawn fire particles. Again, I feel like it's a little bit of abstraction, a little bit of modern feature usage, but it's enough to avoid repetition and still keep the code very linear and simple without hiding what's happening behind the scenes. So, I think this is a reasonable amount of abstraction.
That is it. Once we do this, we can do a
little bit more benchmarking and we're
going to see that it's not that big. There is a consistent improvement of around 5.73% in terms of decreased update time, which is small, but we'll take it. And as I mentioned, this particular optimization here wasn't very impactful
because most of the bottleneck in our program is updating the particles themselves, while the branching was not in the hot loop. But depending on your workloads, doing something like this might give you a much better performance increase. Not pictured here: after doing this change I also got a 12% rendering performance improvement. The reason is that now the data is again laid out more nicely for the GPU. I know that all the fire particles are going to be in one container and they all use the same texture, so I can just send it to the GPU and the GPU is going to be happy, and you can do the same with the smoke particles. So DOD really aligns very well with what the GPU wants, and it's good to keep in mind if you're going to do graphics development or use GPUs to accelerate your programs.
Okay, the last thing that we're going to do is change the data layout. You probably always hear about this idea of SoA, structure of arrays. Is it actually going to be beneficial? Is it actually going to make a big difference? So why don't we just try it and see how it performs?
The idea behind SoA is quite simple. Instead of having a single struct that represents an individual particle, you have a struct that represents a collection of particles, a struct of arrays. So each individual field of your particle is no longer an individual field, but a collection of fields. You lose the explicit object, the explicit entity. Now a vertical slice over all these vectors, just like a table, becomes your entity. So we're going to have this particle SoA, also a struct, that contains positions, velocities, accelerations, and so on. And the implicit invariant here is that all vectors are going to have the same size, and they're going to be growing and shrinking in sync. So all of this stuff should be in sync.
Visually speaking, it looks like this. In the AoS layout, every index is a complete object with all its fields. The SoA layout, as I mentioned, is like a table: every index contains a vertical slice which conceptually represents a particle, but you don't really have that in your code anymore. So if you want to do a little bit of a comparison: the AoS layout has explicit entities, every object is a real entity in the code, while the SoA has implicit entities, a conceptual slice across the arrays.
AoS might suffer from internal padding. If you have alignment requirements for your fields, the compiler might end up adding some extra bytes in between them that still take up space in the cache. So you might be wasting cache space because of padding. With SoA, because every container contains only a single type, unless you're using some weird type with special alignment requirements, every single byte is going to be useful. There's not going to be any waste of cache space.
This is very important. If you only care about a single field, for example, we only care about the position of particles because we just want to find all the particles that are out of bounds, right? With AoS, it doesn't matter: you still have to load things you don't need, because position is close in memory to velocity, acceleration, and other things. So even if you only care about the position of the fourth particle, you might get some data from particle three loaded into cache, and maybe the velocity and acceleration loaded into cache as well. So if you need a subset of fields, you're going to waste a lot of space accessing other things. With SoA, we have what I like to call surgical field loading: you only get what you need. If you only care about positions, you're not going to pollute your cache by putting in things that are not positions, which is quite nice.
AoS is naturally resistant to vectorization. The data is scattered. SIMD instructions like it when the data is homogeneously close together. It is not impossible, we're going to see that it actually happens, but it makes it harder and a bit more expensive. SoA is ideal for vectorization. It's exactly the way that SIMD instructions like the data to be laid out. It makes it trivial for the compiler to perform auto-vectorization, assuming the operations you're doing on the data are quite simple and straightforward. And it's not on the slide, but again, I would also say SoA is very nice for GPUs. GPUs like that layout a lot compared to AoS.
So, yeah, let's implement this. Let's see how it actually behaves. One thing I'm going to do before the implementation is add a little helper function here, because I hate repeating things nine times. So I'm going to have this function called for all vectors. It takes some callback f and applies that function f to all the vectors at once. You can imagine that if you want to resize or reserve all the vectors at once, instead of copy-pasting the same dot-resize nine times, we can just call this function once and provide the right operation. Just a little helper that avoids repeating the same code nine times. That's going to be our particle SoA.
Then we take our world. We change our vectors of particles to be particle SoAs. We could have done this for rockets as well, or emitters. Not very significant. The bottleneck is particles, so I'm not going to bother.
And in the update member function, again, I'm going to use a little lambda here that updates a specific particle SoA. I'm going to get the number of particles. Here I'm just using positions, but you could use any vector; again, the implicit invariant is that they're all the same size. And if you've seen Barry's excellent talk on Monday, he showed you how to use reflection to create SoA automatically and store just a single size and capacity to avoid that waste. I just wanted to keep it a bit simpler, but I recommend seeing that talk if you are interested in that. And I'm just going to loop over all the particles via index and perform these operations. Now you might be thinking: does it make sense to have a single loop? Is it better to have five loops, where each one individually loads just the thing it needs? I've tried it. In all scenarios, having a fused loop was faster. So again, it depends on your cache size and platform-specific constraints and requirements. In my situation, having a single fused loop was always faster.
The last thing that I want to show you is the emitter update, because this is not nice. You're going to end up doing this. Whenever you want to spawn, for example, a smoke particle, this is what it looks like: you're going to have nine calls to push back, where each one pushes a single field. And honestly, I cannot defend this. It's horrible. But that's what we get.
Yeah. So, it's pretty much the same. I'm not going to bother showing the other things. The cleanup is the other interesting part, because you're going to find some friction with standard library algorithms. They're not designed with SoA in mind. So what I did here is I defined this has-negative-opacity helper predicate, which doesn't just take the index but also the SoA it belongs to, and it tells me whether the nth particle in that SoA has faded away. And then I wrote my own erase_if for SoA, and I'm going to pass in my particle SoA and my predicate to remove those particles.
Now, what does that look like? Just to give you the idea: it's kind of like a partition algorithm. I'm going to start from the left and the right. I'm going to skip over all the particles that don't need to be removed. And then I'm going to move everything that needs to be removed towards the right-hand side, using the for-all-vectors helper to perform the swaps simultaneously on all nine vectors at once. And then at the very end, I'm going to resize all the vectors at once. And you can probably see the value of that small helper function here. Otherwise, you would have to write nine calls to resize and so on, which is not great and makes it very easy to make a mistake.
So how much does it matter? How big is the improvement here? It's quite big, actually. We have around a 32% decrease in update time. And at the same time, I promised I would tell you about rendering performance: two times faster. And the reason why it's two times faster is that I can simply map the fields to GPU buffers and send everything contiguously at once, which is very efficient for the GPU. But if you think about it, why are we getting this nice big boost in performance? This is probably the worst-case scenario for SoA. I mentioned before that SoA is very nice when you want to get a subset of fields, like if you only care about position or only care about scale. But here, for every single particle, we're using all the fields. So it should be comparable to AoS; we're still loading everything into cache. Why are we getting this huge boost even though we're not leveraging some of the nice properties of SoA?
Of course, to figure this out I went to the assembly and took a look at AoS versus SoA. You can see that in both cases you get SIMD instructions. We get a fused multiply-add for the integration steps, which is probably the best thing you can do here with floating point to perform that integration of velocity and so on. But for AoS you have a few extra operations here. You have something called gather and scatter. What gather does is take data that is laid out in a strided way and load it into a single SIMD register so that everything is contiguous, and scatter does the opposite: it takes that data and scatters it back to the place it came from. So there is some overhead when applying SIMD due to the fact that the data is not laid out contiguously. It first needs to be shuffled around a bit for SIMD to be applied on top of it, and then it needs to be put back where it came from.
So this is probably the biggest overhead that we get here: it's because of vectorization. There might be other factors affecting the performance difference. Because the particle data in AoS is all contiguous per particle, you might have some cases where the same particle lies on the boundary between two cache lines, so to read one particle, you need two cache lines. The prefetcher might have a bit of a harder time with strided access; I don't know, just a theory. And this is something that might actually be very impactful: when we're doing the gather and scatter, we also need to use some registers to store those values after they've been gathered. Then you might have a bit of extra register pressure; you might have fewer registers you can use for meaningful operations as you're doing the SIMD stuff in here. So this is how I explain the performance difference. If you have any other insightful ideas, you can let me know afterwards.
What about simplicity? Did we sacrifice simplicity? My answer is: with this current design, absolutely yes. I don't want to write nine push backs. It's not simple. It's ugly. The problem is that C++ has no real native support for the SoA layout. The language and the standard library are all built around AoS. I would have liked to use std::erase_if, but I could not have done that. Maybe you can write some sort of proxy iterator that, when you dereference it and so on, scatters through all the vectors. I don't want to do that. It doesn't sound simple. So it would have been painful. We've seen how some functional programming patterns help. Just simple things like a local lambda or passing a function to another function can help reduce repetition significantly with very small effort. So that's good. But can we do any better? And I think, you know, it's time to reflect on that.
So we have this, right? I like this: simple, understandable. I don't like this: this is a manual transformation that I don't really want to do. I would like to work with a particle as it is on the left-hand side and get all the benefits of SoA. So it would be cool if I had this magical SoA type that is a template. I can give it a particle, and internally it builds up that structure that we mentioned before, all the vectors for every single field. It would be really cool. And then if I had this, I could call push back and provide all my fields as a particle, and internally the SoA wrapper would scatter them to their targets. That would be so much nicer to write compared to nine push back calls.
Maybe I could have a with-all interface that I could use to loop over the data. This would expose all the fields in order, as if they were an actual object, but they are still going to be loaded from separate vectors. So it gives me that nice AoS-like way of thinking, but the layout is still SoA. But more importantly, maybe I could have something like this. I could say: look, I only care about the scale and the scale change, the rate at which the particle changes its size. So please give me those two fields only, load only that memory, and then execute this lambda over here. So maybe I could have this with function: I specify the fields that I want with some pointers to members, and then I provide the lambda expression to actually work on those fields. That would be cool, right?
This is valid C++17. You don't need to wait until C++26. You don't need reflection to do this. You can do this in C++17. And I've done it using Boost.PFR, which is a very nice, underutilized library that allows you to do reflection on aggregates. There are limitations; it's not full-fledged reflection. I'm still extremely excited and happy that we're getting reflection in 26. But I think that people underestimate how much you can do even before that. C++17 gives you tools that you can abuse to get reflection for aggregates, and this kind of transformation is something you can actually implement yourself, which is quite nice. And I will link you the code at the very end. At the same time, can we do better? If we had C++26 reflection, we could do this. You don't have to specify those things twice. You don't have to say that you want the address of the scale field and the address of the scale change. The reflection system could just look at the names and types of the parameters you specify in your lambda, and from those names it could figure out which fields to load from the SoA. I don't know how I feel about this. The initial impact of having names of things be so significant is a bit scary, but it is the future that we're going to have. Reflection is going to enable APIs and interfaces that heavily rely on names and things like that. So
you'd better get used to it. Now, I'm not going to implement this, because Barry again had a fantastic talk on Monday, really good in its own right, that I strongly recommend you go and see, and it deals with this problem specifically. It shows you how to use reflection in C++26 to transform an AoS data layout into SoA automatically. I've shown you that you can do that even in 17, but with 26 reflection I think it's way more straightforward, it allows you to do more things, and I'm excited to see what people come up with in the future. So go check out that talk and you're going to get all the information you want about reflection.
Cool. Almost done. Just a few more things. SoA is not always the answer. I think there is a misconception: people think data-oriented design means SoA. Not really. Data-oriented design means something more important than that. It means that you design your architecture with a major focus on the layout of your data. The access pattern that you're going to use to operate on your data chooses the layout, not the other way around. So you have to ask yourself questions. Which fields are frequently accessed together? Maybe those things should be together in memory. Are there any cold fields that I might use only once in a while? Maybe you can store them separately from the rest of the fields of the same object, because they're just going to waste cache space.
What is the SIMD lane width on this platform? Your architectural choices might depend on what you're targeting. Targeting a PlayStation and targeting a laptop might be different, right? But this is where I would probably disagree with the rest of the gamedev community. I like abstractions.
I like being able to say SoA of particle. And if I have that, and if I have maybe AoSoA, which is a hybrid layout that does the same thing, I can experiment. I can see which one is better on this platform. And just by switching my SoA to AoS or to some other template that does something similar, just by doing that compile-time switch, I can keep the rest of the code exactly the same and see how the memory layout affects the performance. I like being able to do that. So switching between layouts at compile time is something you can do, and 26 will make it easier. With reflection, you can implement any sort of transformation you want, which allows you to experiment, try out new layouts, and see how they change your performance.
And also, DOD is not always the answer. I don't hate OOP. I think that OOP is problematic when it's used at the wrong granularity. If you use OOP at a higher, coarser level of granularity, I think it's a great tool. If you have an OOP particle manager with a nice API that does encapsulation, gives you private and public access, has nice invariants and so on, and internally you use DOD for speed, by all means do that. And if you do so, you can also swap strategies at runtime. You can have a particle manager virtual interface that you then implement for an AoS implementation, an SoA implementation, and so on. And because the virtual calls are going to be just a few compared to a few million, it doesn't matter. It's nice to have that possibility and that sort of flexibility thanks to the OOP principles. So OOP is not bad; it's just sometimes used at the wrong level of granularity.
Also, if you're working in teams with other people, having clear boundaries and the SRP, the single responsibility principle, can boost collaboration. If you have large teams with novice developers, having invariants and access control can help people understand how to modify the code. Also, I've worked in places where we could agree on an interface beforehand, then implement that interface in parallel for two different things, and merge them later on. Having that sort of layered structure is really useful for collaboration. You can use polymorphism for plug-in systems. But what I recommend is a hybrid approach: OOP is the shell and DOD is the engine. If you manage to find a nice, reasonable way of combining those at the right levels of granularity, you're going to get the best of both worlds. Almost done. Some final takeaways.
Design for performance from the start. If it is a requirement, it must be an architectural priority. It's not premature optimization. If speed is a requirement, like you need to target a certain frame rate or something like that, you must take account of it in your design. After you do that, then you can profile and see what the bottlenecks are in that particular design. But don't write it off as premature optimization; it's something important for the success of your program.
Flatten and shrink your data. Prefer vector of T to vector of unique_ptr of T. You're going to avoid all that extra indirection and have your data more contiguous. Minimize the size of the structures used in hot loops, and steer away from polymorphic hierarchies for things at a small granularity. For high-granularity stuff, it might be a good idea to use them; it depends on your use case. Group data by usage. Instead of having a single container and a lot of flags and booleans that you branch upon, use more containers. Make those flags and booleans implicit by pre-arranging your data. It's not just good for performance; it also helps you see everything that you have already laid out. It helps with simplicity and understanding of your code. And if you have bulk processing which might benefit from SoA, consider it. But don't think that DoD always means SoA. As we mentioned before, it's a mindset. It's a way of thinking.
Data drives design. Target the machine. It's a spectrum, not dogma. Embrace it: modern C++ helps, and pragmatism always wins. So I just want to leave you with this. Think data first. Your code will be faster and simpler. And thank you so much. I will take questions.
>> Okay. I have a question.
>> Yes.
>> So you asked the first question in my talk, and I appreciated that. I think this talk was perfect. You've actually done by hand what allocators can do, in the sense that you've laid everything out using vectors. Vectors don't have the same problem as other kinds of containers because, as you delete and put things back, since there's no internal structure to them, you don't get the kind of diffusion you would have had you designed in a composite world. If you were not designing at this level, but at an intermediate level, could you see that allocators could recover much of the loss? Could you talk about that?
>> Yes. So, it's a good question. I considered talking about allocators for this talk. I wanted to keep it a bit simpler and a bit more philosophical, in a sense. But I do think that if you end up in a situation where the design has been OOP and it's really hard to change because of, you know, change friction and stuff like that, allocators are a pragmatic way of recovering performance. They can definitely help you. I would be surprised if I didn't get a huge speedup by applying an allocator to those entities that I had before. So it would be something interesting to try out, something interesting to benchmark. But I do suspect allocators are a good, pragmatic, intermediate solution to this problem.
Sorry.
>> Um, I think an object-oriented programmer would say you've lost some of the relationship between some of those objects. Like, there was a very clear relationship between, say, the rocket and the emitter that you sort of lose in that data layout. So I see there are benefits, but what would you say to that?
>> So I would say, looking specifically at the data itself, I can see the point. One comment I have to make is that we just use a plain integer for the index, but you could use handles, you could use strong typedefs that have more semantic meaning in terms of the relation they express. On the other hand, I feel like that relationship becomes much more visible when you are operating on the data. If you look at the update loop, you actually see what is being done with that emitter, with that rocket. You don't only see that there is a relationship; you actually see what the relationship is for. So I think you gain more than you lose in that particular aspect.
>> Oh, sorry. Yeah, I have to get used to
this.
>> Hi, thank you for the talk. I really like this kind of design. I was wondering, how would you compare the testability between an OOP design and a DOD design? Is it easier?
>> What kind of...?
>> The testability?
>> Testability. Yeah.
>> So I think from a DoD point of view, you get some nice benefits. For example, as I mentioned before, because your problem is just data, right, you can literally store an interesting test case as data, load it, and try different things on it. I mentioned there are gaming companies that do this with complicated scenarios they want to test: they just store a save file and load it, and so on. With OOP that becomes a bit harder, because you don't have that easy serialization aspect that I mentioned before. OOP might be nicer in some aspects. For example, mocking is one of the things I was thinking of, or dependency injection. So I think as long as you apply OOP at the right level, like the higher-level components, you should use OOP, right? I'm not saying never use OOP; I think it's just that using it at the wrong level causes problems. So if you are doing some business logic where you have a database abstraction, that could definitely be an OOP abstraction, and then you could mock that in your own system. But then the processing you might be doing on the data you get from the database, if you represent every row as a heap allocation, that's where you lose the performance benefit. So I think they both have their own strengths and cons, and if you apply them at the right level, you can use them in testing as well. Let's put it this way. Makes sense?
>> Thanks, really evocative and tight example. Are there any tools or techniques to find where I have that memory pressure and where I should be thinking about applying these types of transformations?
>> Yeah, so I had quite a good experience with the VTune profiler from Intel. It was able to tell me what the bottlenecks were, how often my CPU was idle, and whether I was using the cache properly. I'm sure other profilers can do similar stuff. In the past, I've used, I think it was part of the Valgrind suite, there is a Callgrind tool as well that does something similar. So definitely profilers can identify how much you're using. Even perf tells you what percentage is being spent on actual work versus cache misses and stuff like that. So that might be an initial step, and then something like VTune, which is more precise, might be something I would recommend. Yeah.
>> Thank you for the nice talk. For the structure-of-arrays approach, what are your experiences with views? For example, if you have a particle view that just has references to the data members, does this impact performance, or can the optimizer optimize this away?
>> So, my experience is limited. I've tried some stuff out with ranges and things like that. One of the things I've noticed is that, for example, when using ranges, the zip view and things like that, without optimizations enabled you get very significant overhead, and also quite significant compile-time overhead, which I wasn't a fan of. So I think that if you are able to always compile with optimizations, and you can ensure that the compiler is inlining those things, it might be as efficient as a handwritten version. I find that a more functional approach, where you just provide the callback, leads to less pressure on the optimizer, and the code stays efficient even if you are not able to optimize, for example for ease of debugging and stuff like that. So I would like to see a world where I can recommend, hey, use this stuff all the time, but there are practical concerns with debug performance and compile times that make me a bit wary of doing that. Maybe with reflection, if you unrolled a very flat view that doesn't have many layers of abstraction and templates, then it might be significantly less problematic even when you don't have optimizations enabled. So again, I would think that C++26 will enable this to be much more effective with reflection. Makes sense?
>> Thanks.
>> Okay, thank you for the talk, it was really illuminating. The question I have is: I kind of got the feeling that having batch processing is really important to reap all the benefits that a DoD architecture would give you.
What are your thoughts on non-batch accesses, like if it's random access, or you're processing only a few entities that are spread out and not aligned with the caches?
>> Yeah, so in general, again, it depends on your requirements, right? As I mentioned before, there have been very successful games and applications that don't care about any of this. They just allocate all over the place; it works; they don't have to reach high frame rates and stuff like that, so they can get away with it. If you are in a situation where your data is naturally graph-shaped or something like that, it becomes a bit trickier to do this, right? Because the pattern in which you're accessing the data is not as intuitive or simple as we've seen in the demo. But there are layouts. For example, if you know that you're going to do a level-order traversal over a tree, or a depth-first traversal, you can still lay out that data in a flat array in such a way that, as you're performing the traversal, the nodes you're going to access after the node you're currently at come right after it in memory. So again, my recommendation would be: if performance is a requirement and you want to make sure that you use your architecture efficiently, think about the access pattern, and lay out the data in a way that follows that access pattern, or at least maximizes the chance that it's going to be in cache as you're loading it. Does that make sense?
>> Hi, nice talk. I just had a question: when you flattened all the data, you actually duplicated the data, versus just deriving from the struct and using the leaf struct. Is there a downside to doing that? Because then you have no duplication, and it seems like the memory layout would be identical.
>> Yes. So I I consider doing this. Um
there is no drawback in performance. You
are correct in saying that the memory is
going to be identical. So it's going to
be performing the same way. It's a bit
of a choice in how explicit you want to
be. Like if I look at this strct, I want
to be able to see, hey, this thing has
exactly this fields. If you derive from
that, you also have to look one layer
layer up when you're reading the code.
It's not a big deal, right? But it is
one extra step you have to take. The
other thing I'm concerned about is once you get that inheritance, even if it's not virtual, if it's not polymorphic, somebody is eventually going to start relying on that. Somebody's going to see, okay, I can use an entity reference here to refer to something. And then if you want to change, perhaps, the way that you're storing physical information in the emitter, you realize, hey, I don't actually need the velocity in the emitter, it's always going to be synced to the rocket. Then you won't be able to do that. So it creates a bit of extra coupling in that situation. So
>> the upside is if you add one, you just
add it to one place, right?
>> Yes. Yes. So I think there are pros and cons to both approaches, definitely something you have to consider. I don't think it's a bad idea to do so. If the repetition is significant, I would very likely do this, but for just three fields, you know, it's small, right?
>> Also, it still would work with the struct of arrays, right? Because you end up
>> Yeah, you get the same layout anyway.
Yeah. So non-polymorphic inheritance is a good tool to reduce repetition, for sure. I would apply the same logic that I would for functions: a small piece of code used just a few times, maybe not worth it. A big piece of code used a lot of times, definitely extract it. That's what I think.
>> Thank you.
>> You're welcome.
>> Hello. Thanks for the talk. Um I'm
imagining your example as more of an
engine uh than just a standalone thing.
Uh, say a user wants to add an update where the rockets move around, change color, rotate, whatever. How do you bake that in so that you are still only passing over the data once per frame, per update?
>> So, are you referring particularly to
changing the data layout to support more
requirements or is it about the
rendering? I'm not sure I get the
question.
Um, more requirements, I guess. Um
>> Yeah, more requirements. Uh, yeah.
>> Yeah. So I think, in the end, if the requirements change, you're going to have to change the code one way or another, right? With OOP, if you had a nice abstraction that managed to predict correctly what you needed to do, you might need to do less work to apply those things. But it depends on your initial abstraction. In this case, I don't see a major problem. Right? If you want those requirements to be hardcoded in the binary, like you say, okay, version two now is going to support differently colored rockets, then you might start adding a color field to the rocket and see if that is good enough, or you could consider splitting rockets by groups like we've seen before, having a container per color or something like that. If you want something more dynamic, like saying users can dynamically, at runtime, change properties of things, then I think that's where we'll get into scripting languages. So I would probably provide the basis of the entities in the code base, I would add support for all the major sort of fields, like color, size, and so on, and then I would have a scripting engine that runs after the update that would end up allowing the user to do stuff like that. So I think that's probably what I would go for if you need full flexibility. Make sense? Thank you.
>> First off, great presentation. Love the
way you presented all that material.
Very easy to take in, very informative.
Uh, so one of the things I take a lot of
flack for in my merge requests is using
callbacks quite a bit.
>> Um, and the way that I use them though
is because I prefer the machine to know
what task it has to do as opposed to
going through like a function pointer
and then having to determine if this is
here, if this is here, do this. I prefer
to skip all that and say either do your
task or do these two things first. The classic example being: let's say you enable logging on your service. Then the function pointer approach would be: execute this function, check if the logging queue exists, because it's a pointer to it. If it doesn't exist, then move on. If it does exist, log first, then do your task. Whereas I prefer to say, hey, on program startup I want you to take in a lambda that just captures all these things and then sets the callback to either log or not log, off a jump. So that way, whenever you hit your callback section, you just go, okay, do my task, and not check if-this, if-this, then do task.
>> Is there really a problem with doing it in that way when it comes to the speed of your code, and like avoiding lambda captures and whatnot? Because I know lambda captures tend to get a bit of a bad rep if they're done poorly, especially with lifetime issues, but for performance, is that really much of a concern these days compared to everything
else. So it's an interesting question. I think I would have to get a bit more context on the application's performance requirements and stuff like that. So when you use this lambda, is it stored as a std::function, like do you have the polymorphism? Yeah. So std::function has some overhead associated with invocation, right? Because you have the virtual dispatch and all that stuff. So if you're calling a std::function in a very hot loop, it might be a problem, right? But if you're calling it in a few places, it might not be a problem. It depends on, you know, whether you're in a hot place or not in your codebase. I don't know about your use case, I would have to see that, but it doesn't sound unreasonable. If you are in a situation where you can afford the overhead of std::function and it solves the problem cleanly, I think it's fine.
You've seen in my demo I did use a lot of callbacks, also with auto to avoid that overhead. I tend to prefer callbacks that are, for lack of a better word, local, in the sense that you define a callback but the place where you invoke it is close, like you can see where it is. I've worked in code bases where callbacks can call each other from very different places, and it becomes pretty hard to figure out where that comes from. So I think, again, there's a little bit of art involved. There's not a hard rule. Depending on the requirements in terms of performance, std::function might make sense or not; depending on your architecture and how far away the callback is in terms of cognitive overhead, it might be a good idea to use it or it might be a good idea to use another technique. So if you want to discuss this further, I'd be happy to later on; maybe you can show me some more context and maybe I can give you better advice.
>> Yeah, good answer. Thank you.
>> You're welcome.
>> Cool, I think we are done if there are no more questions. Thank you so much. Cheers.