They cut Node.js Memory in half 👀
By Theo - t3.gg
Summary
Topics Covered
- Pointer Compression Halves Node Memory
- NodeCage Enables One-Line Swap
- Smaller Heaps Slash GC Pauses
- Double Density Via Memory Halving
Full Transcript
Cut your Node.js memory usage in half with this one simple trick. Okay, okay, cut the clickbait. This is actually a really cool topic. As I'm sure most of you all know, JavaScript is not exactly memory efficient. That's why we see hilarious things like Cursor using 3.38 terabytes of RAM. JavaScript's a great language that does a lot of really cool things, but sadly memory management has never been one of the ones it is strongest with. Which is why I was really excited to see Matteo, who is one of the main contributors to Node, come out with a really cool post about how we cut Node memory usage in half with a one-line Docker image swap. To be clear, it's not one line of code changing here. A Docker image swap is a pretty big deal depending on where you're doing it, but the things that led to this performance win are genuinely really fascinating.
Feel like most engineers don't appreciate just how much hard engineering work goes into JavaScript, because they're quick to dismiss it as JavaScript. But V8 is one of the most complex and impressive projects ever built, and Node is not too far from it. The effort that's going into the C++ that is powering all of this stuff is hard to put into words, and some of it's really cool. I'm constantly surprised by how many hidden flags and features there are inside of V8 and Node that can fundamentally change the performance characteristics. This is one I did not know about that I think is really, really cool, and I can't wait to tell you all about it after a quick word from today's sponsor. If you don't have any users, you can skip this section, but for those who do, it's important to understand what they're actually doing with your service. And that's why today's sponsor is so important to me.
PostHog shows you what your users are doing. I know this looks like an operating system, but they're actually a suite of product tools, things that you need to build real products. The main thing that you're going to use them for is analytics, and they are the best analytics provider. Open source, too, but it doesn't really matter cuz their hosting is absurdly cheap. Their free tier gives you a million events, 5,000 session recordings, a million requests to feature flags, 1,500 survey results, 100,000 errors and exceptions, and so much more. The data warehouse is one of the coolest parts, though, because you can integrate with other services. Integrations like Supabase and Stripe are super useful. Believe me, I spent way too much time in the Stripe dashboard. PostHog's is significantly better, and it's like two clicks to set up. I was signing in to show you guys quick, and I just... their vibe is hilarious. Just the fact that they have a special Valentine's Day little thing is just... that's how they are. Are you kidding? PostHog, I want to just show off the data, and... the vibes are immaculate. If you want a data company that doesn't take themselves too seriously, but is a serious contender for managing all of your data, look no further than soyb.link/postthog.
Now, let's dive into how Matteo was able to cut Node's memory usage in half, as well as any potential negative side effects of this. V8, the C++ engine under the proverbial hood of JavaScript,
includes a feature that many Node developers aren't familiar with. This
feature, pointer compression, is a method for using smaller memory references in the JavaScript heap, reducing each pointer from 64 bits to 32 bits. The net is that you wind up using about 50% less memory for the same app without having to change any code.
Pretty great, right? Well, almost.
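If you're curious what your own process's baseline looks like before trying a change like this, the numbers are easy to pull from Node itself. This is just the built-in process.memoryUsage() API, nothing specific to pointer compression:

```javascript
// Snapshot the current process's memory stats. heapUsed/heapTotal are the
// V8-managed heap (where pointer compression would pay off); rss is the
// whole process footprint as the OS sees it.
const { rss, heapTotal, heapUsed } = process.memoryUsage();
const mb = (bytes) => (bytes / 1024 / 1024).toFixed(1) + ' MB';

console.log(`rss: ${mb(rss)}  heapTotal: ${mb(heapTotal)}  heapUsed: ${mb(heapUsed)}`);
```

Logging this periodically is a cheap way to verify a before/after memory claim on your own workload rather than trusting someone else's benchmark.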
Node.js does not enable pointer compression by default for two historical reasons. I find this to be really interesting. There's been a couple times where this happened. One of the most famous ones was when I got in the fight with that person who claimed I was lying about performance differences between Vercel and Cloudflare, because when he ran a trigonometry function in a loop, it performed better on Cloudflare than Vercel. The reason for that was actually [ __ ] hilarious. Cloudflare doesn't run Node. Cloudflare runs their own V8-based engine. Vercel runs Node. It turns out there was a flag to do faster trig math that had been enabled by Cloudflare for their runtime of V8 that was not enabled by default in Node runtimes. Cloudflare were the ones who went and found this, fixed it, and got it upstreamed into Node itself, thereby killing the gap there. But that was just like one trigonometry function that didn't have the flag enabled to make it faster. Thanks, Pron, for that. But yeah, I still can't believe that my [ __ ] posting and silly benchmarks ended up making Cloudflare 3 to 10 times faster for various workloads. So cool. And a lot of that just comes down to which flags are enabled and what values are set for them. People don't appreciate how much config there is in Node because none of us ever hit it. Like in TypeScript, you go edit the tsconfig, but in Node, you don't really config anything. You just tell it to run a JavaScript file. But there are so many flags you can enable in the building and bundling of Node that nobody ever touches, and seeing into them with stuff like this is really fun. So let's see the historical reasons why this particular flag has not been enabled. So
going through these reasons, first is the 4 gig cage limitation, which means that enabling pointer compression required the entire Node process to share a single 4 gig memory space between the main thread and all the worker threads. This is a significant issue. Cloudflare and Igalia partnered to solve this so that the cage could be per isolate (an individual instance of the V8 engine). To break this down a little bit for those who aren't as familiar with the worker model in JavaScript and V8: it is possible to spin up relatively safe isolates in V8 that have their own pool of memory. So, say I have some work going on in the main thread, like I'm trying to make sure that when people hit buttons on the keyboard, something appears as quickly as possible, but I want to do something else that is computationally complex, like generate some UI or run some complex math. If you do that on the main thread, other inputs and other things the user is doing are going to be blocked by that background work. Workers are a way to effectively spin up another instance of the JS engine. It's not really a new instance; it's a sub-isolate within it that has its own memory, its own runner, and can be run in parallel to what you're doing in the main thread, so that things happening in the worker don't block the main thread.
But in order for that to work, it needs to have its own memory that it's operating against. You cannot share memory between a worker and the main thread. You have to pass events between the two. As a result of this complexity, a lot of people have just never built with workers. I don't really know many JavaScript developers that have used workers in the browser. I'm not talking about Cloudflare Workers, I'm talking about browser workers for other threads, to prevent blocking the main thread as you're doing things. I just don't know many who have used it that way. Cloudflare uses them very heavily. The reason the platform is called Cloudflare Workers is because you don't get your own instance of V8 when you use Cloudflare. The V8 instance has workers spun up for every request. So the same way I could, in the browser, spin up a worker to do something else while the user is still using the site, Cloudflare has the one main V8 instance with tons of workers that are spun up to do these other things, so that they can run on the same instance of V8, thereby massively reducing Cloudflare's costs for hosting and also allowing them to be as cheap as they are, relatively speaking. And yes, Nean, web workers are the right term for the version for the web, but what the [ __ ] would we call the ones in Node if that was the case? The point I'm trying to make here is that Cloudflare has invested a lot in doing things to prevent weird memory characteristics in workers because they want this to work as well as possible for the stuff they are hosting. And the problem here was that the entire Node process was sharing a single 4 gig memory space, and whenever a new worker came up, it got to reserve some portion of that, because it couldn't reuse the other things. So what's the other problem that prevented this? Some
worried that compressing and decompressing pointers on each heap access would introduce performance overhead. And I understand this concern. I actually would have had the same concern: if you have to do something extra, like one more cycle, one more step, before you could even access the values in memory, that sounds really bad. And that's why Cloudflare, Igalia, and the Node.js project all collaborated to determine exactly what kind of overhead existed and assess whether it would impact real-world apps. To test this, they created Node Caged, a Node 25 Docker image with pointer compression turned on. Because you can't just pass a flag to Node. You can't just be like node --compression or something. It has to be built with this on. So they made a custom build of Node 25, put it in Docker so you could easily swap it without having to reinstall Node on your existing instance, and then ran production-level benchmarks with this on AWS EKS. In short, we achieved 50% memory savings with only a 2 to 4% increase in average latency across real-world workloads, and reduced P99 latency by 7%. For most teams, this trade-off is an easy choice.
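Since this is a build-time option, you can check whether the Node binary you're currently running was compiled with it. The variable name below is how the flag appears in Node's configure output exposed via process.config, but treat that as an assumption and inspect your own binary's process.config if it's missing:

```javascript
// process.config mirrors the compile-time configure variables of this exact
// Node binary. On stock builds this flag is 0; a pointer-compressed build
// should report 1. (Variable name is an assumption; verify on your build.)
const vars = process.config.variables;
const enabled = vars.v8_enable_pointer_compression === 1;
console.log(`pointer compression: ${enabled ? 'on' : 'off'}`);
```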
Fascinating. I would expect this to become the default very, very soon. Let's talk a bit more about how all of this works. First though, uh, Nean clarified that there are a lot of different types of workers: Cloudflare Workers, web workers, service workers, and, in parentheses, Node worker threads, cuz these are called worker threads but they are the Node version. Yeah, super easy to understand.
Every JavaScript object is stored in V8. Inside, objects point to each other using 64-bit memory addresses on a 64-bit system. For example, an object like { name: 'Alice', age: 30 } has several internal pointers: one to its hidden class (the shape), one to where its properties are stored, and one to the string 'Alice', which is on the heap. So, you might imagine all of these pointers can add up in a typical Node app, taking up a lot of valuable heap space. On a 64-bit system, each pointer uses eight bytes, even though most V8 heaps are much smaller than the huge address space they could use. For real. Pointer compression takes advantage of this. Instead of saving full 64-bit memory addresses, V8 stores 32-bit offsets, relative distances from a fixed starting point called the base address. When reading from the heap, which is the section of memory where objects are stored, it rebuilds the full pointer by adding the base and the offset. When writing, it compresses the pointer by subtracting the base from the full address. Interesting. So, this is basically just Unix time, but for memory. Seriously though, it makes a lot of sense. You still have the 64-bit positioning, but if you're only using 4 gigs of memory, you can fit the start to the end in 32 bits, so it makes sense to just pick a fixed starting point and work up from there, the same way that Unix time does. The trade-off is simple. Each pointer goes from eight bytes to four bytes. For structures with many pointers, like objects, arrays, closures, maps, and sets, this can reduce memory consumption by around 50%. Like, think about that. For every single key that you have in an object with a value that is something like a string or a sub-object, a pointer is instantiated for all of those. And if you can cut the size of those down by 50%, that's going to be a lot of savings. And then we have the CPU side. Each heap access now needs one extra addition for reads or a subtraction for writes. To put it into perspective, this extra operation is akin to a level one cache hit in terms of computational effort. These are incredibly fast operations, and although millions of them are occurring every second, their impact ends up being minimal, akin to a gentle ripple in a vast ocean of processing tasks. And also, to be clear, this additional plus or minus is not happening in the JavaScript world. This is happening in the C++ code that powers all of this. And then we have the heap limit. 32-bit offsets can only reach 4 gigs of memory per V8 isolate, a separate instance of the JavaScript engine with its own memory and execution state. For most Node services, which usually use less than a gig, this isn't a problem. Looking at you, Cursor. Looking at you. But for everybody else, yeah, that's totally fine. If I see a Node service using more than a gig of memory, I just assume it's memory leaking and something is broken. And usually it is. That is a good call.
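The base-plus-offset scheme described above is easy to model in a few lines of JavaScript. To be clear, this is a toy illustration of the arithmetic, not V8's actual memory layout; the base address here is made up:

```javascript
// Toy model of pointer compression: full 64-bit addresses are stored as
// 32-bit offsets from a fixed base, and rebuilt on read.
const CAGE_BASE = 0x7f00_0000_0000n;          // hypothetical base address
const CAGE_SIZE = 4n * 1024n * 1024n * 1024n; // 4 GiB, addressable in 32 bits

// "Write": compress a full address into a 32-bit offset.
function compress(fullAddress) {
  const offset = fullAddress - CAGE_BASE;
  if (offset < 0n || offset >= CAGE_SIZE) throw new RangeError('outside cage');
  return Number(offset); // fits in 32 bits
}

// "Read": rebuild the full address by adding the base back.
function decompress(offset) {
  return CAGE_BASE + BigInt(offset);
}

const addr = CAGE_BASE + 0x1234n;
console.log(compress(addr));                      // 4660
console.log(decompress(compress(addr)) === addr); // true
```

The whole trick is that as long as everything lives inside one 4 gig cage, a 32-bit offset plus a single shared base is enough to reconstruct any full 64-bit address.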
I didn't even know this. Apparently, Chrome has had this on since 2020, which would make sense for Chrome for sure, because any tab using more than a gig of RAM is terrifying. More than four gigs is just throw the computer off a cliff and start again from scratch. Like, if you have a single tab using four gigs of RAM, we've already lost the plot long ago. So Chrome adding this makes a ton of sense. Previously, using this feature required setting a flag at compile time, which often felt like an expert-only option for many devs. Yep, totally agree. As I said before, this is a thing you had to build in, so you had to compile with it. Let's just do a poll and be honest with your answer, guys. How do you get Node? Comes with my distro; install it with brew, etc. (some type of package manager); download from the Node site; or compile from scratch. I have a feeling I know how this one's going to go. NVM I would consider a package manager. I probably should have put that in there. I'm personally on FNM. The point here I'm trying to make is that nobody is compiling Node from scratch. I happen to know that compiling Node from scratch takes a long time. No one would want to do that. Oh, we got our one person who clicked the button. You get the point, though. This is why they introduced Node Caged. The introduction of Node Caged has transformed this, enabling pointer compression with a simple one-line image swap. Again, assuming that you're not doing your own custom Docker image. You can extend the one they provide, but like, that's a huge shift. This substantial simplification opens the door for a much broader audience to experiment with the feature more immediately. Here's what changed: isolate groups. Pointer compression has been part of V8 for years. Node didn't use it before, not because of the CPU overhead, but because of the memory cage limitations. This will be fun, seeing more about how the memory works within Node. V8's pointer compression made every isolate in a process share a single pointer cage, which was a 4 gig block of memory for all compressed pointers. This meant that the main thread and all worker threads had to fit in the same 4 gigs. In Chrome, where each tab has its own process, that was fine. But for Node, where workers share a process, it was a bigger deal.
Back in November of 2024, James Snell, who I've seen contributing to everything from Cloudflare to Node to many other things (he was one of the ones that came in to help with the chaos I caused with the benchmarks), initiated this endeavor to address the challenge. I might be misremembering. I thought he was also the creator of undici, which was a better fetch implementation for Node that became the official fetch implementation for Node. Might not have been him. I'm struggling to find any real information on that. Regardless, James has been involved in a ton of really important things in the JavaScript and Node world. There's a reason he's at Cloudflare and there's a reason he's the one who proposed this.
He knows what he's doing. Cloudflare also sponsored Igalia engineers Andy and Dmitri to introduce new V8 features for this: isolate groups, which give each isolate its own compression cage. So instead of everybody's workers being stuck in the same 4 gig cage, all sharing it, each one can have its own. And this was for V8, to be clear. There's a change in V8 that they wanted because they could use it on Cloudflare to make their isolates better, and it could be used on Node as well. And the C++ change for V8 is simple. Now you can use the isolate group field: you can pass a group when creating an isolate and it works fine, where previously you wouldn't do that, and that gives each thread its own 4 gig heap. The only limit is the system's available memory. This change, despite being started in November of 2024, took up until October of 2025.
62 lines across eight files. It took a year for 62 lines to merge. I want to just think about this for a second. This is kind of crazy, and I don't think people appreciate just how hard it is to ship stuff like this. A proposal that Cloudflare needed took a year to go from proposal to shipped, and all it was was 62 lines changed that enables the reduction of memory used by Node and other V8 apps by over 50% in many cases. That is insane that it took that long, but also that they had the wherewithal to push through and make it happen. This is the type of open source work that nobody sees normally, that is super important, and that I'm really pumped to like have happen but also see publicly here. Shout out to James and Cloudflare for making all of this happen, as well as the people from Igalia who were sponsored on it too. Apparently the pointer compression itself had been broken since Node2 and they fixed that as part of all of this. Matteo tested the changes with a real-world Next.js SSR app and confirmed the 50% reduction in heap usage before they approved. So let's see more about these experiments.
I'm excited. They made a mock Next.js e-commerce app. It was a trading card app with 10,000 cards, 100,000 listings, all the fun things like server-side rendering, search, and a simulated database delay, all on a Kubernetes cluster. They just had four setups, all using the same hardware and app code: standard Node, Node Caged, and then their platform, which is Watt. It's like a similar thing to Cloudflare Workers. As far as I understand, it's their open source Node application server. These four tests, let's see what the usage looked like. They had a bunch of mock traffic: 20% was homepage, which was SSR, 25% was search, 20% was card details, 15% game category pages, yada yada. Let's see what the results look like. For plain Node, average latency went from 39.7 milliseconds to 40.7 milliseconds, which is a bump. That's 2.5%. The P90 got meaningfully worse, from 78 to 82 milliseconds. That's a 5% bump. So 5% slower for the 10% slowest stuff, but much crazier is the 7.8% faster P99. So the top 1% slowest got way faster, the top 10% slowest got slightly slower, and everything else was pretty close. Fascinating. A smaller heap means the garbage collector has less work to do, so there are fewer and shorter garbage collection pauses. That's a really good point. If you're not familiar, since JavaScript doesn't have you manually creating and freeing memory, the V8 engine has to handle that for you. When it detects that values are no longer being used or tracked, it will find them and go clean them up eventually. But it has to choose to go do that, and it's not going to block your stuff for the most part. But when it is time to clean up that stuff, that's called garbage collection. And the garbage collection process will block things when it's happening. When GC is going on, other things can't be. And it's not going to take a super long time, but it will take actual time. And that's what the P99 is hitting. 99% of requests don't get blocked by garbage collection, but the 1% that do now can have the collection happen much faster, which is a huge win.
So, how'd this work in their runtime? It actually looks like it was worse for average, P90, and P99. The P99 went up a ton, but the max (which is beyond P90, like literally the worst case on each) was down 20%. Fascinating. I'm guessing they're just having garbage collection happen less, so they're not getting the same benefit initially. I wish they would acknowledge this +18.1%. I'm so curious about that. If you compare standard Node with Watt, which is their runtime, plus the pointer compression stuff, the gap is pretty meaningful. The P99 is 42% faster on their thing. Seems cool. I should definitely look into Watt more.
Apparently, they had published benchmarks before that were not looking good. It showed a 56% overhead when they enabled this. And that was a simple hello-world Next.js starter app. A simple hello-world SSR page mostly does V8-internal work: compiling templates, diffing the virtual DOM, and joining strings. There's no IO, no data loading, and no real app logic. Every operation goes through pointer decompression. Real apps are different. A typical request spends most of its time on things like IO wait, data marshalling, framework overhead like routing, middleware chains, and header processing, and then OS and network stuff like scheduling with TCP, TLS, all those things. I agree for the most part. I've tried to push this a lot with people. If you're running a script that just curls an HTML page being generated by React, probably not the best benchmark. There's a ton of other things that matter. I still think that a 56% decrease in performance on a thing that doesn't do all this work is really bad, and I want more info on that case. Hopefully, we get more of that here. I like this framing: as the ratio of real work to pure V8 pointer chasing increases, the overhead of pointer compression shrinks proportionally. Our e-commerce app includes simulated database delays of 1 to 5 milliseconds, JSON parsing of data sets with 10,000-plus records, search filtering, pagination, and full SSR rendering with React. In that context, the pointer decompression overheads round to noise.
Yes. Takeaway being that you should always use realistic workloads for benchmarking. Yeah, I tried this myself when I was getting in all of the fights about running a [ __ ] trig function in a loop. I am proud. I feel like I had some very small but meaningful impact on this with the benchmarks I did in the past, getting Cloudflare to like kick up their focus on performance stuff. I'm proud of the impact I may or may not have had here. And I'm just pumped to see it all shipping, honestly. And to be clear, the impact I had was not that I did something great. It's that I made something stupid, and that turned into other people making something great, which is really cool. There's a very fun deep dive here about how the garbage collection stuff works, which I'm excited to read through. There are several types of garbage collection within V8, and these are the two that matter primarily. There's the minor GC, which is scavenge. It will copy live objects from the young generation. Time is proportional to the number of live objects and their size. So, this is, as I understand it, when a new object is generated, it needs to be put somewhere. So, a tiny bit of garbage collection will be done just to figure out what space is available so it can drop it in wherever it fits. And then we have the major GC, the mark-sweep-compact. This is the thing that actually goes through everything in memory, identifies all of the reachable objects, sweeps the dead ones, and then optionally compacts. The time for this depends on the total heap size and level of fragmentation. To quickly simplify what this is saying: you have a bunch of stuff in memory. Things point from one thing to another. If there's nothing pointing to an object anymore, it doesn't need to be held in memory anymore and it can be disposed of. There's a lot of edge cases where V8 isn't necessarily able to perfectly know if the thing is still relevant or not. I did a video about weird V8 quirks forever ago that I dove in on all this with. The way this works is going through, finding anything that doesn't have things pointing at it, and then cleaning that up, but it has to go through everything in memory to do this. So, it takes actual time. We're talking milliseconds, but a number of milliseconds. It means something. It's also worth noting that garbage collection is a much bigger issue in languages that are, I'll be real, faster. Like, managing your own memory in JavaScript wouldn't make much sense because the runtime is still going to be the major bottleneck. Whereas in something like Go, the garbage collection is the only thing that is slow. So, it will often end up becoming a blocker in certain scenarios. That's why companies like Discord are moving some of their backend stuff off of Go to Rust. Even though Go is just as fast as Rust for most things, the fact that the garbage collection can make some requests slower caused unreliable traffic patterns. Like, if it does garbage collection when you're making a request, your traffic takes longer than it does for other people. That's why they moved to Rust: they wanted to reduce the amount of times that garbage collection would interfere with real traffic. Dancing around garbage collection is an art, and I'm so thankful that us in the JavaScript world don't have to do it ourselves, that the much smarter people building Node and V8 for us have solved so many of these things.
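If you want to poke at a major collection yourself, run Node with --expose-gc so you can force one by hand. A rough sketch; without the flag it just prints current usage:

```javascript
// Compare heap usage while holding a big structure vs after releasing it.
function heapMB() {
  return (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(1);
}

let junk = Array.from({ length: 1_000_000 }, (_, i) => ({ i }));
console.log(`heap with junk: ${heapMB()} MB`);

junk = null;                 // drop the only reference, making it collectable
if (global.gc) global.gc();  // force a major GC when run with --expose-gc
console.log(`heap after drop: ${heapMB()} MB`);
```

With the flag set, the second number drops sharply, which is the mark-sweep behavior described above: once nothing points at the objects, the collector reclaims them.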
But yeah, the major GC is the thing that matters here for the numbers. And if the pointers are half the size, you end up having way less data to deal with when you go do the garbage collection. Because of the pointer compression, every object is smaller. This has domino effects: things like objects fitting in fewer cache lines, because a compressed object fits in a single 64-byte cache line instead of two, which makes garbage collection much easier. If a piece of data takes up twice as many slots that you have to check, you have to check twice as much [ __ ] when you do the garbage collection. More importantly though, the young generation, which is their primary space (like, these are things that were recently instantiated that we probably want to keep in immediate memory), will fill more slowly because smaller objects are taking up less space. The major GC has less to scan. A 1 gig heap with compressed pointers contains the same logical data as a 2 gig heap without. So the scan actually takes less time, of course, and the compaction is moving things around less, because when you do the compaction, you're taking the things that are gone and flattening everything to the front. I guess we do need Excalidraw after all.
all. If we have our memory and in our memory we have stuff like let's say we have object one, object two. Object two
is actually quite big. Object three. And
then we'll round this out with object four. And now we have this new object,
four. And now we have this new object, object five. And we want to instantiate
object five. And we want to instantiate this and put this into memory. There's a
problem though. There's no [ __ ] space. So, this triggers a garbage
space. So, this triggers a garbage collection. Let's say object one points
collection. Let's say object one points to three, object three points to four, and nothing's pointing to two. What ends
up happening with the garbage collection pass? Two gets deleted, and then these
pass? Two gets deleted, and then these get moved. You might be questioning why
get moved. You might be questioning why move the things. Well, I'll show you.
Let's say it was actually like this. We
had object three, object one, object four, and we have this space between the two. But this object is too big to fit
two. But this object is too big to fit in either of those spaces. This is
called fragmentation. So the reason that we move everything is the same reason we defrag our drives. It's so all of the free space is at the end. So it's easier to add in new [ __ ] without having to
unfurl it and break it out across so many things. It's also why virtualizing
many things. It's also why virtualizing memory is so valuable because if the OS can handle the fragmentation part and you're just dropping pointers directly to things that aren't real but are in
your space is great. And obviously you could understand why things being half the size means less movement has to happen. Thereby this process ends up
happen. Thereby this process ends up being cheaper and faster. And as Matteo says here, the end result is that GC pauses are both shorter and less frequent. This corresponds to what we
frequent. This corresponds to what we saw in the P99 and max latency numbers.
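The whiteboard walkthrough here can be sketched as a toy mark-and-compact pass. This is purely illustrative and not how V8's collector is actually implemented:

```javascript
// Toy mark-compact pass over a simulated heap. Purely illustrative:
// each object records its size and which other objects it points to.
const heap = [
  { id: 1, size: 2, refs: [3] }, // object one points to three
  { id: 2, size: 4, refs: [] },  // nothing points to two
  { id: 3, size: 2, refs: [4] }, // three points to four
  { id: 4, size: 2, refs: [] },
];
const roots = [1];

// Mark: walk from the roots and flag everything reachable.
function mark(heap, roots) {
  const live = new Set();
  const stack = [...roots];
  while (stack.length) {
    const id = stack.pop();
    if (live.has(id)) continue;
    live.add(id);
    stack.push(...heap.find((o) => o.id === id).refs);
  }
  return live;
}

// Compact: keep only live objects, packed toward the front, so all
// free space ends up in one contiguous region at the end.
function compact(heap, live) {
  let offset = 0;
  return heap
    .filter((o) => live.has(o.id))
    .map((o) => {
      const placed = { ...o, offset };
      offset += o.size;
      return placed;
    });
}

const live = mark(heap, roots);
const compacted = compact(heap, live);
console.log(compacted.map((o) => [o.id, o.offset]));
// Object two is collected; one, three, and four slide toward the front.
```

With half-size objects, both phases touch less memory: marking chases smaller pointers and compaction copies fewer bytes.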
When a long-tail request lines up with a GC pause, the pause is now shorter. They drop some fun cost numbers here, too, which are very interesting. If your current Node-on-Kubernetes setup takes 2 gigs of memory per pod, this compression can knock you down to one gig. You get the same app and performance, but you can run twice as many pods per node, or use half as many nodes. What would halving pod memory do to your bill? Take a moment to calculate the potential savings based on your current setup. For them, this example was based on six m5.2xlarge nodes at roughly 38 cents per hour. That's about $16,600 a year. If you can go from six nodes to three because you need half as much RAM, you're saving $8,300 a year. If you have a real production fleet with 50 or more of these nodes, the savings could be between $80,000 and $100,000 a year without changing any code. That's pretty huge.
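As a rough sketch of that math (the hourly rate and node counts here are illustrative placeholders, so the totals won't exactly match the article's quoted figures, which presumably used their actual pricing):

```javascript
// Back-of-the-envelope savings from halving per-pod memory.
// The hourly rate is an illustrative placeholder; plug in your own.
const HOURS_PER_YEAR = 24 * 365; // 8760

function annualCost(hourlyRatePerNode, nodeCount) {
  return hourlyRatePerNode * nodeCount * HOURS_PER_YEAR;
}

// Example: six nodes at roughly $0.38/hour...
const before = annualCost(0.38, 6);
// ...dropping to three nodes once each pod needs half the RAM.
const after = annualCost(0.38, 3);

console.log(Math.round(before));         // ballpark annual spend
console.log(Math.round(before - after)); // ballpark annual savings
```

Scaling the same formula up to a 50-plus node fleet is where the five-figure and six-figure annual numbers come from.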
You know what would actually benefit a lot from this? OpenClaw. Someone should make an OpenClaw Docker image based on this one, because that compression is going to help it a ton. You'll be able to run OpenClaw on way smaller boxes as a result. Genuinely, that's really cool. This also means you can double your tenant density. Multi-tenant SaaS platforms, where each tenant runs in their own isolated Node process, often hit memory limits as the binding constraint for density. Yeah, absolutely agree. If you have five customers all running on the same box and each one needs 2 gigs of memory, even if they're not doing processing all the time, you're not bottlenecked on CPU, you're bottlenecked on RAM. If you can cut the amount of RAM each of those instances needs in half, you can put twice as many things on that box. That's a big deal for a lot of memory-constrained work, which covers a lot of these multi-runner cases. This also helps unlock edge deployments. I hadn't even thought of that. That's a big deal.
If you don't need as much RAM, you are much less blocked when you try to deploy things at the edge. Lambda@Edge, Cloudflare Workers, and Deno Deploy have strict memory limits between 128 megs and 512 megs. The reason Cloudflare sponsored this work is that the Workers runtime needed pointer compression to support more isolates. Yeah, they can now pack twice as many things in when they're not using CPU. That's crazy. In a world where 128 gigs of server RAM is $2,500, this means a lot. I should have thought about that before. The fact that RAM is so expensive makes this a really, really well-timed change. This is another one I hadn't thought of. For WebSocket apps, you often end up blocked on memory, because if you're maintaining all of these connections and don't have anything to send yet, you're not using much CPU, but you are using a shitload of RAM, because each of those connections conservatively takes 10 kilobytes on the heap. That puts you at 500 megabytes of RAM with 50,000 connections on one box. Now you're down to 250 megs. That's huge. So you can have 100,000 connections on a server that could previously only do 50,000. Huge.
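The connection math works out like this. The 10 KB per connection is the conservative estimate from the article; everything else is simple arithmetic:

```javascript
// Per-connection heap cost for a WebSocket server, before and after
// halving per-connection memory. The 10 KB figure is a conservative
// estimate, not a measured constant.
const KB = 1024;
const MB = 1024 * KB;

function totalHeapMB(connections, bytesPerConnection) {
  return (connections * bytesPerConnection) / MB;
}

// 50,000 connections at 10 KB each lands near the 500 MB ballpark...
console.log(totalHeapMB(50_000, 10 * KB)); // 488.28125 MB
// ...halve the per-connection cost and the same budget holds double.
console.log(totalHeapMB(100_000, 5 * KB)); // 488.28125 MB
```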
If some random service was promising me this, I wouldn't believe them. But this is Matteo Collina, who is my go-to reference point for what's going on in Node, and one of the best C++ engineers I've ever met. Matteo is really good. He is writing this as a maintainer of Node, not as a person trying to sell you something, so I do believe him fully on this. They do call out that if you actually need more than 4 gigs of RAM per isolate, this doesn't work. It's not going to help; it's probably going to cause problems. It's not an option. But only the V8 JS heap lives inside the cage. Native add-on allocations and ArrayBuffers do not count against that limit. That's good to know. I'd expect any app using that much RAM to already be offloading allocations like that. So almost
certainly good. But there's one more compatibility constraint: native add-ons built with the legacy NAN (Native Abstractions for Node) won't work with pointer compression enabled. NAN exposes V8 internals directly, and pointer compression changes the internal representation of those objects, so when you recompile, the ABI is different. Add-ons built on the Node-API (formerly N-API) are unaffected, because Node-API abstracts away those internal pointer details. Apparently, the most popular packages have all long since migrated: sharp, bcrypt, canvas, sqlite3, leveldown, bufferutil, utf-8-validate, all good. Apparently, NodeGit is one of the few things still using NAN. If you run npm ls nan and nothing shows up, you're good. This will actually be fun. Let's try that on the T3 Chat codebase.
Woo! T3 Chat is clear. We can turn this on. That's actually really exciting. Our memory usage is not great, so that'll be a huge win. So, if you can run this command and nothing shows up, and you're deploying your Node apps via Docker, I'm with him: you absolutely should try this. There isn't much to lose. I'm going to try turning this on for a few things I'm working on right now. I'm actually really curious how this goes, and also how it performs compared to Bun. This was a phenomenal article.
Thank you, Matteo, for writing this. It's a super cool deep dive, and I hope y'all learned something as I read through it. If you haven't already, go give Matteo a follow. He's putting so much work into all of this and is not compensated fairly for it, I can promise you that much. He's one of those quiet people behind the scenes keeping the internet functioning as we know it. Give him a follow and support him for the work he does. I appreciate him immensely, and I hope you guys do too. Are you as excited as I am about using less RAM, or do you just want to save money now that it's all so expensive? Let me know how you feel in the comments.