
They cut Node.js Memory in half 👀

By Theo - t3.gg

Summary

Topics Covered

  • Pointer Compression Halves Node Memory
  • NodeCage Enables One-Line Swap
  • Smaller Heaps Slash GC Pauses
  • Double Density Via Memory Halving

Full Transcript

Cut your Node.js memory usage in half with this one simple trick. Okay, cut the clickbait. This is actually a really cool topic. As I'm sure most of you all know, JavaScript is not exactly memory efficient. That's why we see hilarious things like Cursor using 3.38 terabytes of RAM. JavaScript's a great language that does a lot of really cool things, but sadly memory management has never been one of its strongest areas. Which is why I was really excited to see Matteo, who is one of the main contributors to Node, come out with a really cool post about how they cut Node memory usage in half with a one-line Docker image swap. To be clear, it's not one line of code changing here. A Docker image swap is a pretty big deal depending on where you're doing it, but the things that led to this performance win are genuinely really fascinating.

I feel like most engineers don't appreciate just how much hard engineering work goes into JavaScript because they're quick to dismiss it as "just JavaScript." But V8 is one of the most complex and impressive projects ever built, and Node is not too far from it.

The effort that's going into the C++ that is powering all of this stuff is hard to put into words, and some of it's really cool. I'm constantly surprised by how many hidden flags and features there are inside of V8 and Node that can fundamentally change the performance characteristics. This is one I did not know about that I think is really, really cool, and I can't wait to tell you all about it after a quick word from today's sponsor. If you don't have any users, you can skip this section, but for those who do, it's important to understand what they're actually doing with your service. And that's why today's sponsor is so important to me.

PostHog helps you understand what your users are doing. I know this looks like an operating system, but they're actually a suite of product tools, things that you need to build real products. The main thing that you're going to use them for is analytics, and they are the best analytics provider. Open source, too, but it doesn't really matter cuz their hosting is absurdly cheap. Their free tier gives you a million events, 5,000 session recordings, a million requests to feature flags, 1,500 survey results, 100,000 errors and exceptions, and so much more. The data warehouse is one of the coolest parts, though, because you can integrate with other services. Integrations like Supabase and Stripe are super useful. Believe me, I spent way too much time in the Stripe dashboard. PostHog's is significantly better, and it's like two clicks to set up. I was signing in to show you guys quick, and their vibe is hilarious. Just the fact that they have a special Valentine's Day little thing is just... that's how they are. Are you kidding?

PostHog, I just want to show off the data, and the vibes are immaculate. If you want a data company that doesn't take themselves too seriously, but is a serious contender for managing all of your data, look no further than

soyb.link/postthog.


Now, let's dive into how Matteo was able to cut Node's memory usage in half, as well as any potential negative side effects. V8, the C++ engine under the proverbial hood of JavaScript, includes a feature that many Node developers aren't familiar with. This feature, pointer compression, is a method for using smaller memory references in the JavaScript heap, reducing each pointer from 64 bits to 32 bits. The net is that you wind up using about 50% less memory for the same app without having to change any code.

Pretty great, right? Well, almost.

Node.js does not enable pointer compression by default for two historical reasons. I find this to be really interesting. There's been a couple times where this happened. One of the most famous ones was when I got in the fight with that person who claimed I was lying about performance differences between Vercel and Cloudflare, because when he ran a trigonometry function in a loop, it performed better on Cloudflare than Vercel. The reason for that was actually [ __ ] hilarious. Cloudflare doesn't run Node. Cloudflare runs their own V8-based engine. Vercel runs Node.

It turns out there was a flag to do faster trig math that had been enabled by Cloudflare for their runtime of V8 that was not enabled by default in Node runtimes. Cloudflare were the ones who went and found this, fixed it, and got it upstreamed into Node itself, thereby killing the gap there. But that was just like one trigonometry function that didn't have the flag enabled to make it faster. Thanks, Pron, for that. But yeah, I still can't believe that my [ __ ] posting and silly benchmarks ended up making Cloudflare 3 to 10 times faster for various workloads. So cool. And a lot of that just comes down to which flags are enabled and what values are set for them. People don't appreciate how much config there is in Node because none of us ever hit it. Like in

TypeScript, you go edit the tsconfig, but in Node, you don't really configure anything. You just tell it to run a JavaScript file. But there are so many flags you can enable in the building and bundling of Node that nobody ever touches, and digging into them with stuff like this is really fun. So let's see the historical reasons why this particular flag has not been enabled. So,

going through these reasons, first is the 4-gig cage limitation, which means that enabling pointer compression required the entire Node process to share a single 4-gig memory space between the main thread and all the worker threads. This is a significant issue. Cloudflare and Igalia partnered to solve this so that the cage could be per isolate, an individual instance of the V8 engine. To break this down a little bit for those who aren't as familiar with the worker model in JavaScript and V8, it is possible to spin up relatively safe isolates in V8 that have their own

pool of memory. So, if I have some work going on in the main thread, like I'm trying to make sure when people hit buttons on the keyboard, something appears as quickly as possible, but I want to do something else that is computationally complex, like generate some UI or run some complex math. If you do that on the main thread, other inputs and other things the user is doing are going to be blocked by that background work. Workers are a way to effectively spin up another instance of the JS engine. It's not really a new instance; it's a sub-isolate within it that has its own memory, its own runner, and can be run in parallel to what you're doing in the main thread, so that things happening in the worker don't block the main thread.

But in order for that to work, it needs to have its own memory that it's operating against. You cannot share memory between a worker and the main thread. You have to pass events between the two. As a result of this complexity, a lot of people have just never built with workers. I don't really know many JavaScript developers that have used workers in the browser. I'm not talking about Cloudflare Workers; I'm talking about

browser workers for other threads to prevent blocking the main thread as you're doing things. I just don't know many who have used it that way. Cloudflare uses them very heavily. The reason the platform is called Cloudflare Workers is because you don't get your own instance of V8 when you use Cloudflare. The V8 instance has workers spun up for every request. So the same way I could, in the browser, spin up a worker to do something else while the user is still using the site, Cloudflare has the one main V8 instance with tons of workers that are spun up in order to do these other things, so that they can run on the same instance of V8, thereby massively reducing Cloudflare's costs for hosting and also allowing them to be as cheap as they are, relatively speaking. And yes,

Nean, web workers are the right term for the version for the web, but what the [ __ ] would the ones in Node be called if that was the case? The point I'm trying to make here is that Cloudflare has invested a lot in doing things to prevent weird memory characteristics in workers because they want this to work as well as possible for the stuff they are hosting. And the problem here was that the entire Node process was sharing a single 4-gig memory space, and whenever a new worker came up, it got to reserve some portion of that because it couldn't reuse the other things. So what's the other problem that prevented this? Some

worried that compressing and decompressing pointers on each heap access would introduce performance overhead. And I understand this concern. I actually would have had the same concern: if you have to do something extra, like one more cycle, one more step, before you could even access the values in memory, that sounds really bad. And that's why Cloudflare, Igalia, and the Node.js project all collaborated to determine exactly what kind of overhead existed and assess whether it would impact real-world apps. To test

this, they created Node Caged, a Node 25 Docker image with pointer compression turned on. Because you can't just pass a flag to Node (you can't just be like "node --compression" or something), it has to be built with this on. So they made a custom build of Node 25, put it in Docker so you could easily swap it without having to reinstall Node on your existing instance, and then ran production-level benchmarks with this on AWS EKS. In short: "we achieved 50% memory savings with only a 2 to 4% increase in average latency across real-world workloads, and reduced P99 latency by 7%. For most teams, this trade-off is an easy choice."
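In Dockerfile terms, the swap is something like this. Note that the image name below is a placeholder, since I'm not certain of the exact tag they publish:

```dockerfile
# Before: a stock Node 25 base image
# FROM node:25-slim

# After: a Node 25 build compiled with pointer compression enabled.
# The image name here is illustrative; use whatever tag the Node Caged
# project actually publishes.
FROM nodecaged/node:25

WORKDIR /app
COPY . .
CMD ["node", "server.js"]
```

Everything else in the image (your app code, your install steps, your CMD) stays the same; only the base line changes.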

Fascinating. I would expect this to become the default very, very soon. Let's talk a bit more about how all of this works. First though, Nean clarified that there are a lot of different types of workers: Cloudflare Workers, web workers, service workers, and (parenthesis) Node worker threads, cuz these are called worker threads, but they are the Node version. Yeah, super easy to understand.

Every JavaScript object is stored in V8's heap. Inside, objects point to each other using 64-bit memory addresses on a 64-bit system. For example, an object like { name: "Alice", age: 30 } has several internal pointers: one to its hidden class (the shape), one to where its properties are stored, and one to the string "Alice", which is on the heap. So, you might imagine all of these pointers can add up in a typical Node app, taking up a lot of valuable heap space. On a 64-bit system, each pointer uses eight bytes, even though most V8 heaps are much smaller than the huge address space they could use. For real, pointer compression

takes advantage of this. Instead of saving full 64-bit memory addresses, V8 stores 32-bit offsets, relative distances from a fixed starting point called the base address. When reading from the heap, which is the section of memory where objects are stored, it rebuilds the full pointer by adding the base and the offset. When writing, it compresses the pointer by subtracting the base from the full address. Interesting. So, this is basically just Unix time, but for memory. Seriously though, it makes a lot of sense. You still have the 64-bit positioning, but if you're only using 4 gigs of memory, you can fit the start to the end in 32 bits, so it makes sense to just pick a fixed starting point and work up from there, the same way that Unix time does. The trade-off is simple. Each

pointer goes from eight bytes to four bytes. For structures with many pointers, like objects, arrays, closures, maps, and sets, this can reduce memory consumption by around 50%. Like, think about that. For every single key that you have in an object with a value that is something like a string or a sub-object, a pointer is instantiated for all of those. And if you can cut the size of those down by 50%, that's going to be a lot of savings. And then we have the CPU side. Each heap access now needs one extra addition for reads or a subtraction for writes. To put it into perspective, this extra operation is akin to a level-one cache hit in terms of computational effort. These are incredibly fast operations, and although millions of them are occurring every second, their impact ends up being minimal, akin to a gentle ripple in a vast ocean of processing tasks. And also

to be clear, this additional plus or minus is not happening in the JavaScript world. This is happening in the C++ code that powers all of this. And then we have the heap limit: 32-bit offsets can only reach 4 gigs of memory per V8 isolate (a separate instance of the JavaScript engine with its own memory and execution state). For most Node services, which usually use less than a gig, this isn't a problem. Looking at you, Cursor. Looking at you. But for everybody else, yeah, that's totally fine. If I see a Node service using more than a gig of memory, I just assume it's memory leaking and something is broken.

And usually it is. That is a good call.

I didn't even know this. Apparently, Chrome has had this on since 2020, which would make sense for Chrome for sure, because any tab using more than a gig of RAM is terrifying. More than four gigs is just throw the computer off a cliff and start again from scratch. Like, if you have a single tab using four gigs of RAM, we've already lost the plot long ago. So Chrome adding this on makes a ton of sense. Previously, using this feature required setting a flag at compile time, which often felt like an expert-only option for many devs. Yep, totally agree. As I said before, this is a thing you had to build in, so you had to compile with it. Let's just do a poll and be honest with your answer, guys.

How do you get Node? Comes with my distro, install it with brew, etc. (some type of package manager), download from the Node site, or compile from scratch. I have a feeling I know how this one's going to go. NVM I would consider a package manager; I probably should have put that in there. I'm personally on FNM. The point here I'm trying to make is that nobody is compiling Node from scratch. I happen to know that compiling Node from scratch takes a long time. No one would want to do that. Oh, we got our one person who clicked the button. You get the point, though. This is why they introduced Node Caged. The introduction

of Node Caged has transformed this, enabling pointer compression with a simple one-line image swap. Again, assuming that you're not doing your own custom Docker image. You can extend the one they provide, but that's a huge shift. This substantial simplification opens the door for a much broader audience to experiment with the feature more immediately. Here's what changed: isolate groups. Pointer

compression has been part of V8 for years. Node didn't use it before, not because of the CPU overhead, but because of the memory cage limitations. This will be fun, seeing more about how the memory works within Node. V8's pointer compression made every isolate in a process share a single pointer cage, which was a 4-gig block of memory for all compressed pointers. This meant that the main thread and all worker threads had to fit in the same 4 gigs. In Chrome, where each tab has its own process, that was fine. But for Node, where workers share a process, it was a bigger deal. Back in November of 2024,

James Snell, who I've seen contributing to everything from Cloudflare to Node to many other things, and who was one of the ones that came in to help with the chaos I caused with the benchmarks, initiated this endeavor to address the challenge. I might be misremembering; I thought he was also the creator of undici, which was a better fetch implementation for Node that became the official fetch implementation for Node. Might not have been him. I'm struggling to find any real information on that. Regardless, James has been involved in a ton of really important things in the JavaScript and Node world. There's a reason he's at Cloudflare, and there's a reason he's the one who proposed this.

He knows what he's doing. Cloudflare also sponsored Igalia engineers Andy and Dmitri to introduce the new V8 feature for this, isolate groups, which gives each group its own compression cage. So instead of everybody's workers being stuck in the same 4-gig cage, all sharing it, each one can have its own. And this was for V8. To be clear, there's a change in V8 that they wanted because they could use it on Cloudflare to make their isolates better, and it could also be used in Node as well. And the C++ change for V8 is simple. Now you can use the new isolate group field: you pass a group when creating an isolate and it works fine, where previously you couldn't, and that gives each thread its own 4-gig heap. The only limit is the system's available memory. This change, despite being started in November of 2024, took up until October of 2025.

62 lines across eight files. It took a year for 62 lines to merge. I want to just think about this for a second. This is kind of crazy, and I don't think people appreciate just how hard it is to ship stuff like this. A proposal that Cloudflare needed took a year to go from proposal to shipped, and all it was was 62 lines changed that enables the reduction of memory used by Node and other V8 apps by over 50% in many cases. That is insane, that it took that long, but also that they had the wherewithal to push through and make it happen. This is the type of open source work that nobody normally sees, that is super important, and that I'm really pumped to have happen but also see publicly here. Shout out to James and Cloudflare for making all of this happen, as well as the people from Igalia who were sponsored on it too. Apparently the pointer compression itself had been broken since Node2, and they fixed that as part of all of this. Matteo tested the changes with a real-world Next.js SSR app and confirmed the 50% reduction in heap usage before they approved. So let's see more about these experiments.

I'm excited. They made a mock Next.js e-commerce app; it was a trading card app with 10,000 cards, 100,000 listings, and all the fun things like server-side rendering, search, and a simulated database delay, all on a Kubernetes cluster. They just had four setups, all using the same hardware and app code: standard Node, Node Caged, and then their platform, which is Watt. It's like a similar thing to Cloudflare Workers; as far as I understand, it's their open source Node application server. These four tests, let's see what the usage looked like. They had a bunch of mock traffic: 20% was homepage, which was SSR, 25% was search, 20% was card details, 15% game category pages, yada yada. Let's see what the results look

like. For plain Node, average latency went from 39.7 milliseconds to 40.7 milliseconds, which is a bump. That's 2.5%. The P90 got meaningfully worse, from 78 to 82 milliseconds. That's a 5% bump. So 5% slower for the 10% slowest stuff, but much crazier is the 7.8%, call it 8%, faster P99. So the top 1% slowest got way faster, the top 10% slowest got slightly slower, and everything else was pretty close. Fascinating. A smaller heap means

the garbage collector has less work to do, so there are fewer and shorter garbage collection pauses. That's a really good point. If you're not familiar: since JavaScript doesn't have you manually creating and freeing memory, the V8 engine has to handle that for you. When it detects that values are no longer being used or tracked, it will find them and go clean them up eventually. But it has to choose to go do that, and it's not going to block your stuff for the most part. But when it is time to clean up that stuff, that's called garbage collection. And the garbage collection process will block things when it's happening. When GC is going on, other things can't be. And it's not going to take a super long time, but it will take actual time. And that's what the P99 is hitting. 99% of requests don't get blocked by garbage collection, but the 1% that do now can have the collection happen much faster, which is a huge win.

So, how'd this work in their runtime? It actually looks like it was worse for average, P90, and P99. The P99 went up a ton, but the max, which is beyond P99, like literally the worst case on each, was down 20%. Fascinating. I'm guessing they're just having garbage collection happen less, so they're not getting the same benefit initially. I wish they would acknowledge this plus 18.1. I'm so curious about that. If you compare standard Node with Watt, which is their runtime, plus the pointer compression stuff, the gap is pretty meaningful. The P99 is 42% faster on their thing. Seems cool. I should definitely look into Watt more.

Apparently, they had published benchmarks before that were not looking good. They showed a 56% overhead when they enabled this. And that was a simple hello-world Next.js starter app. A simple hello-world SSR page mostly does V8-internal work: compiling templates, diffing the virtual DOM, and joining strings. There's no I/O, no data loading, and no real app logic. Every operation goes through pointer decompression. Real apps are different. A typical request spends most of its time on things like I/O wait, data marshalling, framework overhead like routing, middleware chains, and header processing, and then OS and network stuff like scheduling, TCP, TLS, all those things. I agree for

the most part. I've tried to push this a lot with people. If you're running a script that just curls an HTML page being generated by React, that's probably not the best benchmark. There's a ton of other things that matter. I still think that a 56% decrease in performance on a thing that doesn't do all this work is really bad, and I want more info on that case. Hopefully, we get more of that here. I like this framing: as the ratio of real work to pure V8 pointer chasing increases, the overhead of pointer compression shrinks proportionally. Our e-commerce app includes simulated database delays of 1 to 5 milliseconds, JSON parsing of data sets with 10,000-plus records, search filtering, pagination, and full SSR rendering with React. In that context, the pointer decompression overheads round to noise.

Yes. The takeaway being that you should always use realistic workloads for benchmarking. Yeah, I tried this myself when I was getting in all of the fights about running a [ __ ] trig function in a loop. I am proud. I feel like I had some very small but meaningful impact on this with the benchmarks I did in the past, getting Cloudflare to kick up their focus on performance stuff. I'm proud of the impact I may or may not have had here. And I'm just pumped to see it all shipping, honestly. And to be clear, the impact I had was not that I did something great. It's that I made something stupid, and that turned into other people making something great, which is really cool. There's a very fun deep dive here about how the garbage collection stuff works, which I'm excited to read through. There are

several types of garbage collection within V8, and these are the two that matter primarily. There's the minor GC, which is scavenge. It will copy live objects from the young generation; time is proportional to the number of live objects and their size. So this is, as I understand it, when a new object is generated, it needs to be put somewhere. So a tiny bit of garbage collection will be done just to figure out what space is available so it can drop it in wherever it fits. And then we have the major GC, the mark-sweep-compact. This is the thing that actually goes through everything in memory, identifies all of the reachable objects, sweeps the dead ones, and then optionally compacts. The time for this depends on the total heap size and level of fragmentation. To

quickly simplify what this is saying: you have a bunch of stuff in memory. Things point from one thing to another. If there's nothing pointing to an object anymore, it doesn't need to be held in memory anymore and it can be disposed of. There are a lot of edge cases where V8 isn't necessarily able to perfectly know if a thing is still relevant or not. I did a video about weird V8 quirks forever ago that I dove into all of this with. The way this works is going through, finding anything that doesn't have things pointing at it, and then cleaning that up, but it has to go through everything in memory to do this. So it takes actual time. We're talking milliseconds, but a number of milliseconds. It means something. It's

also worth noting that garbage collection is a much bigger issue in languages that are, I'll be real, faster. Like, managing your own memory in JavaScript wouldn't make much sense because the runtime is still going to be the major bottleneck, whereas in something like Go, the garbage collection is the only thing that is slow. So it will often end up becoming a blocker in certain scenarios. That's why companies like Discord are moving some of their backend stuff off of Go to Rust. Even though Go is just as fast as Rust for most things, the fact that the garbage collection can make some requests slower caused unreliable traffic patterns. Like, if it does garbage collection when you're making a request, your request takes longer than it does for other people. That's why they moved to Rust: they wanted to reduce the number of times that garbage collection would interfere with real traffic. Dancing around garbage collection is an art, and I'm so thankful that those of us in the JavaScript world don't have to do it ourselves, that the much smarter people building Node and V8 for us have solved so many of these things.

But yeah, the major GC is the thing that matters here for the numbers. And if the pointers are half the size, you end up having way less data to deal with when you go do the garbage collection. Because of the pointer compression, every object is smaller. This has domino effects: things like objects fitting in fewer cache lines, because a compressed object fits in a single 64-byte cache line instead of two, which makes garbage collection much easier. If a piece of data takes up twice as many slots that you have to check, you have to check twice as much [ __ ] when you do the garbage collection. More importantly though, the young generation (these are things that were recently instantiated that we probably want to keep in, like, immediate memory) will fill more slowly because smaller objects are taking up less space. The major GC has less to scan. A 1-gig heap with compressed pointers contains the same logical data as a 2-gig heap without. So the scan actually takes less time, of course, and the compaction is moving things around less, because when you do the compaction you're taking the things that are gone and flattening everything to the front. I guess we do need Excalidraw after

all. If we have our memory, and in our memory we have stuff like, let's say, object one, object two (object two is actually quite big), object three, and then we'll round this out with object four. And now we have this new object, object five, and we want to instantiate it and put it into memory. There's a problem, though: there's no [ __ ] space. So this triggers a garbage collection. Let's say object one points to three, object three points to four, and nothing's pointing to two. What ends up happening with the garbage collection pass? Two gets deleted, and then these get moved. You might be questioning why move the things. Well, I'll show you.

Let's say it was actually laid out like this: we had object three, object one, object four, and we have this space between the two. But this new object is too big to fit in either of those gaps. This is called fragmentation. So the reason we move everything is the same reason we defrag our drives: it's so all of the free space is at the end, and it's easier to add in new stuff without having to break it up across so many places. It's also why virtualizing memory is so valuable: if the OS can handle the fragmentation part and you're just handing out pointers to things that aren't real but are in your address space, that's great. And obviously, you can understand why things being half the size means less movement has to happen.
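The whole walkthrough above (mark what's reachable from the roots, delete the rest, slide the survivors to the front) can be sketched as a toy mark-compact pass. This is a deliberately simplified model, not V8's actual collector; the `markCompact` helper and its slot layout are made up purely for illustration.

```javascript
// Toy mark-compact pass (NOT V8's real collector): a fixed-size "heap"
// of slots, a root set, and a compaction step that slides live objects
// to the front so all free space ends up contiguous at the end.
function markCompact(heap, roots) {
  // Mark: walk pointers from the roots, flagging everything reachable.
  const live = new Set();
  const stack = [...roots];
  while (stack.length > 0) {
    const id = stack.pop();
    if (live.has(id)) continue;
    live.add(id);
    const obj = heap.find((o) => o !== null && o.id === id);
    for (const ref of obj ? obj.refs : []) stack.push(ref);
  }
  // Sweep + compact: drop unreachable objects and slide survivors forward;
  // the freed slots (null) all end up at the tail.
  const compacted = heap.filter((o) => o !== null && live.has(o.id));
  while (compacted.length < heap.length) compacted.push(null);
  return compacted;
}

// Mirrors the example: one -> three -> four, and nothing points to two.
const heap = [
  { id: 1, refs: [3] },
  { id: 2, refs: [] }, // unreachable, so it gets collected
  { id: 3, refs: [4] },
  { id: 4, refs: [] },
];
const after = markCompact(heap, [1]);
console.log(after.map((o) => (o ? o.id : '·')).join(' ')); // prints: 1 3 4 ·
```

Real collectors like V8's add generational heaps and incremental marking on top, but the reachability-then-slide idea is the same, and smaller objects mean less data to mark and less data to slide.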

So this whole process ends up being cheaper and faster. And as Matteo says here, the end result is that GC pauses are both shorter and less frequent. This corresponds to what we saw in the P99 and max latency numbers. When a long-tail request lines up with a GC pause, the pause is now shorter.

They drop some fun cost numbers here, too, which are very interesting. If your current Node on Kubernetes takes 2 gigs of memory per pod, this compression can knock you down to one gig. You get the same app and performance, but can run twice as many pods per node, or use half as many nodes. What would halving pod memory do to your bill? Take a moment to calculate the potential savings based on your current setup. For them, this example was based on a six-node m5.2xlarge setup, which is at roughly 38 cents per hour. That's about $16,600 a year. If you can go from six nodes to three because you need half as much RAM, you're saving $8,300 a year. If you have a real production fleet with 50 or more of these nodes, the savings could be between $80,000 and $100,000 a year without changing any code. That's pretty huge.
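The fleet math is easy to sanity-check yourself. Here's a minimal sketch using the six-node, 38-cents-an-hour example as placeholder inputs; plain on-demand arithmetic won't exactly reproduce the article's dollar figures, which presumably bake in reserved or discounted pricing.

```javascript
// Back-of-the-envelope fleet cost sketch. Node count and hourly rate are
// placeholder inputs; plug in your own numbers.
const HOURS_PER_YEAR = 24 * 365; // 8,760

function annualCost(nodeCount, hourlyRatePerNode) {
  return nodeCount * hourlyRatePerNode * HOURS_PER_YEAR;
}

// Hypothetical fleet: six nodes at roughly $0.38/hour each.
const before = annualCost(6, 0.38);
// Halving per-pod memory lets the same workload run on half the nodes.
const after = annualCost(3, 0.38);

console.log(`before: $${Math.round(before)}/yr`);
console.log(`after:  $${Math.round(after)}/yr`);
console.log(`saved:  $${Math.round(before - after)}/yr`);
```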

You know what would actually benefit a lot from this? OpenClaw. Someone should make an OpenClaw Docker image based on this one, because that compression is going to help it a ton. You'll be able to run OpenClaw on way smaller boxes as a result. Like genuinely, that's really cool.

This also means you can double your tenant density. Multi-tenant SaaS platforms where each tenant runs in their own isolated Node process often hit memory limits as the binding constraint for density. Yeah, absolutely agree. If you have five customers, all of them are running on the same box, and each one needs 2 gigs of memory even when they're not doing processing all the time, you're not bottlenecked on CPU, you're bottlenecked on RAM. If you can cut the amount of RAM each of those instances needs in half, you can put twice as many things on that box. That's a big deal for a lot of memory-constrained work, which is a lot of these multi-runner cases. This also will help unlock edge deployments. I hadn't even thought of that. That's a big deal.

If you don't need as much RAM, you are much less blocked when you try to deploy things at the edge. Lambda@Edge, Cloudflare Workers, and Deno Deploy have strict memory limits between 128 megs and 512 megs. The reason Cloudflare sponsored this work is that the Workers runtime needed pointer compression to support more isolates. Yeah, they can now pack in twice as many isolates when they're not using CPU. That's crazy. In a world where 128 gigs of server RAM is $2,500, this means a lot. Yeah, I should have thought about that before.

The fact that RAM is so expensive makes this a really, really well-timed change. This is another one I hadn't thought of: for WebSocket apps, you often end up blocked on memory, because if you're maintaining all of these connections and don't have anything to send yet, you're not using much CPU, but you are using a shitload of RAM, because each of those connections is conservatively taking 10 kilobytes on the heap. That puts you at 500 megabytes of RAM with 50,000 connections on one box. Now you're down to 250 megs. That's huge. So you can have 100,000 connections on a server that could previously only do 50,000. Huge.
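The connection math above sketches out like this, assuming the article's conservative estimate of roughly 10 kB of heap per idle connection (decimal units throughout):

```javascript
// WebSocket sizing sketch: heap footprint of N idle connections, using a
// conservative ~10 kB of heap per connection.
const BYTES_PER_CONNECTION = 10 * 1000;

function heapMegabytes(connections, pointerCompression = false) {
  const perConn = pointerCompression
    ? BYTES_PER_CONNECTION / 2 // compressed pointers roughly halve it
    : BYTES_PER_CONNECTION;
  return (connections * perConn) / 1e6;
}

console.log(heapMegabytes(50_000));        // 500 MB uncompressed
console.log(heapMegabytes(50_000, true));  // 250 MB compressed
console.log(heapMegabytes(100_000, true)); // 500 MB: double the connections
```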

Like, if some random service was promising me this, I wouldn't believe them. But this is Matteo Collina, who is my go-to reference point for things going on in Node, and one of the best C++ engineers I've ever met. Matteo is really good. He is writing this as a maintainer of Node, not as a person trying to sell you something, so I do believe him fully on this. They do call out the fact that if you actually need more than 4 gigs of RAM per isolate, then this doesn't work. This is not going to help; it's probably going to cause problems. It's not an option. But only the V8 JS heap is what's living inside the cage. Native add-on allocations and ArrayBuffers do not count against that limit. That's good to know. Like, I don't think any of the apps using that much RAM aren't offloading things like that. So, almost certainly good.
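You can actually watch that split in `process.memoryUsage()`: ArrayBuffer backing stores are tracked in the `arrayBuffers` field rather than `heapUsed`, since they live outside the V8-managed heap. A minimal sketch:

```javascript
// ArrayBuffer backing stores live outside V8's managed JS heap, so a big
// allocation shows up under `arrayBuffers`, not `heapUsed`.
const before = process.memoryUsage();

const big = new ArrayBuffer(64 * 1024 * 1024); // 64 MiB, allocated off-heap

const after = process.memoryUsage();
const heapGrowth = after.heapUsed - before.heapUsed;
const abGrowth = after.arrayBuffers - before.arrayBuffers;

console.log(`heapUsed grew by     ~${(heapGrowth / 1e6).toFixed(1)} MB`);
console.log(`arrayBuffers grew by ~${(abGrowth / 1e6).toFixed(1)} MB`);
console.log(big.byteLength); // keep a live reference to the buffer
```

On a stock build you should see `arrayBuffers` jump by roughly the full 64 MiB while `heapUsed` barely moves, which is exactly why big buffers don't eat into the 4 gig cage.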

But there's one more compatibility constraint. Native add-ons built with the legacy NAN (Native Abstractions for Node) won't work with pointer compression enabled. NAN exposes V8 internals directly, and pointer compression changes the internal representation of those objects. When you recompile, the ABI is different. Add-ons built on Node-API (formerly N-API) are unaffected, because Node abstracts away those pointer layouts. Apparently, the most popular packages have all long since migrated: sharp, bcrypt, canvas, sqlite3, leveldown, bufferutil, and utf-8-validate are all good. Apparently, NodeGit is one of the few things still using NAN. If you run npm ls nan and nothing shows up, you're good. This will actually be fun.

Let's try that on the T3 Chat codebase. Woo! T3 Chat is clear. We can turn this on. That's actually really exciting. Our memory usage is not great, so that'll be a huge win. So, if you can run this command and nothing shows up, and you're deploying your Node apps via Docker, I'm with him: you absolutely should try this. There isn't much to lose. I'm going to try turning this on for a few things that I'm working on right now. I'm actually really curious how it goes, and also how it performs compared to Bun. This was a phenomenal article.

Thank you, Matteo, for writing this. This is a super cool deep dive, and I hope that y'all learned something as I read through it. If you haven't already, go give Matteo a follow. He's putting so much work in for all of this and is not compensated fairly for it, I can promise you that much. He's one of those quiet people behind the scenes keeping the internet functioning as we know it. Give him a follow and support him for the work he does. I appreciate him immensely, and I hope you guys do too. Are you guys as excited as I am about using less RAM, or do you just want to save money now that it's all so expensive? Let me know how you feel in the comments.
