Upgrading jemalloc

Paul Bone

jemalloc and Firefox

Most programs use the system-provided memory allocator.

malloc() free() realloc()
memalign() calloc() valloc()

Critical for performance, memory footprint and security of a computer program.
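
As a small illustration of the entry points listed above, here is a minimal sketch that grows a zero-initialised array using realloc(). The helper name grow_zeroed is hypothetical and not part of any allocator, and the sketch ignores multiplication overflow for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grow an int array from old_n to new_n elements,
 * zeroing the newly added tail (realloc leaves it uninitialised).
 * Returns the new pointer, or NULL on failure, in which case the
 * original block is still valid and still owned by the caller. */
int *grow_zeroed(int *arr, size_t old_n, size_t new_n) {
    int *p = realloc(arr, new_n * sizeof *p);
    if (p == NULL) {
        return NULL;
    }
    if (new_n > old_n) {
        memset(p + old_n, 0, (new_n - old_n) * sizeof *p);
    }
    return p;
}
```

Every call here goes through whichever allocator the program is linked against, which is why swapping the allocator can change speed and memory use without touching the calling code.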

jemalloc and Firefox

  • Forked from jemalloc 2
  • In 2007
  • Probably selected for good reasons, in 2007.
  • Not great at multithreaded workloads (multicore processors were released in 2006)
  • jemalloc is now version 5.3.0

Could we upgrade it?

Why don't we just?

Upgrade jemalloc

Upgrade jemalloc

In June 2017 I heard a conversation about a prior effort to do this.

Upgrade jemalloc

glandium: [We] tried jemalloc 3 and 4. ... the gist of it is that RSS was something like +50% while performance was slightly better, but tweaking to get less memory overhead made the performance not so good anymore.

Upgrade jemalloc

jesup tested jemalloc 5 in 2023.

  • 4.45% faster (sp3)
  • 300% memory usage (suspect bad result)
  • Worth further checking?
  • TODO: compare other features including security.

Why don't we just

Use the system allocator

System allocator

Bug 1805644, titled "Speedometer 2 is ~5% faster with --disable-jemalloc"

December 2022

System allocator

                jemalloc   ptmalloc   increase
resident-peak   479MB      552MB      115%
resident        202MB      504MB      250%
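
The "increase" figures appear to be the ptmalloc value expressed as a percentage of the jemalloc value, rather than the amount added on top. A quick check under that assumption (the helper name percent_of is hypothetical):

```c
#include <assert.h>

/* Hypothetical helper: express value_mb as a percentage of base_mb.
 * The +base_mb/2 term rounds to the nearest whole percent instead of
 * truncating. */
int percent_of(int base_mb, int value_mb) {
    return (value_mb * 100 + base_mb / 2) / base_mb;
}
```

With the numbers from the table, percent_of(479, 552) gives 115 and percent_of(202, 504) gives 250, which matches the slide. Read this way, ptmalloc's steady-state resident memory is roughly 2.5 times jemalloc's.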

Retested in 2024

pssst. Wait until the end of the presentation...

Missing features

  • Poisoning
  • Arenas
  • API for releasing memory to the system
  • System support varies by platform

Why don't we just?

Replace jemalloc

Replace it with mimalloc

jesup tested mimalloc in 2023

  • Lock-free implementation
  • Thread-local free lists
  • Few branches on fast-path
  • 3-4% performance win!

Replace it with mimalloc

jesup tested mimalloc in 2023

  • 0.25-1.5% performance win with security features enabled
  • 38% additional memory usage
  • No arenas

Chrome uses partition alloc

The implementation is too tightly tied to Chrome's infrastructure to test with Firefox.

Anything else?

A lot of allocators gave a ~5% performance win (at the time) but added a lot more memory overhead.

Let's try the hard way

We'll continue to maintain and improve mozjemalloc.

Prior improvements in mozjemalloc

  • Guard pages around chunk payload.
  • Reduced fragmentation for allocations between 512 and 4,096 bytes.
  • Poisoning the first 256 bytes of an object only.
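
To make the last point concrete, here is a minimal sketch of capped poisoning on free. It is not mozjemalloc's code: the function names and the 0xE5 poison value are assumptions chosen for illustration, and a real allocator would look up the object's size itself rather than take it as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0xE5 /* assumed value, for illustration only */
#define POISON_MAX  256  /* cap from the slide: first 256 bytes only */

/* Overwrite at most the first POISON_MAX bytes of a dead object, so
 * that stale reads see an obvious pattern instead of old data. */
void poison_freed(void *ptr, size_t size) {
    size_t n = size < POISON_MAX ? size : POISON_MAX;
    memset(ptr, POISON_BYTE, n);
}

/* Hypothetical wrapper: poison, then hand the block back. */
void poisoned_free(void *ptr, size_t size) {
    if (ptr != NULL) {
        poison_freed(ptr, size);
        free(ptr);
    }
}
```

Capping the write keeps the cost of freeing a large object constant, while still covering the region where an object's header fields and pointers usually live.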

Prior improvements in mozjemalloc

  • Lock-free access for DOM.
  • Reduced lock contention by moving slow operations outside of locks.
  • VirtualAlloc calls amortised, without affecting memory usage.
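
The general pattern behind moving slow operations outside of locks can be sketched as follows. This is an illustration of the technique, not mozjemalloc's implementation: pool_t and its functions are hypothetical, and free() stands in for whatever slow operation (for example a system call) would otherwise run while the lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct node { struct node *next; } node_t;

/* Hypothetical pool holding blocks that are waiting to be released. */
typedef struct {
    pthread_mutex_t lock;
    node_t *pending;
} pool_t;

void pool_init(pool_t *p) {
    pthread_mutex_init(&p->lock, NULL);
    p->pending = NULL;
}

/* Fast path: only a pointer swap happens under the lock. */
void pool_defer(pool_t *p, node_t *n) {
    pthread_mutex_lock(&p->lock);
    n->next = p->pending;
    p->pending = n;
    pthread_mutex_unlock(&p->lock);
}

/* Detach the whole list under the lock, then do the slow work after
 * unlocking so other threads are not blocked behind it.
 * Returns the number of blocks released. */
size_t pool_release_all(pool_t *p) {
    pthread_mutex_lock(&p->lock);
    node_t *list = p->pending;
    p->pending = NULL;
    pthread_mutex_unlock(&p->lock);

    size_t released = 0;
    while (list != NULL) {
        node_t *next = list->next;
        free(list); /* stand-in for the slow operation */
        list = next;
        released++;
    }
    return released;
}
```

The critical section shrinks to two pointer writes, so the time other threads spend waiting no longer depends on how many blocks are being released.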

mozjemalloc

Two years ago, disabling jemalloc yielded a 5% perf gain.

Today jemalloc is 2.5% faster than the system allocator, and it still consumes a lot less memory.

Future

  • Unlock for purging memory (WIP).
  • More control of when memory is purged (hooks to scheduler).
  • Thread local caches.
  • Batch allocation/free.
  • More...

Thank you for listening

Thanks to: glandium, smaug, jesup, gcp, jstutte