Paul Bone

GC vs other memory management techniques

Someone asked on Reddit about different approaches to a memory management, including recent developments like Rust, and which is best. I’ve had opinions about this for some time, and so after writing my reply I discovered I had written an essay. So I thought I’d post it here for reference.

TL;DR: Rust’s approach is good but frequently unnecessary and mostly puts an unnecessary burden on the programmer. Tracing GC is usually better.

Rust has made a good contribution (on top of a lot of prior contributions within RBMM & type system research). It has inspired a lot of people to make similar systems. However it has also (unintentionally) encouraged the promotion of even more GC misinformation (there was already quite a lot of misinformation).

Reference counting

GC is one of a handful of topics that is very poorly understood by most developers. Starting with some myths we have "Reference counting is faster" and "Reference counting is pauseless", the first is trivially false, updating counters all the time is expensive, the second can be shown to be false by creating a large structure, having only a single pointer to its root, and dropping that pointer, freeing many magabytes of memory at once, and slowly! Languages that use ref counting in an attempt to avoid GC pauses, or worse avoid writing a simple collector are misguided.

Simple GC costs

Next we have beliefs like GC being necessarily slower than manual memory management. It is not always slower and is frequently faster (but not always). For example a bump pointer allocator (possible in moving GCs) is faster than a free list allocator (eg: malloc). Also a copying or compacting collector can free memory in constant time. Tracing however is linear with the amount of referenced memory. It will depend on your program as which dominates, memory being freed or memory being retained. I think the former usually dominates due to the young generation hypothesis (the idea that most objects have very short lifetimes). Speaking of the young generation hypothesis the costs of tracing are reduced dramatically via things such as generational collection, and probably garbage first collection (I haven’t read much about it yet). Finally compacting collection can reduce memory fragmentation, improving cache and TLB performance, which will improve performance for the program as a whole.

Pause times

Finally, the elephant in the room, pause times are usually what people think of when they think of GC performance. Often we care about responsiveness more than we do throughput. To reduce pause times, via incremental, concurrent and parallel collection. These techniques make the tracing work less intrusive, first by doing only a bit at a time (incremental) which creates more frequent but shorter pauses. Concurrent does tracing work concurrently with the mutator (in another thread) this creates only extremely brief pauses when the mutator and collector need to co-ordinate. Parallel collection uses multiple threads for tracing, shortening the time spent tracing. These aren’t always clear wins, they make throughput worse due to the extra work that both the collector and mutator need to do to keep things thread safe and ensure no memory is freed while it’s still in use. They may also trade off how quickly memory is reclaimed. These are difficult to get right but there is a lot of prior work in the literature. Also the Go developers have written some good articles about this (however they fail to mention that they trade-off against throughput).

That leaves making pause times predictable or deterministic. Deterministic pause times, which would be required for hard-realtime processing, are very difficult, but I think there is research out there about this. However in this case I would advise not using a collector, using one of the other approaches instead. However there are very few applications that are truly hard-realtime. However some applications such as audio processing, video games, and very large applications (Mozilla and the servo project) may be better with an approach like Rust’s rather than a "mostly-predictable" GC, My gut feeling is that even these would be better with a GC (at least saving a lot of developer time) but I don’t actually know. I hope to seek out this information some time in the next few months. Finally for predictable collectors there are a number of things that could help, concurrent collection is a big one, but also incremental collection with budgets for each slice of incremental work, and budgets could be increased when pauses can be tolerated, such as when an application is in the background. I think providing an API that allows developers to specify and modify GC tuning parameters including tolerance for pauses is a good idea.

Finally there are a handful of small optimisations that can be used to cut down pause times further, for example a concurrent collector still needs to mark the roots including the stack, fake stack frames can be created that create barriers, allowing the lower part of the stack to be traced safely while the upper part is used by the mutator. If the program returns into the lower part the barrier is executed and the thread can be paused or the collector can finish using and release the next part of the stack.

Conclusion

There are a number of misunderstandings even myths about GC that lead us to make poor choices, and there are a lot of things that can make GC live up to our expectations. Some of them are quite difficult to implement but I think that they’re worth-while. When compared with something like Rust’s type system fast & predictable GC is a burden that’s paid by the language implementers (who are few) and not by the language users (who are many): even if it’s more effort per implementer it saves more total effort because the number of users is so much greater.

Disclaimer

I work for Mozilla on SpiderMonkey’s garbage collector. These opinions are my own and do not reflect Mozilla’s, or the SpiderMonkey or Rust projects.

Addendum

A trade-off

I should have added, and was later reminded after publishing this, that a good option is probably to allow both GC and lifetimes. It’s not unusual to have a garbage collected language in which the compiler may preform escape analysis, RBMM or compile time garbage collection to attempt to determine the lifetime of some objects and either free them explicitly, stack allocate them, or re-use the memory for a new allocation. These options do not require changes to the language but do take some of the pressure off of the garbage collector.

These analyses may not have a big impact by themselves. For a bigger impact adding optional lifetime information to the language will help. Making it optional means that developers can use it in hot code but not burden the rest of the code base & maintainers with it. However, it’s important to understand that even if you never use these features of a language, they’re still part of the cognitive load of learning and reading code in that language. I think this could be a good trade-off, but I’m not sure that it is necessary.

Controlling side-effects

Rust uses memory ownership to control mutability (in addition to life-times). That’s not really the topic of this post, however I would like to acknowledge that this too could be optional, as it is in my colleague’s language Pawns. In Pawns, uniqueness information is optional, anything without an annotation is assumed to be shared and is immutable. If an annotation says that (part of) something is unique, then it can be modified. As above, the intention is that this is used in only the performance critical parts of your program. This can probably be done with optional ownership information also.

Non-memory resources

Non-memory resources have stricter requirements for finalisation. Requiring a garbage collector to clean them up is unreliable and a bad idea. I don’t mind using one of the other approaches here, or requiring developers to manually free these resources.