Sunday, March 29, 2020

G1 Collector Summary

I have been reading an excellent book on garbage collection and Java performance - Java Performance Companion. It has the best explanation of how the G1 collector works that I have seen so far. This post is a summary of the G1 material from the book, for quick reference.

The goal of the G1 collector is to meet a pause time target even on large heaps. The earlier collectors (CMS, Parallel and Serial) all struggle as the heap grows. CMS in particular suffers from fragmentation, its pause times increase with heap size, and it can compact the heap only through a Full GC. G1 addresses these problems with the idea of regions. It is still generational, but instead of laying out the generations as contiguous areas, it splits the heap into many equal-sized regions, each of which can serve as eden, survivor or old space. The region size is derived from the configured heap size so that there are roughly 2000 regions (the actual target is 2048), with the size rounded to a power of two. e.g. 16 GB heap / 2048 = 8 MB per region.
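
The region size arithmetic above can be sketched in a few lines of Java. This is only a rough illustration, not HotSpot's actual code: the class and method names are made up, and the bounds (a target of 2048 regions, power-of-two sizes between 1 MB and 32 MB) are assumptions based on JDK 8 era behaviour.

```java
// Hypothetical sketch of G1's default region sizing; names are illustrative only.
final class G1RegionSizeSketch {
    static final long MIN_REGION_BYTES = 1L << 20;   // assumed lower bound: 1 MB
    static final long MAX_REGION_BYTES = 32L << 20;  // assumed upper bound: 32 MB
    static final long TARGET_REGION_COUNT = 2048;    // assumed target number of regions

    static long regionSizeBytes(long heapBytes) {
        // Aim for roughly TARGET_REGION_COUNT regions, but never below the minimum size.
        long raw = Math.max(heapBytes / TARGET_REGION_COUNT, MIN_REGION_BYTES);
        // Round down to a power of two, then cap at the maximum size.
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(powerOfTwo, MAX_REGION_BYTES);
    }

    public static void main(String[] args) {
        long heapBytes = 16L << 30; // 16 GB heap
        long regionBytes = regionSizeBytes(heapBytes);
        System.out.println("region size: " + (regionBytes >> 20) + " MB, regions: "
                + (heapBytes / regionBytes));
    }
}
```

For a 16 GB heap this gives 8 MB regions and 2048 of them. A 12 GB heap would get 4 MB regions, because 6 MB is rounded down to a power of two. The region size can also be set explicitly with -XX:G1HeapRegionSize instead of relying on the default.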

Two main categories of regions:
Available Regions - regions which are completely free and can be used for allocation.
CSets - Collection Sets. These are the regions chosen for the next GC; their live objects are copied (evacuated) to available regions. As a result, the regions in a CSet become available regions at the end of the collection.

There are still two kinds of collection work: a young generation phase and an old generation phase.
However, each collection is really about which regions go into the CSet. Normally only young regions are collected, but in a mixed collection the CSet also includes a few old regions.

The young generation phase is a stop-the-world, parallel collection. It evacuates the live objects of the entire young generation into available regions. The old generation phase is similar to CMS, with a mix of stop-the-world and concurrent phases, but it has fewer phases than CMS.


When Young generation phase kicks in -
The number of regions in the young generation keeps changing so that G1 can meet the pause time goal. The smaller the pause time target, the fewer regions the young generation gets. After every collection, G1's heuristics set the number of eden regions to use next. New allocations go into eden regions taken from the available regions. When the number of eden regions in use reaches this limit, a young generation collection starts.
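
To make the link between the pause time target and the young generation size concrete, here is a deliberately simplified sketch. The real G1 heuristics are based on measured statistics and are far more involved; the linear cost model, the class and method names, and the inputs (fixed cost and cost per region) are all assumptions for illustration. The 5% and 60% bounds mirror the defaults of the experimental G1NewSizePercent and G1MaxNewSizePercent flags, and survivor regions are ignored.

```java
// Hypothetical, heavily simplified model of pause-driven young generation sizing.
final class YoungSizingSketch {
    static final int MIN_YOUNG_PERCENT = 5;   // mirrors the G1NewSizePercent default
    static final int MAX_YOUNG_PERCENT = 60;  // mirrors the G1MaxNewSizePercent default

    // Assumed model: pause = fixedCostMs + regions * costPerRegionMs.
    static int targetYoungRegions(double pauseTargetMs, double fixedCostMs,
                                  double costPerRegionMs, int totalRegions) {
        int min = Math.max(1, totalRegions * MIN_YOUNG_PERCENT / 100);
        int max = Math.max(min, totalRegions * MAX_YOUNG_PERCENT / 100);
        // How many regions fit into the pause budget under the assumed model.
        int affordable = (int) Math.floor((pauseTargetMs - fixedCostMs) / costPerRegionMs);
        return Math.max(min, Math.min(max, affordable));
    }

    // A young collection starts once the eden regions in use reach the current limit.
    static boolean shouldStartYoungCollection(int edenRegionsInUse, int edenRegionLimit) {
        return edenRegionsInUse >= edenRegionLimit;
    }

    public static void main(String[] args) {
        int limit = targetYoungRegions(200.0, 20.0, 0.5, 2048);
        System.out.println("young region limit: " + limit);
        System.out.println("collect now? " + shouldStartYoungCollection(limit, limit));
    }
}
```

With a 200 ms target the model allows 360 regions, while a 50 ms target would fall back to the lower bound of 102 regions. That matches the rule of thumb above: a tighter pause target means a smaller young generation and more frequent young collections.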


When mixed mode collection happens -
As part of every young collection, objects get promoted from survivor regions to old generation regions. A few parameters decide when G1 needs to collect these old generation regions as well, so that it can avoid a Full GC or running out of memory.

-XX:InitiatingHeapOccupancyPercent - old generation occupancy, as a percentage of the total heap, at which the concurrent cycle is started. Default 45.
-XX:G1MixedGCCountTarget - the target number of mixed collections to perform after a marking cycle. This helps decide how many old regions go into the CSet per mixed GC. Default 8.
-XX:G1HeapWastePercent - the percentage of the total heap that G1 is willing to leave uncollected as garbage. Default 5. Meaning it is ok to stop mixed collections once the space still reclaimable from the candidate old regions drops to 5% of the heap or less.
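
To check what values a running JVM actually uses for these flags, one option is the HotSpotDiagnosticMXBean. The sketch below assumes a HotSpot-based JVM; the class name G1FlagReader is made up, and any flag the JVM does not know about is simply reported as "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that reads the G1 flags discussed above from the running JVM.
final class G1FlagReader {
    static final String[] FLAGS = {
        "InitiatingHeapOccupancyPercent",
        "G1MixedGCCountTarget",
        "G1HeapWastePercent"
    };

    static Map<String, String> readFlags() {
        HotSpotDiagnosticMXBean bean;
        try {
            bean = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        } catch (IllegalArgumentException e) {
            bean = null; // not a HotSpot-style JVM
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (String flag : FLAGS) {
            String value = "unavailable";
            if (bean != null) {
                try {
                    value = bean.getVMOption(flag).getValue();
                } catch (IllegalArgumentException e) {
                    // The flag does not exist in this JVM build; keep "unavailable".
                }
            }
            values.put(flag, value);
        }
        return values;
    }

    public static void main(String[] args) {
        readFlags().forEach((flag, value) -> System.out.println(flag + " = " + value));
    }
}
```

The values printed are whatever the JVM was started with, so running it with, say, -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 is a quick way to confirm that an override was picked up.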

In summary - as the old generation grows and its occupancy reaches IHOP, G1 schedules the old generation phases, which are a mix of stop-the-world and concurrent phases. At the end of these phases G1 knows where the live objects are in the old regions and which old regions rank highest in terms of GC efficiency. It adds these regions to the CSets and enters mixed collection mode.
This means that in the next round of collection, instead of collecting only young regions, G1 also collects a few old regions as listed in the CSet. Based on the parameters above, G1 keeps doing mixed collections for the next few cycles, and once its heuristics are satisfied it goes back to normal young collection mode.
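
How the three parameters interact can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the real algorithm: candidate old regions are assumed to be already sorted by GC efficiency, each mixed GC takes a fixed share of them, and all class and method names are hypothetical.

```java
// Hypothetical simulation of the IHOP check and the mixed collection loop.
final class MixedGcSketch {
    // The concurrent cycle is requested once old occupancy reaches ihopPercent of the heap.
    static boolean shouldStartMarking(long oldBytes, long heapBytes, int ihopPercent) {
        return oldBytes * 100 >= heapBytes * ihopPercent;
    }

    // reclaimablePerRegion: garbage in each candidate old region, best candidates first.
    // Returns how many mixed GCs run before G1 would go back to young-only collections.
    static int simulateMixedGcs(long[] reclaimablePerRegion, long heapBytes,
                                int mixedGcCountTarget, int heapWastePercent) {
        long remaining = 0;
        for (long r : reclaimablePerRegion) {
            remaining += r;
        }
        // Spread the candidates over roughly mixedGcCountTarget collections.
        int perGc = (reclaimablePerRegion.length + mixedGcCountTarget - 1) / mixedGcCountTarget;
        int next = 0;
        int mixedGcs = 0;
        // Continue while the reclaimable space is still above the accepted waste.
        while (next < reclaimablePerRegion.length
                && remaining * 100 > heapBytes * heapWastePercent) {
            int end = Math.min(next + perGc, reclaimablePerRegion.length);
            for (; next < end; next++) {
                remaining -= reclaimablePerRegion[next];
            }
            mixedGcs++;
        }
        return mixedGcs;
    }

    public static void main(String[] args) {
        long[] candidates = new long[16];
        java.util.Arrays.fill(candidates, 10L); // 16 old regions, 10 units of garbage each
        System.out.println("start marking: " + shouldStartMarking(450, 1000, 45));
        System.out.println("mixed GCs: " + simulateMixedGcs(candidates, 1000, 8, 5));
    }
}
```

In this example the heap is 1000 units and the candidates hold 160 units of garbage. With a count target of 8, each mixed GC takes 2 old regions, and the loop stops after 6 mixed GCs because the remaining 40 units are below the 5% waste threshold. The last 4 candidate regions are never collected, which is exactly the trade-off G1HeapWastePercent expresses.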


When Full GC happens -
While allocating a humongous object, if there are not enough contiguous available regions to hold it.
If the marking phase of the old generation collection does not finish before G1 runs out of available regions.
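
The humongous case is worth a small illustration, because it can fail even when plenty of regions are free. The sketch below assumes that an object larger than half a region counts as humongous and needs a run of contiguous free regions; the class, the method names and the boolean free-map are hypothetical simplifications.

```java
// Hypothetical sketch of the search for contiguous free regions for a humongous object.
final class HumongousAllocationSketch {
    // Assumption: objects larger than half a region are treated as humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Number of whole regions needed to hold the object (rounded up).
    static int regionsNeeded(long objectBytes, long regionBytes) {
        return (int) ((objectBytes + regionBytes - 1) / regionBytes);
    }

    // Returns the index of the first run of `needed` contiguous free regions, or -1.
    static int findContiguousFree(boolean[] free, int needed) {
        int run = 0;
        for (int i = 0; i < free.length; i++) {
            run = free[i] ? run + 1 : 0;
            if (run == needed) {
                return i - needed + 1;
            }
        }
        return -1; // no suitable run: the point where G1 may have to fall back to a Full GC
    }

    public static void main(String[] args) {
        boolean[] free = {true, false, true, true, false, true, true, true};
        long regionBytes = 8L << 20; // 8 MB regions
        int needed = regionsNeeded(20L << 20, regionBytes); // 20 MB object
        System.out.println("regions needed: " + needed);
        System.out.println("first fit at: " + findContiguousFree(free, needed));
        System.out.println("fit for 4 regions: " + findContiguousFree(free, 4));
    }
}
```

Here 6 of the 8 regions are free, and a 20 MB object (3 regions) fits starting at index 5. A request for 4 regions fails even though 6 regions are free in total, because no 4 of them are adjacent.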

At a high level, there are 2 parts to a G1 run:
Concurrent cycles - these involve initial marking, scanning, remark and cleanup. Please note that initial marking and remark are stop-the-world steps. The cleanup phase here just adds free regions to the available regions list.
Mixed collections - these free up the CSet regions, which comprise both young and old generation regions.