Notes based on the HotSpot Virtual Machine Garbage Collection Tuning Guide, JDK 12
Source: https://docs.oracle.com/en/java/javase/12/gctuning/index.html
Preface
The Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection
Tuning Guide describes the garbage collection methods included in the Java HotSpot
Virtual Machine (Java HotSpot VM) and helps you determine which one is the best for
your needs.
In short: it introduces the garbage collection algorithms of the HotSpot VM and how to choose the most suitable one.
Introduction to Garbage Collection Tuning
Java SE selects the most appropriate garbage collector based on the class of the computer on which the application is run.
However, this selection may not be optimal for every application. Users, developers, and administrators with strict performance goals or other requirements may need to
explicitly select the garbage collector and tune certain parameters to achieve the
desired level of performance.
In practice this means picking a suitable collector and tuning a few parameters to reach the required performance level (throughput and pauses).
Topics
- What Is a Garbage Collector?
- Why Does the Choice of Garbage Collector Matter?
- Supported Operating Systems in Documentation
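This chapter briefly covers:
- What garbage collection is and which optimization techniques it uses (generations, concurrency, compaction)
- Why the choice of garbage collection algorithm matters
- Supported operating systems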
Ergonomics
A brief look at how to improve performance in terms of throughput, response time, and heap footprint.
Topics
- Garbage Collector, Heap, and Runtime Compiler Default Selections
- Behavior-Based Tuning
- Maximum Pause-Time Goal
- Throughput Goal
- Footprint
- Tuning Strategy
Default configuration
- Garbage-First (G1) collector
- The maximum number of GC threads is limited by heap size and available CPU resources
- Initial heap size of 1/64 of physical memory
- Maximum heap size of 1/4 of physical memory
- Tiered compiler, using both C1 and C2
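To see what the ergonomics actually picked on a given machine, a small program can ask the running VM. This is a minimal sketch using the standard java.lang.management API; the class name GcDefaults is made up for illustration, and the collector names printed depend on the VM and its flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcDefaults {
    /** Names of the collectors the running VM actually selected. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    /** Maximum heap the VM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Collectors:     " + collectorNames());
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("CPUs:           " + Runtime.getRuntime().availableProcessors());
    }
}
```

Running it once with no flags and once with an explicit collector option is a quick way to compare the default selection against your own choice.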
Behavior-based tuning
- Maximum Pause-Time Goal
These adjustments may cause garbage collection to occur more frequently, reducing the overall throughput of the application.
Command-line option: -XX:MaxGCPauseMillis=<nnn>
- Throughput Goal
If the throughput goal isn’t being met, then one possible action for the garbage
collector is to increase the size of the heap so that the time spent in the application
between collection pauses can be longer. Command-line option: -XX:GCTimeRatio=nnn (the ratio of
garbage collection time to application time is 1/(1+nnn)).
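The 1/(1+nnn) formula is easy to misread, so here is the arithmetic as a tiny sketch. The class and method names are made up for illustration; only the formula comes from the guide.

```java
final class GcTimeRatio {
    /** Fraction of total time the throughput goal allows for GC, given -XX:GCTimeRatio=nnn. */
    static double gcTimeFraction(int nnn) {
        if (nnn < 0) {
            throw new IllegalArgumentException("nnn must be non-negative");
        }
        return 1.0 / (1 + nnn);
    }

    public static void main(String[] args) {
        // nnn = 19 gives 1/20, i.e. a goal of 5% of total time in GC
        System.out.println(gcTimeFraction(19));
        // nnn = 99 gives 1/100, i.e. a goal of 1% of total time in GC
        System.out.println(gcTimeFraction(99));
    }
}
```

So a larger nnn means a stricter throughput goal, not a looser one.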
- Footprint (heap size)
- If the throughput and maximum pause-time goals have been met, then the garbage
collector reduces the size of the heap until one of the goals (invariably the throughput
goal) can’t be met.
Tuning strategy
The heap grows or shrinks to a size that supports the chosen throughput goal. Learn about heap tuning strategies such as choosing a maximum heap size, and choosing maximum pause-time goal.
Don’t choose a maximum value for the heap unless you know that you need a heap greater than the default maximum heap size.
If the heap grows to its maximum size and the throughput goal isn’t being met, then
the maximum heap size is too small for the throughput goal.
If the throughput goal can be met, but pauses are too long, then select a maximum
pause-time goal.
Garbage Collector Implementation
However, when garbage collection is the principal bottleneck, it’s useful to understand
some aspects of the implementation. Garbage collectors make assumptions about the
way applications use objects, and these are reflected in tunable parameters that can
be adjusted for improved performance without sacrificing the power of the abstraction.
In short: once garbage collection becomes the VM's bottleneck, it pays to understand the implementation details.
Topics
- Generational Garbage Collection
- Generations
- Performance Considerations
- Throughput and Footprint Measurement
Generational garbage collection
The Java HotSpot VM incorporates a number of different garbage collection algorithms that all use a technique called generational collection.
Traversing every reachable object on every collection is too inefficient, so a smarter collection strategy is needed.
Typical Distribution for Lifetimes of Objects
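The distribution shows that most objects die young (the x-axis is object lifetime measured in bytes allocated; the y-axis is the total bytes in objects with that lifetime).
Generations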
To optimize for this scenario, memory is managed in generations (memory pools holding objects of different ages). Garbage collection occurs in each generation when the generation fills up.
The vast majority of objects are allocated in a pool dedicated to young objects (the young generation), and most objects die there.
In short: because most objects are born and die within a short time, the heap is managed in generations.
Young
The young generation consists of eden and two survivor spaces.
One survivor space is empty at any time, and serves as the destination of live objects in eden and the other survivor space during garbage collection.
After garbage collection, eden and the source survivor space are empty.
Old
Objects are copied between survivor spaces in this way until they’ve
been copied a certain number of times or there isn’t enough space left there.
These objects are copied into the old region.
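The copying between eden, the two survivor spaces, and the old generation can be sketched as a toy model. This is only an illustration under loud assumptions: capacity is counted in objects rather than bytes, reachability is a boolean flag, and the names YoungGenSketch, survivorCapacity, and tenuringThreshold are made up here, not HotSpot internals.

```java
import java.util.ArrayList;
import java.util.List;

final class YoungGenSketch {
    static final class Obj {
        final String name;
        int age;              // number of minor collections survived
        boolean live = true;  // stand-in for "still reachable"
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();  // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();    // survivor space kept empty between collections
    final List<Obj> old = new ArrayList<>();
    private final int survivorCapacity;  // hypothetical capacity, counted in objects
    private final int tenuringThreshold; // hypothetical copy limit before promotion

    YoungGenSketch(int survivorCapacity, int tenuringThreshold) {
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    /** Copies live objects out of eden and the occupied survivor space. */
    void minorGc() {
        List<Obj> sources = new ArrayList<>(eden);
        sources.addAll(from);
        for (Obj o : sources) {
            if (!o.live) continue; // dead objects are simply not copied
            o.age++;
            if (o.age >= tenuringThreshold || to.size() >= survivorCapacity) {
                old.add(o);        // copied often enough, or no room left: promote
            } else {
                to.add(o);
            }
        }
        eden.clear();
        from.clear();
        List<Obj> swap = from;     // the two survivor spaces trade roles
        from = to;
        to = swap;
    }
}
```

After every call to minorGc, eden and one survivor space are empty, which matches the description above.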
Performance considerations
In general, choosing the size for a particular generation is a trade-off between these considerations. For example, a very large young generation may maximize throughput, but does so at the expense of footprint, promptness, and pause times. Young generation pauses can be minimized by using a small young generation at the expense of throughput. The sizing of one generation doesn’t affect the collection frequency and pause times for another generation.
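The larger the young generation, the higher the throughput: minor GCs happen less often and total GC time drops.
At the same time, the memory footprint grows, each collection takes longer, and response time gets worse.
Throughput and footprint measurement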
The command-line option -verbose:gc prints information about the heap and garbage collection at each collection.
In short: print the GC log to see how garbage collection is behaving.
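Besides reading the log, the same throughput figure can be approximated from inside the process. The sketch below sums the collection time reported by the standard GarbageCollectorMXBean API and divides it by VM uptime; the class name GcOverhead is made up, and for concurrent collectors the reported time is not necessarily pure pause time, so treat the result as a rough indicator.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {
    /** Approximate accumulated collection time across all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of elapsed time spent in GC; throughput is roughly 1 minus this value. */
    static double overhead(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("GC overhead: " + overhead(totalGcMillis(), uptime));
    }
}
```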
Factors Affecting Garbage Collection Performance
The two most important factors affecting garbage collection performance are total
available memory and proportion of the heap dedicated to the young generation.
In short: total heap size and the young generation's share of the heap.
Topics
- Total Heap
  - Heap Options Affecting Generation Size
  - Default Option Values for Heap Size
  - Conserving Dynamic Footprint by Minimizing Java Heap Size
- The Young Generation
  - Young Generation Size Options
  - Survivor Space Sizing
Total heap
Because collections occur when generations fill up, collection frequency is inversely proportional to the amount of memory available, so throughput generally improves as the heap gets larger.
Heap Options
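In the figure, Virtual is the part of the heap that is reserved but not yet committed, that is, the gap between -Xms and -Xmx.
Default values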
Default Options for 64-Bit Solaris Operating System
Option | Default Value |
---|---|
-XX:MinHeapFreeRatio | 40 |
-XX:MaxHeapFreeRatio | 70 |
-Xms | 6656 KB |
-Xmx | calculated |
The following are general guidelines regarding heap sizes for server applications:
Unless you have problems with pauses, try granting as much memory as possible to the virtual machine. The default size is often too small.
Setting -Xms and -Xmx to the same value increases predictability by removing the most important sizing decision from the virtual machine. However, the virtual machine is then unable to compensate if you make a poor choice.
In general, increase the memory as you increase the number of processors, because allocation can be made parallel.
Shrinking the heap dynamically
Lowering -XX:MaxHeapFreeRatio (to as low as 10%) and -XX:MinHeapFreeRatio has been shown to reduce the heap size without too much performance degradation.
In addition, you can specify -XX:-ShrinkHeapInSteps, which immediately reduces the Java heap to the target size (specified by the parameter -XX:MaxHeapFreeRatio).
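How the two free-ratio options steer the heap size can be shown with a simplified model: if free space falls below the minimum percentage the heap grows until that percentage is free again, and if free space exceeds the maximum percentage the heap shrinks. This is a sketch of the arithmetic only, assuming a single resize step; the class HeapResizeSketch and its parameters are made up, and the real VM applies the rule per generation with alignment and step limits.

```java
final class HeapResizeSketch {
    /**
     * Capacity (in MB) the heap would move to under the free-ratio rule.
     * minFreePct and maxFreePct play the role of -XX:MinHeapFreeRatio and
     * -XX:MaxHeapFreeRatio; xmsMb and xmxMb are the lower and upper bounds.
     */
    static double targetCapacity(double liveMb, double capacityMb,
                                 int minFreePct, int maxFreePct,
                                 double xmsMb, double xmxMb) {
        double freePct = 100.0 * (capacityMb - liveMb) / capacityMb;
        if (freePct < minFreePct) {
            // Grow until minFreePct of the heap is free, but never past the maximum.
            double grown = liveMb * 100.0 / (100 - minFreePct);
            return Math.min(grown, xmxMb);
        }
        if (freePct > maxFreePct) {
            // Shrink until only maxFreePct is free, but never below the initial size.
            double shrunk = liveMb * 100.0 / (100 - maxFreePct);
            return Math.max(shrunk, xmsMb);
        }
        return capacityMb; // already inside the allowed band
    }

    public static void main(String[] args) {
        // Defaults 40/70: 20 MB live in a 100 MB heap shrinks to about 66.7 MB
        System.out.println(targetCapacity(20, 100, 40, 70, 10, 200));
        // Aggressive 5/10: the same heap shrinks to about 22.2 MB
        System.out.println(targetCapacity(20, 100, 5, 10, 10, 200));
    }
}
```

The second call shows why lowering both ratios conserves footprint: the heap is allowed to hug the live data much more tightly.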
The young generation
The bigger the young generation, the less often minor collections occur. However, for a bounded heap size, a larger young generation implies a smaller old generation, which will increase the frequency of major collections. The optimal choice depends on the lifetime distribution of the objects allocated by the application.
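In short: the larger the young generation, the fewer minor GCs and, for a bounded heap, the more major GCs; the right balance depends on the lifetime distribution of the application's objects.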
Default Option Values for Survivor Space Sizing
Option | Default Value |
---|---|
-XX:NewRatio | 2 |
-XX:NewSize | 1310 MB |
-XX:MaxNewSize | not limited |
-XX:SurvivorRatio | 8 |
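What the two ratios in this table mean in sizes: -XX:NewRatio=2 makes the old generation twice the young generation (young is 1/3 of the heap), and -XX:SurvivorRatio=8 makes eden eight times one survivor space (each survivor is 1/10 of the young generation). A small sketch of that arithmetic follows; the class name is made up, and the real VM additionally aligns sizes and may resize them adaptively.

```java
final class GenerationSizes {
    /** Young generation size when old:young = newRatio:1. */
    static long young(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    /** Size of ONE survivor space when eden:survivor = survivorRatio:1. */
    static long survivor(long young, int survivorRatio) {
        return young / (survivorRatio + 2); // eden plus two survivor spaces
    }

    /** Eden is whatever the two survivor spaces leave over. */
    static long eden(long young, int survivorRatio) {
        return young - 2 * survivor(young, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 3000;
        long youngMb = young(heapMb, 2);
        System.out.println("young    = " + youngMb + " MB");
        System.out.println("old      = " + (heapMb - youngMb) + " MB");
        System.out.println("eden     = " + eden(youngMb, 8) + " MB");
        System.out.println("survivor = " + survivor(youngMb, 8) + " MB (each)");
    }
}
```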
The following are general guidelines for server applications:
- First decide on the maximum heap size that you can afford to give the virtual machine. Then, plot your performance metric against the young generation sizes to find the best setting.
- Note that the maximum heap size should always be smaller than the amount of memory installed on the machine to avoid excessive page faults and thrashing.
- If the total heap size is fixed, then increasing the young generation size requires reducing the old generation size. Keep the old generation large enough to hold all the live data used by the application at any given time, plus some amount of slack space (10 to 20% or more).
- Subject to the previously stated constraint on the old generation:
  - Grant plenty of memory to the young generation.
  - Increase the young generation size as you increase the number of processors because allocation can be parallelized.
Admittedly, these guidelines feel like they are stating the obvious.
Available Collectors
Introduces the available garbage collectors and how to pick a suitable one.
Topics
- Serial Collector
- Parallel Collector
- The Mostly Concurrent Collectors
- Selecting a Collector
Serial collector
It’s best-suited to single processor machines because it can’t take advantage of multiprocessor hardware, although it can be useful on multiprocessors for applications with small data sets (up to approximately 100 MB).
Application threads do not run while a collection is in progress. It generally suits single-processor machines, and also multiprocessor machines running applications with small data sets (around 100 MB).
Parallel collector
The parallel collector is also known as the throughput collector. It’s a generational collector similar to the serial collector. The primary difference between the serial and parallel collectors is that the parallel collector has multiple threads that are used to speed up garbage collection.
The parallel collector is intended for applications with medium-sized to large-sized data sets that are run on multiprocessor or multithreaded hardware. You can enable it by using the -XX:+UseParallelGC option.
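In short: the throughput collector is like the serial collector, except that it collects with multiple threads in parallel.
Mostly concurrent collectors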
G1 garbage collector: This server-style collector is for multiprocessor machines with a large amount of memory. It meets garbage collection pause-time goals with high probability, while achieving high throughput.
G1 is selected by default on certain hardware and operating system configurations, or can be explicitly enabled using -XX:+UseG1GC.
CMS collector: This collector is for applications that prefer shorter garbage collection pauses and can afford to share processor resources with the garbage collection.
Use the option -XX:+UseConcMarkSweepGC to enable the CMS collector.
The CMS collector is deprecated as of JDK 9.
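A concurrent collector lets application threads keep running while garbage collection is in progress (with brief pauses).
The Z Garbage Collector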
The Z Garbage Collector (ZGC) is a scalable low latency garbage collector. ZGC performs all expensive work concurrently, without stopping the execution of application threads.
ZGC is intended for applications which require low latency (less than 10 ms pauses) and/or use a very large heap (multi-terabytes). You can enable it by using the -XX:+UseZGC option.
ZGC is available as an experimental feature, starting with JDK 11.
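In short: ZGC suits applications that need low latency (under 10 ms) or use very large heaps.
Selecting a collector
Unless your application has rather strict pause-time requirements, first run your application and allow the VM to select a collector. If necessary, adjust the heap size to improve performance.
In short: unless you have strict pause-time requirements, stay with the default collector and adjust the heap size if needed.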
If the performance still doesn’t meet your goals, then use the following guidelines as a starting point for selecting a collector:
- If the application has a small data set (up to approximately 100 MB), then select the serial collector with the option -XX:+UseSerialGC.
- If the application will be run on a single processor and there are no pause-time requirements, then select the serial collector with the option -XX:+UseSerialGC.
- If (a) peak application performance is the first priority and (b) there are no pause-time requirements or pauses of one second or longer are acceptable, then let the VM select the collector or select the parallel collector with -XX:+UseParallelGC.
- If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately one second, then select a mostly concurrent collector with -XX:+UseG1GC or -XX:+UseConcMarkSweepGC.
- If response time is a high priority, and/or you are using a very large heap, then select a fully concurrent collector with -XX:+UseZGC.
Choosing a suitable collector is only a starting point; you still need to tune VM parameters to reach the performance you need.
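The selection guidelines read naturally as a decision function. The sketch below is one possible encoding, not an official rule: the parameter names, the NO_PAUSE_GOAL sentinel, and the 10 ms cut-off used to separate ZGC from G1 are my own simplifications (the 10 ms figure is borrowed from the ZGC description), and the ZGC result includes the experimental unlock flag that JDK 12 requires.

```java
final class CollectorChooser {
    /** Sentinel meaning "the application has no pause-time requirement". */
    static final long NO_PAUSE_GOAL = Long.MAX_VALUE;

    /** Returns a starting-point collector flag for the given workload description. */
    static String suggest(long dataSetMb, int cpus, long maxPauseMs, boolean veryLargeHeap) {
        if (dataSetMb <= 100) {
            return "-XX:+UseSerialGC";        // small data set
        }
        if (cpus == 1 && maxPauseMs == NO_PAUSE_GOAL) {
            return "-XX:+UseSerialGC";        // single processor, no pause goal
        }
        if (veryLargeHeap || maxPauseMs < 10) {
            return "-XX:+UnlockExperimentalVMOptions -XX:+UseZGC";
        }
        if (maxPauseMs >= 1000) {
            return "-XX:+UseParallelGC";      // pauses of a second or more are fine
        }
        return "-XX:+UseG1GC";                // response time matters more than throughput
    }

    public static void main(String[] args) {
        System.out.println(suggest(4096, 8, 200, false));
    }
}
```

Whatever the function returns is only the first candidate to measure, in line with the advice above.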
Collectors Implementation
The Parallel Collector
The parallel collector is enabled with the command-line option -XX:+UseParallelGC.
By default, with this option, both minor and major collections are run in parallel to
further reduce garbage collection overhead.
Topics
- Number of Parallel Collector Garbage Collector Threads
- Arrangement of Generations in Parallel Collectors
- Parallel Collector Ergonomics
  - Options to Specify Parallel Collector Behaviors
  - Priority of Parallel Collector Goals
  - Parallel Collector Generation Size Adjustments
  - Parallel Collector Default Heap Size
    - Specification of Parallel Collector Initial and Maximum Heap Sizes
- Excessive Parallel Collector Time and OutOfMemoryError
- Parallel Collector Measurements
The Mostly Concurrent Collectors
On an N processor system, the concurrent part of the collection uses K/N of the available processors, where 1 <= K <= ceiling(N/4).
In addition to the use of processors during concurrent phases, additional overhead is incurred to enable concurrency. Thus, while garbage collection pauses are typically much shorter with the concurrent collector, application throughput also tends to be slightly lower than with the other collectors.
As N increases, the reduction in processor resources due to concurrent garbage collection becomes smaller, and the benefit from concurrent collection increases.
Because at least one processor is used for garbage collection during the concurrent phases, the concurrent collectors don’t normally provide any benefit on a uniprocessor (single-core) machine.
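In short: on a multiprocessor the concurrent phases use K of the N processors, with 1 <= K <= ceiling(N/4). The benefit grows with N, and concurrent collectors are not a good fit for single-core machines.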
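The upper bound on K is plain integer arithmetic. A tiny sketch, with a made-up class name, that computes ceiling(N/4) and the resulting worst-case share of processors taken by the concurrent phases:

```java
final class ConcurrentShare {
    /** Upper bound on K, i.e. ceiling(N/4), using integer arithmetic. */
    static int maxConcurrentProcessors(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one processor");
        }
        return (n + 3) / 4;
    }

    /** Largest fraction K/N of the machine the concurrent phases may use. */
    static double maxShare(int n) {
        return (double) maxConcurrentProcessors(n) / n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(n + " cpus -> K <= " + maxConcurrentProcessors(n)
                    + ", share <= " + maxShare(n));
        }
    }
}
```

On one processor the bound is the whole machine, which is why concurrent collection does not pay off there; from four processors up the worst-case share settles around a quarter.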
Concurrent Mark Sweep (CMS) Collector
The Concurrent Mark Sweep (CMS) collector is designed for applications that prefer
shorter garbage collection pauses and that can afford to share processor resources
with the garbage collector while the application is running.
Topics
- Concurrent Mark Sweep Collector Performance and Structure
- Concurrent Mode Failure
- Excessive GC Time and OutOfMemoryError
- Concurrent Mark Sweep Collector and Floating Garbage
- Concurrent Mark Sweep Collector Pauses
- Concurrent Mark Sweep Collector Concurrent Phases
- Starting a Concurrent Collection Cycle
- Scheduling Pauses
- Concurrent Mark Sweep Collector Measurements
CMS performance and structure
CMS is a mark-and-sweep collector; the sections below go into more detail.
Concurrent Mode Failure
As described previously, in normal operation, the CMS collector does most of its
tracing and sweeping work with the application threads still running, so only brief
pauses are seen by the application threads. However, if the CMS collector is unable to
finish reclaiming the unreachable objects before the old generation fills up, or if an
allocation cannot be satisfied with the available free space blocks in the old
generation, then the application is paused and the collection is completed with all the
application threads stopped. If a concurrent collection is interrupted by an explicit garbage
collection (System.gc()) or for a garbage collection needed to provide information
for diagnostic tools, then a concurrent mode interruption is reported.
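When CMS cannot finish in time during the concurrent phases (application threads keep running and allocating, so the old generation can fill up first), the VM stops the world and falls back to a single-threaded collection. The single-threaded fallback is a historical legacy.
If a concurrent collection is interrupted by something else, such as System.gc(), a concurrent mode interruption is reported.
Excessive GC time and OutOfMemoryError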
The CMS collector throws an OutOfMemoryError if too much time is being spent in
garbage collection: If more than 98% of the total time is spent in garbage collection
and less than 2% of the heap is recovered, then an OutOfMemoryError is thrown.
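The rule combines two conditions, and both must hold. A one-line predicate makes that explicit; the class and method names are made up for illustration, and the thresholds are the 98% and 2% quoted above.

```java
final class GcOverheadLimitSketch {
    /** True when both conditions of the overhead rule quoted above are met. */
    static boolean wouldThrowOutOfMemory(double gcTimePct, double heapRecoveredPct) {
        return gcTimePct > 98.0 && heapRecoveredPct < 2.0;
    }

    public static void main(String[] args) {
        // Almost all time in GC and almost nothing reclaimed: the limit trips
        System.out.println(wouldThrowOutOfMemory(99.0, 1.0));
        // Lots of GC time but the heap is still being recovered: no error
        System.out.println(wouldThrowOutOfMemory(99.0, 30.0));
    }
}
```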
Floating garbage
Because application threads and the garbage collector thread run concurrently during a major collection, objects that are traced by the garbage collector thread may subsequently become unreachable by the time collection process ends. Such unreachable objects that haven’t yet been reclaimed are referred to as floating garbage.
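Because the GC threads and the application threads run concurrently during marking, garbage produced in that window cannot be reclaimed in the current cycle; this is floating garbage.
Floating garbage is reclaimed in the next collection cycle. Separately, CMS can be configured to run a compaction after a given number of collections, which addresses fragmentation rather than floating garbage.
Why Remark cannot reclaim floating garbage
The purpose of Remark is as follows.
During concurrent marking the GC and the application run at the same time, so some objects that concurrent marking missed (and that would otherwise be treated as unreachable from the GC roots) are in fact reachable because the application updated references. Remark marks those objects as reachable. Floating garbage is the opposite case: objects that were marked live but became unreachable while the application kept running. Remark only adds to the live set found by concurrent marking and never un-marks an object, so it cannot deal with floating garbage.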
Pauses
The CMS collector pauses an application twice during a concurrent collection cycle. The first pause is to mark as live the objects directly reachable from the roots (for example, object references from application thread stacks and registers, static objects, and so on) and from elsewhere in the heap (for example, the young generation).
This first pause is referred to as the initial mark pause. The second pause comes at the end of the concurrent tracing phase and finds objects that were missed by the concurrent tracing due to updates by the application threads of references in an object after the CMS collector had finished tracing that object. This second pause is referred to as the remark pause.
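- Initial mark
  - Only marks the objects directly reachable from the roots, so the pause is short
- Remark
  - Marks the objects missed during concurrent marking; the pause is longer than the initial mark pause
Starting a concurrent collection cycle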
There are several ways to start a concurrent collection.
Based on recent history, the CMS collector maintains estimates of the time remaining
before the old generation will be exhausted and of the time needed for a concurrent
collection cycle. Using these dynamic estimates, a concurrent collection cycle is
started with the aim of completing the collection cycle before the old generation is
exhausted. These estimates are padded for safety because concurrent mode failure
can be very costly.
A concurrent collection also starts if the occupancy of the old generation exceeds an
initiating occupancy (a percentage of the old generation). The default value for this
initiating occupancy threshold is approximately 92%, but the value is subject to change
from release to release. This value can be manually adjusted using the command-line
option -XX:CMSInitiatingOccupancyFraction=<N>, where <N> is an integral
percentage (0 to 100) of the old generation size.
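In short, there are two triggers:
- Estimates based on recent history: the time left before the old generation is exhausted versus the time a concurrent cycle needs
- The occupancy of the old generation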
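The two triggers can be combined into a small predicate. This is a sketch of the idea only: the class, the parameter names, and especially the padding factor are hypothetical stand-ins for the "padded for safety" estimates; CMS's real heuristics are more involved.

```java
final class CmsStartSketch {
    /**
     * Decides whether to start a concurrent cycle now.
     * initiatingOccupancyPct plays the role of -XX:CMSInitiatingOccupancyFraction;
     * padding (> 1) is a hypothetical safety margin on the cycle-time estimate.
     */
    static boolean shouldStartCycle(double oldOccupancyPct, int initiatingOccupancyPct,
                                    double secondsUntilOldFull, double secondsForCycle,
                                    double padding) {
        boolean occupancyTrigger = oldOccupancyPct > initiatingOccupancyPct;
        boolean estimateTrigger = secondsForCycle * padding >= secondsUntilOldFull;
        return occupancyTrigger || estimateTrigger;
    }

    public static void main(String[] args) {
        // Old generation at 50%, but it will fill in 10 s and a cycle needs about 8 s
        System.out.println(shouldStartCycle(50, 92, 10, 8, 1.5));
    }
}
```

Starting too late risks a concurrent mode failure, which is why the estimate is padded rather than taken at face value.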
Scheduling pauses
The pauses for the young generation collection and the old generation collection occur independently.
They don’t overlap, but may occur in quick succession such that the pause from one collection, immediately followed by one from the other collection, can appear to be a single, longer pause. To avoid this, the CMS collector attempts to schedule the remark pause roughly midway between the previous and next young generation pauses. This scheduling is currently not done for the initial mark pause, which is usually much shorter than the remark pause.
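Old and young generation collections run independently, so their pauses can land back to back and look like one long pause.
To avoid this, the remark pause is scheduled roughly midway between two young GCs.
Logs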
[121,834s][info][gc] GC(657) Pause Initial Mark 191M->191M(485M) (121,831s, 121,834s) 3,433ms
[121,835s][info][gc] GC(657) Concurrent Mark (121,835s)
[121,889s][info][gc] GC(657) Concurrent Mark (121,835s, 121,889s) 54,330ms
[121,889s][info][gc] GC(657) Concurrent Preclean (121,889s)
[121,892s][info][gc] GC(657) Concurrent Preclean (121,889s, 121,892s) 2,781ms
[121,892s][info][gc] GC(657) Concurrent Abortable Preclean (121,892s)
[121,949s][info][gc] GC(658) Pause Young (Allocation Failure) 324M->199M(485M) (121,929s, 121,949s) 19,705ms
[122,068s][info][gc] GC(659) Pause Young (Allocation Failure) 333M->200M(485M) (122,043s, 122,068s) 24,892ms
[122,075s][info][gc] GC(657) Concurrent Abortable Preclean (121,892s, 122,075s) 182,989ms
[122,087s][info][gc] GC(657) Pause Remark 209M->209M(485M) (122,076s, 122,087s) 11,373ms
[122,087s][info][gc] GC(657) Concurrent Sweep (122,087s)
[122,193s][info][gc] GC(660) Pause Young (Allocation Failure) 301M->165M(485M) (122,181s, 122,193s) 12,151ms
[122,254s][info][gc] GC(657) Concurrent Sweep (122,087s, 122,254s) 166,758ms
[122,254s][info][gc] GC(657) Concurrent Reset (122,254s)
[122,255s][info][gc] GC(657) Concurrent Reset (122,254s, 122,255s) 0,952ms
[122,297s][info][gc] GC(661) Pause Young (Allocation Failure) 259M->128M(485M) (122,291s, 122,297s) 5,797ms
The output for the CMS collection (GC ID 657) is interspersed with the output from the minor collections (GC IDs 658, 659 and 660).
The initial mark pause is typically short relative to the minor collection pause time. The concurrent phases (concurrent mark, concurrent preclean, and concurrent sweep) normally last significantly longer than a minor collection pause, as indicated in the CMS collector output example.
The GC log shows young collections interleaved with the CMS cycle, and most of the CMS time goes to the concurrent phases.
Also note that some tools (jstat's FGC counter, for example) count each CMS stop-the-world pause, that is initial mark and remark, as one full GC.
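To pull the pause times out of a log like the one above, a regular expression is enough. The sketch below assumes the exact line shape shown in the sample (including the comma used as decimal separator, which depends on the locale the VM ran under); the class GcPauseLine is made up for illustration and is not a general unified-logging parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcPauseLine {
    // Matches pause lines such as
    // "... GC(657) Pause Remark 209M->209M(485M) (122,076s, 122,087s) 11,373ms"
    private static final Pattern PAUSE = Pattern.compile(
            "GC\\((\\d+)\\) (Pause .+?) (\\d+)M->(\\d+)M\\((\\d+)M\\) .*?(\\d+[.,]\\d+)ms\\s*$");

    final int gcId;
    final String kind;
    final double pauseMillis;

    private GcPauseLine(int gcId, String kind, double pauseMillis) {
        this.gcId = gcId;
        this.kind = kind;
        this.pauseMillis = pauseMillis;
    }

    /** Returns the parsed pause, or empty for concurrent-phase and other lines. */
    static Optional<GcPauseLine> parse(String line) {
        Matcher m = PAUSE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        // Accept both "11,373" and "11.373" as the duration in milliseconds.
        double millis = Double.parseDouble(m.group(6).replace(',', '.'));
        return Optional.of(new GcPauseLine(Integer.parseInt(m.group(1)), m.group(2), millis));
    }

    public static void main(String[] args) {
        String line = "[122,087s][info][gc] GC(657) Pause Remark 209M->209M(485M) "
                + "(122,076s, 122,087s) 11,373ms";
        parse(line).ifPresent(p ->
                System.out.println(p.kind + " (GC " + p.gcId + "): " + p.pauseMillis + " ms"));
    }
}
```

Summing pauseMillis per kind over a whole log gives a quick picture of how much stop-the-world time comes from CMS versus the young collections.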
Garbage-First Garbage Collector
This section describes the Garbage-First (G1) Garbage Collector (GC).
Topics
- Introduction to Garbage-First Garbage Collector
- Enabling G1
- Basic Concepts
- Garbage-First Internals
- Ergonomic Defaults for G1 GC
- Comparison to Other Collectors
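G1 is similar to CMS, but whereas CMS suffers from fragmentation after running for a long time, G1 incrementally compacts the data in the heap and so avoids the problem.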
Region
The heap is divided into multiple regions. G1 preferentially collects regions with the least amount of live data, or "garbage first", that is, the regions holding the most garbage. This is what makes it an "incremental collection".
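The "garbage first" choice can be sketched as sorting regions by live data and taking as many as a budget allows. This is an illustration under stated assumptions: the copy budget in bytes is a hypothetical stand-in for the pause-time goal, and the Region class and method names are made up; G1's real collection-set selection uses richer cost predictions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class GarbageFirstSketch {
    static final class Region {
        final int id;
        final long liveBytes; // data that would have to be copied out
        Region(int id, long liveBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
        }
    }

    /** Picks region ids with the least live data until the copy budget is used up. */
    static List<Integer> chooseCollectionSet(List<Region> regions, long copyBudgetBytes) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong((Region r) -> r.liveBytes));
        List<Integer> chosen = new ArrayList<>();
        long toCopy = 0;
        for (Region r : sorted) {
            if (toCopy + r.liveBytes > copyBudgetBytes) {
                break; // the next cheapest region no longer fits
            }
            toCopy += r.liveBytes;
            chosen.add(r.id);
        }
        return chosen;
    }
}
```

Regions with little live data are cheap to evacuate and free the most space, which is the whole point of collecting them first.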
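CardTable
Because G1 collects only some of the regions at a time, a collection needs to know which objects in other regions reference objects in the regions being collected. G1 uses a copying algorithm that moves objects, so those references must be updated to the objects' new addresses.
Ordinary generational collection has the same need: a young collection requires a record of the references from the old generation into the young generation, usually called the remembered set (RS).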
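A card table is one common way to feed such a record: the heap is split into small fixed-size cards, and a write barrier marks the card containing any updated reference field as dirty, so the collector only rescans dirty cards. The sketch below is a toy model; addresses are plain offsets into an imaginary heap, the 512-byte card size matches what HotSpot uses, and the class and method names are made up.

```java
final class CardTableSketch {
    private static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards
    private final boolean[] dirty;

    CardTableSketch(long heapBytes) {
        // One flag per card, rounding up so the last partial card is covered.
        dirty = new boolean[(int) ((heapBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
    }

    /** What a write barrier would do after storing a reference at fieldAddress. */
    void recordWrite(long fieldAddress) {
        dirty[(int) (fieldAddress >>> CARD_SHIFT)] = true;
    }

    boolean isDirty(int card) {
        return dirty[card];
    }

    int cardCount() {
        return dirty.length;
    }

    /** Number of cards the collector would need to scan for cross-region references. */
    int dirtyCount() {
        int n = 0;
        for (boolean d : dirty) {
            if (d) {
                n++;
            }
        }
        return n;
    }
}
```

The trade-off is precision for speed: the barrier is a single cheap store, at the price of rescanning a whole card even if only one field in it changed.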
Parallel full GC as the fallback (the G1 counterpart of a concurrent mode failure)
JEP 307: Parallel Full GC for G1
Garbage-First Garbage Collector Tuning
This section describes how to adapt Garbage-First garbage collector (G1 GC) behavior in case it does not meet your requirements.
Topics
- General Recommendations for G1
- Moving to G1 from Other Collectors
- Improving G1 Performance
The Z Garbage Collector
The Z Garbage Collector is available as an experimental feature, and is enabled with the command-line options -XX:+UnlockExperimentalVMOptions -XX:+UseZGC.
Setting the Heap Size
In general, the more memory you give to ZGC the better. But at the same time, wasting memory is undesirable, so it’s all about finding a balance between memory usage and how often the GC needs to run.
Setting Number of Concurrent GC Threads
-XX:ConcGCThreads
This option essentially dictates how much CPU-time the GC should be given.
Other Considerations
This section covers other situations that affect garbage collection.
Topics
- Finalization and Weak, Soft, and Phantom References
- Explicit Garbage Collection
- Soft References
- Class Metadata
These features can create performance artifacts at the Java programming language level. An example of this is relying on finalization to close file descriptors, which makes an external resource (descriptors) dependent on garbage collection promptness. Relying on garbage collection to manage resources other than memory is almost always a bad idea.
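The alternative to leaning on finalization is to release the resource explicitly, which in Java usually means try-with-resources. A minimal sketch, with a made-up Handle class standing in for a file descriptor or similar external resource:

```java
final class ExplicitClose {
    /** Stand-in for an external resource such as a file descriptor. */
    static final class Handle implements AutoCloseable {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false; // released deterministically, not whenever the GC gets to it
        }
    }

    /** Uses a handle and guarantees it is closed when the block exits. */
    static Handle useAndRelease() {
        Handle handle = new Handle();
        try (Handle scoped = handle) {
            // work with the resource here; close() runs even if this block throws
            if (!scoped.isOpen()) {
                throw new IllegalStateException("handle should be open while in use");
            }
        }
        return handle;
    }

    public static void main(String[] args) {
        System.out.println("still open after use? " + useAndRelease().isOpen());
    }
}
```

The resource lifetime is now tied to the block, so it no longer depends on how promptly the collector runs.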
In previous releases of Java HotSpot VM, the class metadata was allocated in the so-called permanent generation. Starting with JDK 8, the permanent generation was removed and the class metadata is allocated in native memory.
Please credit the source when reposting. Readers are welcome to verify the sources cited in this post and to point out any mistakes or unclear wording, either in the comments or by email to 951488791@qq.com