Troubleshooting Frequent Full GCs in Production


2023-03-22 14:09 | Source: compiled from the web

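Discovering the Problem

One morning, before I even had time for breakfast, the ops team messaged me that calls to a certain API were timing out in large numbers. Call volume for this API had recently doubled, so I first checked for slow SQL. There were indeed quite a few slow queries, so I narrowed the time range of the query, but that did not help much.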

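When does GC affect a program?

Both young GCs (YGC) and full GCs (FGC) pause the program to some degree. This is the Stop-The-World problem: while the GC threads work, all application threads are suspended. More advanced collectors such as ParNew, CMS, or G1 only shorten these pauses; they cannot eliminate them.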

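So when exactly does GC hurt a program? In my view there are four situations, listed from most to least severe:

1. Full GCs are too frequent. A full GC is usually slow, taking anywhere from a few hundred milliseconds to several seconds. Normally one runs only every few hours or even every few days, and the impact is acceptable. But once full GCs become frequent (for example, one every few tens of minutes), something is definitely wrong: application threads are stopped over and over, the system seems to stall constantly, and overall performance degrades.
2. Young GCs take too long. A total YGC time of tens of milliseconds up to about a hundred is normal; the resulting pauses of a few to a few tens of milliseconds are practically invisible to users and can be ignored. But if a YGC takes a second or more (approaching full GC territory), pauses grow, and because YGCs run often, many requests will time out.
3. Full GCs take too long. Longer full GCs mean longer pauses. For high-concurrency services in particular, this can cause many timeouts during each full GC and reduce availability, so it also deserves attention.
4. Young GCs are too frequent. Even if they cause no timeouts, overly frequent YGCs lower overall throughput, which also matters for high-concurrency services.

Of these, "full GCs too frequent" and "young GCs too long" are the classic GC problems and will very likely degrade service quality. The other two are less severe, but still worth watching in high-concurrency or high-availability applications.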
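Telling these four cases apart requires per-collector counts and times. The JVM exposes cumulative figures through `GarbageCollectorMXBean`; sampling them periodically and diffing the samples gives the frequency and average duration per collector. The sketch below is a minimal illustration, not the monitoring the author used: the class name `GcStats` and the thresholds passed to `tooFrequent` are hypothetical and should be tuned per service. Collector names depend on the GC in use (for example "G1 Young Generation" and "G1 Old Generation" under G1), and `getCollectionTime()` reports approximate accumulated elapsed time, which may not equal pause time for concurrent collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.time.Duration;

public class GcStats {
    /** Average milliseconds per collection, or 0 if there were no collections. */
    static double averageMillisPerCollection(long count, long totalTimeMillis) {
        return count == 0 ? 0.0 : (double) totalTimeMillis / count;
    }

    /**
     * True if collections observed in the window happen more often than once per minInterval.
     * collectionsInWindow is the difference between two getCollectionCount() samples.
     * The thresholds are hypothetical and must be chosen per service.
     */
    static boolean tooFrequent(long collectionsInWindow, Duration window, Duration minInterval) {
        if (collectionsInWindow <= 0) {
            return false;
        }
        return window.toMillis() / collectionsInWindow < minInterval.toMillis();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // cumulative since JVM start, -1 if undefined
            long time = gc.getCollectionTime();   // cumulative ms, -1 if undefined
            System.out.printf("%-25s count=%d totalMs=%d avgMs=%.1f%n",
                    gc.getName(), count, time,
                    averageMillisPerCollection(Math.max(count, 0), Math.max(time, 0)));
        }
    }
}
```

For example, `tooFrequent(6, Duration.ofHours(3), Duration.ofHours(1))` flags six old-generation collections in three hours, one every 30 minutes on average, which falls into case 1 above.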

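After a while I found that this service's full GC behavior was abnormal: it ran far too often. This was very likely the root cause of the timeouts, because the timing matched.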

[Figure: full GC monitoring chart for the last three hours]

[Figure: full GC monitoring chart for the last three days]

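Comparing the two, it is easy to see that the number of full GCs began climbing at 9 PM on March 15, which is exactly when this API was opened to external callers.

Approach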

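1. First, download a heap dump from the server to find out what is leaking memory and triggering the frequent full GCs. Find the PID of the Java process on the server, then save the dump. At our company, the Java service always runs with the same fixed PID: 1.

Confirm it with the top command:

Then run jmap -dump:file=202303160924.dump,format=b 1 to save the dump file.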
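When jmap cannot be run on the host, a HotSpot JVM can also dump its own heap through `com.sun.management.HotSpotDiagnosticMXBean`. The sketch below assumes a HotSpot/OpenJDK runtime; the class name `HeapDumper` and the file-naming scheme are illustrative, and it only dumps the JVM it runs in (for example, behind an admin-only endpoint). Passing `live = true` corresponds to jmap's `live` option: only reachable objects are written, which makes the JVM run a GC first. The jmap command above omits `live`, so its dump also contains unreachable objects. Recent JDKs expect the file name to end in `.hprof` and refuse to overwrite an existing file.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    /**
     * Writes an hprof heap dump of the current JVM, similar to
     * `jmap -dump:live,format=b,file=<path> <pid>` when live is true.
     */
    static void dumpHeap(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IOException if the file already exists or cannot be written
        bean.dumpHeap(path, live);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical naming scheme: heap-<epoch millis>.hprof in the working directory
        String path = args.length > 0 ? args[0] : "heap-" + System.currentTimeMillis() + ".hprof";
        dumpHeap(path, true);
        System.out.println("Heap dump written to " + path);
    }
}
```

The resulting file can be opened in the same tools as a jmap dump (JProfiler, MAT, JVisualVM).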

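2. Use the dump file to analyze how objects are distributed in the heap

Download a tool that can analyze dump files; here I used JProfiler to look at the largest objects. It showed that the hooks field of java.lang.ApplicationShutdownHooks was taking up a huge amount of memory, and that this field is a Map (an IdentityHashMap<Thread, Thread> in the JDK source).

Following the reference chains of the entries in this Map showed that it held Thread objects, and that most of them were tied to one thread pool, java.util.concurrent.ScheduledThreadPoolExecutor@59648a61. That pinpointed the problem code: the newly added API performs an asynchronous operation using a timeout feature from Guava's concurrency package. The idea is to use a scheduled thread pool to check whether the result comes back within the allotted time.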
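The timeout idea described above can be sketched with the JDK alone: run the query on a worker pool, and schedule a timer task on a scheduled executor that fails the result with a `TimeoutException` (and interrupts the worker) if it is still pending when the timer fires. This is not Guava's implementation of `Futures.withTimeout`, only a minimal illustration of the same mechanism under stated assumptions; `TimeoutSketch`, `callWithTimeout`, the pool sizes, and the thread names are hypothetical. The key design point is that both executors are created once and shared by every call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {
    // One timer thread shared by all calls (daemon, so it never blocks JVM exit)
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "timeout-timer");
                t.setDaemon(true);
                return t;
            });
    // Hypothetical worker pool that runs the actual queries
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "query-worker");
        t.setDaemon(true);
        return t;
    });

    /** Runs callable on WORKERS; the result fails with TimeoutException after time/unit. */
    static <T> CompletableFuture<T> callWithTimeout(Callable<T> callable, long time, TimeUnit unit) {
        CompletableFuture<T> result = new CompletableFuture<>();
        Future<?> task = WORKERS.submit(() -> {
            try {
                result.complete(callable.call());
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        });
        // Timer task: if the result is still pending when it fires, fail it and interrupt the worker
        ScheduledFuture<?> timer = TIMER.schedule(() -> {
            if (result.completeExceptionally(new TimeoutException("timed out after " + time + " " + unit))) {
                task.cancel(true);
            }
        }, time, unit);
        // Once a result (or error) arrives, the timer is no longer needed
        result.whenComplete((value, error) -> timer.cancel(false));
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callWithTimeout(() -> "fast", 1, TimeUnit.SECONDS).get());
        try {
            callWithTimeout(() -> { Thread.sleep(2_000); return "slow"; }, 100, TimeUnit.MILLISECONDS).get();
        } catch (ExecutionException e) {
            System.out.println("slow query failed: " + e.getCause());
        }
    }
}
```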

3. Find where my business code goes wrong

// Runs a query asynchronously and returns its result within the given time limit
public <T> T asyncWithTimeout(ScheduledThreadPoolExecutor executor, Callable<T> callable, long time, TimeUnit unit) {
    try {
        // threadPoolExecutor is a field of the enclosing class (not shown)
        ListeningExecutorService listeningExecutorService = MoreExecutors.listeningDecorator(threadPoolExecutor);
        ListenableFuture<T> future = listeningExecutorService.submit(callable);
        // Wraps the scheduled pool that checks whether the query finishes in time.
        // This call is what registers the shutdown hook; let's step into it.
        ScheduledExecutorService scheduledExecutorService = MoreExecutors.getExitingScheduledExecutorService(executor);
        return Futures.withTimeout(future, time, unit, scheduledExecutorService).get(time, unit);
    } catch (InterruptedException | ExecutionException | TimeoutException e) {
        log.warn("Async method failed, error:{}", e.getMessage());
    }
    return null;
}

// ======================= Guava concurrency code =======================
@Beta
@GwtIncompatible // TODO
public static ScheduledExecutorService getExitingScheduledExecutorService(
    ScheduledThreadPoolExecutor executor) {
  // A new Application object is created on every call
  return new Application().getExitingScheduledExecutorService(executor);
}

final ScheduledExecutorService getExitingScheduledExecutorService(
    ScheduledThreadPoolExecutor executor) {
  return getExitingScheduledExecutorService(executor, 120, TimeUnit.SECONDS);
}

final ScheduledExecutorService getExitingScheduledExecutorService(
    ScheduledThreadPoolExecutor executor, long terminationTimeout, TimeUnit timeUnit) {
  useDaemonThreadFactory(executor);
  ScheduledExecutorService service = Executors.unconfigurableScheduledExecutorService(executor);
  // This is where the shutdown hook is added; step into it
  addDelayedShutdownHook(executor, terminationTimeout, timeUnit);
  return service;
}

final void addDelayedShutdownHook(
    final ExecutorService service, final long terminationTimeout, final TimeUnit timeUnit) {
  checkNotNull(service);
  checkNotNull(timeUnit);
  // Keep stepping in
  addShutdownHook(
      MoreExecutors.newThread(
          // This thread name matches the one seen in the object-reference view
          "DelayedShutdownHook-for-" + service,
          new Runnable() {
            @Override
            public void run() {
              try {
                // We'd like to log progress and failures that may arise in the
                // following code, but unfortunately the behavior of logging
                // is undefined in shutdown hooks.
                // This is because the logging code installs a shutdown hook of its
                // own. See Cleaner class inside {@link LogManager}.
                service.shutdown();
                service.awaitTermination(terminationTimeout, timeUnit);
              } catch (InterruptedException ignored) {
                // We're shutting down anyway, so just ignore.
              }
            }
          }));
}

@VisibleForTesting
void addShutdownHook(Thread hook) {
  Runtime.getRuntime().addShutdownHook(hook);
}
// ======================= Guava concurrency code =======================

// ======================= JDK code (Runtime / ApplicationShutdownHooks) =======================
public void addShutdownHook(Thread hook) {
    SecurityManager sm = System.getSecurityManager();
    if (sm != null) {
        sm.checkPermission(new RuntimePermission("shutdownHooks"));
    }
    // Found it: this is where the hook gets added
    ApplicationShutdownHooks.add(hook);
}

static synchronized void add(Thread hook) {
    if (hooks == null)
        throw new IllegalStateException("Shutdown in progress");
    if (hook.isAlive())
        throw new IllegalArgumentException("Hook already running");
    if (hooks.containsKey(hook))
        throw new IllegalArgumentException("Hook previously registered");
    // Stored in the hooks map: this is the large object
    hooks.put(hook, hook);
}

Solution

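The investigation above explains where the huge hooks object came from: every call to the API registered one more shutdown hook, that is, put one more Thread into hooks, and registered hooks are only released when the JVM shuts down.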
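The leak can be reproduced without Guava by mimicking what the per-call code did: build a new hook thread that captures the executor, and register it with `Runtime.addShutdownHook`. The sketch below is a hypothetical reproduction (`HookLeakDemo`, `registerHookFor`, and `simulateCalls` are illustrative names). It counts the surviving hooks with `Runtime.removeShutdownHook`, which returns `true` only if the hook was still registered, so it also cleans up after itself. In the production code nothing ever removed the hooks, so the JDK's map kept every one of them reachable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;

public class HookLeakDemo {
    // Mimics one request in the original code: a new hook thread capturing the executor,
    // registered with the JVM and kept in ApplicationShutdownHooks.hooks until exit
    static Thread registerHookFor(ExecutorService service) {
        Thread hook = new Thread(service::shutdown, "DelayedShutdownHook-for-" + service);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    /** Simulates `calls` requests, then counts and unregisters the hooks that are still registered. */
    static int simulateCalls(int calls) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        List<Thread> hooks = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            hooks.add(registerHookFor(executor));
        }
        int stillRegistered = 0;
        for (Thread hook : hooks) {
            // removeShutdownHook returns true only if the hook was still registered
            if (Runtime.getRuntime().removeShutdownHook(hook)) {
                stillRegistered++;
            }
        }
        executor.shutdown();
        return stillRegistered;
    }

    public static void main(String[] args) {
        System.out.println("Hooks still registered after 10000 calls: " + simulateCalls(10_000));
    }
}
```

Every simulated call leaves one hook behind; none is dropped on its own, which is exactly the unbounded growth seen in the heap dump.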

So the fix is simple: instead of creating a new ScheduledExecutorService on every call, create it once when the class is initialized.

The refactored code:

private final ListeningExecutorService listeningExecutorService;
private final ScheduledExecutorService scheduledExecutorService;

public static AsyncUtils getInstance() {
    return ThreadHolder.INSTANCE.getAsyncWithCallback();
}

@SuppressWarnings("UnstableApiUsage")
private AsyncUtils() {
    listeningExecutorService = MoreExecutors.listeningDecorator(ThreadPoolConstant.THREAD_POOL_EXECUTOR);
    // Created once for the singleton, so only one shutdown hook is ever registered
    scheduledExecutorService = MoreExecutors.getExitingScheduledExecutorService(ThreadPoolConstant.SCHEDULED_THREAD_POOL_EXECUTOR);
}

@SuppressWarnings("UnstableApiUsage")
public <T> T asyncWithTimeout(Callable<T> callable, long time, TimeUnit unit) {
    try {
        ListenableFuture<T> future = listeningExecutorService.submit(callable);
        return Futures.withTimeout(future, time, unit, scheduledExecutorService).get(time, unit);
    } catch (InterruptedException | ExecutionException | TimeoutException e) {
        log.warn("Async method failed, error:{}", e.getMessage());
    }
    return null;
}

private enum ThreadHolder {
    /**
     * Holder for the singleton INSTANCE
     */
    INSTANCE;

    private final AsyncUtils asyncUtils;

    ThreadHolder() {
        asyncUtils = new AsyncUtils();
    }

    public AsyncUtils getAsyncWithCallback() {
        return asyncUtils;
    }
}
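On JDK 9 or later, a similar timeout can be had without Guava's exiting executor at all: `CompletableFuture.orTimeout` completes the future exceptionally with a `TimeoutException` using a timer the JDK manages internally, so no per-call executor or shutdown hook is involved. This is a swapped-in technique, not the author's fix, and a sketch under assumptions: the class `JdkAsyncUtils`, its pool, and the switch from `Callable` to `Supplier` (which `supplyAsync` requires) are hypothetical. Unlike a cancelled Guava future, `orTimeout` does not interrupt the task, which keeps running on the pool until it finishes.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public final class JdkAsyncUtils {
    // Hypothetical shared pool, standing in for ThreadPoolConstant.THREAD_POOL_EXECUTOR
    private static final ExecutorService POOL = Executors.newFixedThreadPool(8, r -> {
        Thread t = new Thread(r, "async-query");
        t.setDaemon(true);
        return t;
    });

    private JdkAsyncUtils() {}

    /** Returns the supplier's result, or null if it fails or takes longer than time/unit. */
    public static <T> T asyncWithTimeout(Supplier<T> supplier, long time, TimeUnit unit) {
        try {
            return CompletableFuture.supplyAsync(supplier, POOL)
                    .orTimeout(time, unit) // JDK 9+: fails with TimeoutException when time is up
                    .join();               // wraps failures in CompletionException
        } catch (CompletionException | CancellationException e) {
            // Log the failure here, as the original asyncWithTimeout does
            return null;
        }
    }
}
```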

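In general, when a system suddenly becomes slow, with CPU at 100% or far too many full GCs, there are two main possible causes:

1. Some code reads a very large amount of data, exhausting memory; this leads to too many full GCs and a slow system.

2. Some code performs CPU-intensive operations, driving CPU usage up and slowing the system down.

Relatively speaking, these are the two most frequent production problems, and they can make the system unavailable outright.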
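For the high-CPU case, the usual shell route is `top -Hp <pid>` to find the busiest thread, then `jstack <pid>` to see its stack. The sketch below is an in-process equivalent using `ThreadMXBean`: it ranks live threads by accumulated CPU time. The class and method names are hypothetical, and it assumes thread CPU time measurement is supported and enabled (the default on HotSpot for common platforms). Accumulated time favors long-lived threads, so in practice you would compare two samples taken a few seconds apart.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HotThreads {
    /** Names of the n live threads with the most accumulated CPU time, highest first. */
    static List<String> topCpuThreads(int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<long[]> usage = new ArrayList<>(); // each entry: {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread died or measurement is disabled
            if (cpu > 0) {
                usage.add(new long[] {id, cpu});
            }
        }
        usage.sort(Comparator.comparingLong((long[] u) -> u[1]).reversed());
        List<String> names = new ArrayList<>();
        for (long[] u : usage.subList(0, Math.min(n, usage.size()))) {
            ThreadInfo info = mx.getThreadInfo(u[0]); // null if the thread has since exited
            if (info != null) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : topCpuThreads(5)) {
            System.out.println(name);
        }
    }
}
```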

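Several other situations can make a particular feature slow without making the whole system unavailable:

1. Some code performs a blocking operation, so calls to that feature are slow overall, but the slowness shows up at random.

2. A thread enters the WAITING state for some reason, making that feature unavailable, but the problem cannot be reproduced.

3. Improper use of locks leaves several threads deadlocked, making the system slow overall.

In these three cases, CPU and memory metrics will not reveal the specific problem: all of them involve blocking, so CPU and memory usage stay low even though the feature is slow.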
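Blocked, waiting, and deadlocked threads are visible in thread dumps (for example from `jstack <pid>`), and deadlocks in particular can be detected programmatically with `ThreadMXBean.findDeadlockedThreads()`. The sketch below is illustrative only: `DeadlockCheck`, `createDeadlock`, and the thread names are hypothetical, and `createDeadlock` deliberately deadlocks two daemon threads by taking two monitors in opposite order so that the check has something to find.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockCheck {
    /** Info for threads deadlocked on monitors or ownable synchronizers, or an empty array. */
    static ThreadInfo[] findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when there is no deadlock
        return ids == null ? new ThreadInfo[0] : mx.getThreadInfo(ids);
    }

    /** Deliberately deadlocks two daemon threads (hypothetical demo only). */
    static void createDeadlock() throws InterruptedException {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startDaemon("worker-1", () -> lockBoth(a, b, bothHoldFirstLock));
        startDaemon("worker-2", () -> lockBoth(b, a, bothHoldFirstLock));
        bothHoldFirstLock.await();
        // Wait (up to about 2.5 s) until the JVM reports the cycle
        for (int i = 0; i < 50 && findDeadlocks().length < 2; i++) {
            Thread.sleep(50);
        }
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // ensure the other thread already holds its first lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) { // blocks forever: the other thread holds it
                System.out.println("unreachable");
            }
        }
    }

    private static void startDaemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true); // deadlocked threads must not keep the JVM alive
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        createDeadlock();
        for (ThreadInfo info : findDeadlocks()) {
            System.out.println(info.getThreadName() + " is blocked on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```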

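Know which tools you can use for troubleshooting

1. The company's monitoring system: most companies have one, and it can track every JVM metric.

2. The JDK's built-in tools, including common commands such as jmap and jstat:
# Show utilization of each heap region and GC statistics, sampling every 1000 ms and repeating the header every 20 rows
jstat -gcutil -h20 pid 1000
# Show a histogram of heap objects sorted by size (use -histo:live to count only live objects; this forces a full GC)
jmap -histo pid | head -n20
# Dump the heap to a file
jmap -dump:format=b,file=heap pid

3. Visual heap analysis tools: JVisualVM, MAT, and so on.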
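The per-region utilization that `jstat -gcutil` prints can also be read from inside the JVM through `MemoryPoolMXBean`. Like jstat's E and O columns, the sketch below expresses usage as a percentage of the pool's current (committed) capacity. It is a minimal illustration only; the class name `HeapUtil` is hypothetical, and pool names depend on the collector (for example "G1 Eden Space" and "G1 Old Gen" under G1).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapUtil {
    /** Used bytes as a percentage of the committed capacity, or 0 if nothing is committed. */
    static double utilizationPercent(long used, long committed) {
        return committed <= 0 ? 0.0 : 100.0 * used / committed;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace and the code cache
            }
            MemoryUsage u = pool.getUsage();
            System.out.printf("%-30s %6.2f%% of %d KB committed%n",
                    pool.getName(), utilizationPercent(u.getUsed(), u.getCommitted()),
                    u.getCommitted() / 1024);
        }
    }
}
```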


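Here is a supplementary summary of JVM tuning in practice:

In short, JVM performance tuning involves trade-offs on every front, and changing one part often affects everything else, so the impact has to be considered as a whole. That is why understanding how the JVM works internally matters so much: it deepens our understanding of our own code and helps us write more efficient code.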

Author: theonenini  Link: https://www.cnblogs.com/zsy-cyl/p/17222297.html  Source: cnblogs
