Very High CPU usage Sonatype Nexus Docker

本文记录 Sonatype Nexus 私服遇到性能劣化问题的分析过程(未解决)

环境说明

使用 sonatype/nexus3 镜像启动,通过挂载卷存储数据

Nexus 配置,可以看到关键配置 -Xms8g -Xmx8g -XX:MaxDirectMemorySize=35158M -XX:+UseConcMarkSweepGC

200      105083 105062 99 11:01 ?        2-00:51:44 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/jre/bin/java -server -Dinstall4j.jvmDir=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/jre -Dexe4j.moduleName=/opt/sonatype/nexus/bin/nexus -XX:+UnlockDiagnosticVMOptions -Dinstall4j.launcherId=245 -Dinstall4j.swt=false -Di4jv=0 -Di4jv=0 -Di4jv=0 -Di4jv=0 -Di4jv=0 -Xms8g -Xmx8g -XX:MaxDirectMemorySize=35158M -XX:ActiveProcessorCount=16 -XX:+UseParNewGC -XX:ParallelGCThreads=12 -XX:MaxTenuringThreshold=6 -XX:SurvivorRatio=5 -XX:+UseConcMarkSweepGC -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCMSCompactAtFullCollection -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC -XX:+PrintGCDetails -Xloggc:/nexus-data/vgc/nexus-1646967690.vgc -XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:LogFile=../sonatype-work/nexus3/log/jvm.log -XX:-OmitStackTraceInFastThrow -Djava.net.preferIPv4Stack=true -Dkaraf.home=. -Dkaraf.base=. -Dkaraf.etc=etc/karaf -Djava.util.logging.config.file=etc/karaf/java.util.logging.properties -Dkaraf.data=../sonatype-work/nexus3 -Dkaraf.log=../sonatype-work/nexus3/log -Djava.io.tmpdir=../sonatype-work/nexus3/tmp -Dkaraf.startLocalConsole=false -Djdk.tls.ephemeralDHKeySize=2048 -Djava.endorsed.dirs=lib/endorsed -Di4j.vpt=true -classpath /opt/sonatype/nexus/.install4j/i4jruntime.jar:/opt/sonatype/nexus/lib/boot/nexus-main.jar:/opt/sonatype/nexus/lib/boot/activation-1.1.1.jar:/opt/sonatype/nexus/lib/boot/jakarta.xml.bind-api-2.3.3.jar:/opt/sonatype/nexus/lib/boot/jaxb-runtime-2.3.3.jar:/opt/sonatype/nexus/lib/boot/txw2-2.3.3.jar:/opt/sonatype/nexus/lib/boot/istack-commons-runtime-3.0.10.jar:/opt/sonatype/nexus/lib/boot/org.apache.karaf.main-4.3.6.jar:/opt/sonatype/nexus/lib/boot/osgi.core-7.0.0.jar:/opt/sonatype/nexus/lib/boot/org.apache.karaf.specs.activator-4.3.6.jar:/opt/sonatype/nexus/lib/boot/org.apache.karaf.diagnostic.boot-4.3.6.jar:/opt/sonatype/nexus/lib/boot/org.apache.karaf.jaas.boot-4.3.6.jar com.install4j.runtime.launcher.UnixLauncher run 9d17dc87 0 0 org.sonatype.nexus.karaf.NexusMain

系统内核

# uname -a
Linux 10-1-207-194 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

内存信息

# free -h
              total        used        free      shared  buff/cache   available
Mem:           251G         38G        6.2G        4.0G        206G        208G
Swap:          4.0G        3.8G        209M

故障现象

读取 pom 耗时 16 秒

[root@10-1-207-194 ~]# time curl http://127.0.0.1:8088/nexus/repository/releases/com/ai/nc-common-um/4.1.0/nc-common-um-4.1.0.pom
<?xml version="1.0" encoding="UTF-8"?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    ...
  </parent>
  ...
  <dependencies>
   ...
  </dependencies>
  <repositories>
    ...
  </repositories>
</project>

real	0m16.433s
user	0m0.005s
sys	0m0.004s

在 Nexus 容器内安装 jstack 工具

因为 nexus3 官方镜像中只有 JRE,所以我们需要在容器内安装 JDK

以 root 权限登录容器内部

docker exec -u 0 -it nexus3.9.x bash

安装 JDK

yum install java-1.8.0-openjdk-devel.x86_64

劣化分析

以普通用户登录容器内部

docker exec -it nexus3.9.x bash

使用 top 命令查看 CPU 利用率为 1666%

bash-4.4$ top
top - 07:22:50 up 263 days, 15:25,  0 users,  load average: 13.83, 12.73, 12.55
Tasks:   4 total,   1 running,   3 sleeping,   0 stopped,   0 zombie
%Cpu(s): 23.5 us,  2.5 sy,  0.0 ni, 74.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257429.8 total,   6311.8 free,  39158.4 used, 211959.5 buff/cache
MiB Swap:   4096.0 total,    209.1 free,   3886.9 used. 213295.4 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     1 nexus     20   0   43.8g  10.4g  32524 S  1666   4.1   2984:04 java
   878 root      20   0   19336   2384   1832 S   0.0   0.0   0:00.13 bash
  2379 nexus     20   0   12036   1952   1464 S   0.0   0.0   0:00.01 bash
  2386 nexus     20   0   49120   2164   1504 R   0.0   0.0   0:00.00 top

使用 top -n 1 -b -H -p 1 | head -n 30 && jstack 1 > /nexus-data/nexus_thread.info 命令查看 CPU 利用率高的线程并保存线程信息到 /nexus-data/nexus_thread.info

bash-4.4$ top -n 1 -b -H -p 1 | head -n 30 && jstack 1 > /nexus-data/nexus_thread.info
top - 07:39:25 up 263 days, 15:42,  0 users,  load average: 17.86, 17.93, 15.91
Threads: 527 total,  17 running, 510 sleeping,   0 stopped,   0 zombie
%Cpu(s): 35.1 us,  4.1 sy,  0.0 ni, 60.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257429.8 total,   4224.3 free,  35897.7 used, 217307.8 buff/cache
MiB Swap:   4096.0 total,    211.9 free,   3884.1 used. 216556.0 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   265 nexus     20   0   43.8g  10.4g  32548 R  94.1   4.1  69:01.28 java
  1416 nexus     20   0   43.8g  10.4g  32548 S  64.7   4.1  16:43.97 qtp1892721215-6
  1653 nexus     20   0   43.8g  10.4g  32548 S  58.8   4.1   4:32.59 qtp1892721215-8
  1951 nexus     20   0   43.8g  10.4g  32548 R  58.8   4.1   2:08.03 qtp1892721215-1
  2285 nexus     20   0   43.8g  10.4g  32548 S  58.8   4.1  11:54.16 qtp1892721215-1
   793 nexus     20   0   43.8g  10.4g  32548 S  52.9   4.1  21:48.02 qtp1892721215-5
  1432 nexus     20   0   43.8g  10.4g  32548 S  52.9   4.1   6:49.96 qtp1892721215-6
  1685 nexus     20   0   43.8g  10.4g  32548 S  52.9   4.1   3:34.37 qtp1892721215-8
  1747 nexus     20   0   43.8g  10.4g  32548 R  52.9   4.1  11:23.02 qtp1892721215-9
  1803 nexus     20   0   43.8g  10.4g  32548 R  52.9   4.1  11:59.63 qtp1892721215-1
  2254 nexus     20   0   43.8g  10.4g  32548 S  52.9   4.1   0:27.77 qtp1892721215-1
  2293 nexus     20   0   43.8g  10.4g  32548 S  52.9   4.1   3:47.03 qtp1892721215-1
  2295 nexus     20   0   43.8g  10.4g  32548 R  52.9   4.1   0:25.11 qtp1892721215-1
  2306 nexus     20   0   43.8g  10.4g  32548 S  52.9   4.1   2:12.40 qtp1892721215-1
  2330 nexus     20   0   43.8g  10.4g  32548 R  52.9   4.1   8:36.55 qtp1892721215-1
  1721 nexus     20   0   43.8g  10.4g  32548 R  47.1   4.1  15:02.04 qtp1892721215-9
  1772 nexus     20   0   43.8g  10.4g  32548 S  47.1   4.1   4:22.38 qtp1892721215-9
  1781 nexus     20   0   43.8g  10.4g  32548 S  47.1   4.1  16:55.50 qtp1892721215-9
  1784 nexus     20   0   43.8g  10.4g  32548 R  47.1   4.1  13:05.06 qtp1892721215-9
  1807 nexus     20   0   43.8g  10.4g  32548 S  47.1   4.1   8:50.17 qtp1892721215-1
  1838 nexus     20   0   43.8g  10.4g  32548 R  47.1   4.1   1:15.92 qtp1892721215-1
  1901 nexus     20   0   43.8g  10.4g  32548 R  47.1   4.1   2:42.83 qtp1892721215-1
  1933 nexus     20   0   43.8g  10.4g  32548 S  47.1   4.1   2:39.22 qtp1892721215-1

分别查看高 CPU 的线程堆栈

PID=265(0x109)

"Concurrent Mark-Sweep GC Thread" os_prio=0 tid=0x00007f4e0c112000 nid=0x109 runnable

PID=1416(0x588)

"qtp1892721215-631 <command>sql.select from asset where bucket = :bucket and name = :propValue</command>" #631 prio=5 os_prio=0 tid=0x00007f4cfc004000 nid=0x588 runnable [0x00007f4a9afec000]
   java.lang.Thread.State: RUNNABLE
        at com.orientechnologies.common.concur.lock.ODistributedCounter.decrement(ODistributedCounter.java:50)
        at com.orientechnologies.common.concur.lock.OReadersWriterSpinLock.releaseReadLock(OReadersWriterSpinLock.java:113)
        at com.orientechnologies.common.concur.lock.OPartitionedLockManager$SpinLockWrapper.unlock(OPartitionedLockManager.java:107)
        at com.orientechnologies.orient.core.storage.cache.local.twoq.O2QCache.doLoad(O2QCache.java:379)
        at com.orientechnologies.orient.core.storage.cache.local.twoq.O2QCache.load(O2QCache.java:294)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.loadPage(ODurableComponent.java:145)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readFullEntry(OPaginatedCluster.java:1898)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecordBuffer(OPaginatedCluster.java:779)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster.java:742)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster.java:721)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doReadRecord(OAbstractPaginatedStorage.java:4220)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:3807)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:1410)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx$SimpleRecordReader.readRecord(ODatabaseDocumentTx.java:3395)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:2008)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:656)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:103)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearchRecord(OCommandExecutorSQLSelect.java:585)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.serialIterator(OCommandExecutorSQLSelect.java:1638)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.fetchFromTarget(OCommandExecutorSQLSelect.java:1585)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.fetchValuesFromIndexCursor(OCommandExecutorSQLSelect.java:2466)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.searchForIndexes(OCommandExecutorSQLSelect.java:2280)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.searchInClasses(OCommandExecutorSQLSelect.java:1017)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLResultsetAbstract.assignTarget(OCommandExecutorSQLResultsetAbstract.java:203)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.assignTarget(OCommandExecutorSQLSelect.java:527)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearch(OCommandExecutorSQLSelect.java:509)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.execute(OCommandExecutorSQLSelect.java:485)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:70)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.executeCommand(OAbstractPaginatedStorage.java:3400)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.command(OAbstractPaginatedStorage.java:3318)
        at com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:69)
        at org.sonatype.nexus.repository.storage.MetadataNodeEntityAdapter.findByProperty(MetadataNodeEntityAdapter.java:165)

PID=1653(0x675)

"qtp1892721215-859 <command>sql.select from asset where bucket = :bucket and name = :propValue</command>" #859 prio=5 os_prio=0 tid=0x00007f496400d800 nid=0x675 runnable [0x00007f4be2de8000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006005092a8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at com.orientechnologies.common.collection.closabledictionary.OClosableEntry.acquireStateLock(OClosableEntry.java:84)
        at com.orientechnologies.common.collection.closabledictionary.OClosableLinkedContainer.acquire(OClosableLinkedContainer.java:292)
        at com.orientechnologies.orient.core.storage.cache.local.OWOWCache.getFilledUpTo(OWOWCache.java:959)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.getFilledUpTo(ODurableComponent.java:132)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readFullEntry(OPaginatedCluster.java:1889)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecordBuffer(OPaginatedCluster.java:779)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster.java:742)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster.java:721)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doReadRecord(OAbstractPaginatedStorage.java:4220)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:3807)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:1410)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx$SimpleRecordReader.readRecord(ODatabaseDocumentTx.java:3395)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:2008)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:656)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:103)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearchRecord(OCommandExecutorSQLSelect.java:585)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.serialIterator(OCommandExecutorSQLSelect.java:1638)

PID=1951(0x79f)

"qtp1892721215-1159 <command>sql.select from asset where bucket = :bucket and name = :propValue</command>" #1159 prio=5 os_prio=0 tid=0x00007f48d800f000 nid=0x79f waiting on condition [0x00007f45c51cd000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006005092a8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at com.orientechnologies.common.collection.closabledictionary.OClosableEntry.acquireStateLock(OClosableEntry.java:84)
        at com.orientechnologies.common.collection.closabledictionary.OClosableLinkedContainer.acquire(OClosableLinkedContainer.java:292)
        at com.orientechnologies.orient.core.storage.cache.local.OWOWCache.getFilledUpTo(OWOWCache.java:959)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.getFilledUpTo(ODurableComponent.java:132)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecordBuffer(OPaginatedCluster.java:762)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster.java:742)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster.java:721)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doReadRecord(OAbstractPaginatedStorage.java:4220)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:3807)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:1410)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx$SimpleRecordReader.readRecord(ODatabaseDocumentTx.java:3395)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:2008)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:656)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:103)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearchRecord(OCommandExecutorSQLSelect.java:585)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.serialIterator(OCommandExecutorSQLSelect.java:1638)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.fetchFromTarget(OCommandExecutorSQLSelect.java:1585)

PID=2285(0x8ed)

"qtp1892721215-1491 <command>sql.select from asset where bucket = :bucket and name = :propValue</command>" #1491 prio=5 os_prio=0 tid=0x00007f4a30006000 nid=0x8ed waiting on condition [0x00007f47c6de8000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000600508ef0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at com.orientechnologies.common.collection.closabledictionary.OClosableEntry.acquireStateLock(OClosableEntry.java:84)
        at com.orientechnologies.common.collection.closabledictionary.OClosableLinkedContainer.acquire(OClosableLinkedContainer.java:292)
        at com.orientechnologies.orient.core.storage.cache.local.OWOWCache.getFilledUpTo(OWOWCache.java:959)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.getFilledUpTo(ODurableComponent.java:132)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readFullEntry(OPaginatedCluster.java:1889)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecordBuffer(OPaginatedCluster.java:779)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster.java:742)
        at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.readRecord(OPaginatedCluster.java:721)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doReadRecord(OAbstractPaginatedStorage.java:4220)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:3807)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.readRecord(OAbstractPaginatedStorage.java:1410)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx$SimpleRecordReader.readRecord(ODatabaseDocumentTx.java:3395)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeReadRecord(ODatabaseDocumentTx.java:2008)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:656)
        at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.load(ODatabaseDocumentTx.java:103)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearchRecord(OCommandExecutorSQLSelect.java:585)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.serialIterator(OCommandExecutorSQLSelect.java:1638)

有的时候还发现大量 CPU 利用率在 50% 左右的线程,这些线程执行垃圾收集,但无法回收导致 CPU 利用率高的堆内存

bash-4.4$ top -n 1 -b -H -p 1 | head -n 30 && jstack 1 > /nexus-data/nexus_thread.info
top - 08:04:55 up 263 days, 16:07,  0 users,  load average: 22.25, 16.58, 15.25
Threads: 503 total,  12 running, 491 sleeping,   0 stopped,   0 zombie
%Cpu(s): 25.1 us,  3.0 sy,  0.0 ni, 71.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257429.8 total,   2308.7 free,  36091.9 used, 219029.1 buff/cache
MiB Swap:   4096.0 total,    212.6 free,   3883.4 used. 216354.0 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   265 nexus     20   0   43.8g  10.4g  32660 S  52.9   4.1  78:42.98 java
   257 nexus     20   0   43.8g  10.4g  32660 R  47.1   4.1  55:13.06 java
   259 nexus     20   0   43.8g  10.4g  32660 R  47.1   4.1  55:13.65 java
   250 nexus     20   0   43.8g  10.4g  32660 R  41.2   4.1  55:12.86 java
   251 nexus     20   0   43.8g  10.4g  32660 R  41.2   4.1  55:12.68 java
   252 nexus     20   0   43.8g  10.4g  32660 R  41.2   4.1  55:13.58 java
   253 nexus     20   0   43.8g  10.4g  32660 R  41.2   4.1  55:13.15 java
   254 nexus     20   0   43.8g  10.4g  32660 R  41.2   4.1  55:13.40 java
   255 nexus     20   0   43.8g  10.4g  32660 R  41.2   4.1  55:13.25 java
   256 nexus     20   0   43.8g  10.4g  32660 R  41.2   4.1  55:13.24 java
   258 nexus     20   0   43.8g  10.4g  32660 R  41.2   4.1  55:13.19 java
   260 nexus     20   0   43.8g  10.4g  32660 R  41.2   4.1  55:12.91 java
   261 nexus     20   0   43.8g  10.4g  32660 R  41.2   4.1  55:13.13 java
"Gang worker#0 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c05d800 nid=0xfa runnable

"Gang worker#1 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c05f000 nid=0xfb runnable

"Gang worker#2 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c060800 nid=0xfc runnable

"Gang worker#3 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c065000 nid=0xfd runnable

"Gang worker#4 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c067000 nid=0xfe runnable

"Gang worker#5 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c068800 nid=0xff runnable

"Gang worker#6 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c06a800 nid=0x100 runnable

"Gang worker#7 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c06c800 nid=0x101 runnable

"Gang worker#8 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c06e000 nid=0x102 runnable

"Gang worker#9 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c070000 nid=0x103 runnable

"Gang worker#10 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c071800 nid=0x104 runnable

"Gang worker#11 (Parallel GC Threads)" os_prio=0 tid=0x00007f4e0c073800 nid=0x105 runnable

GC 日志分析

查看 GC 文件发现频繁

22045.210: [GC (Allocation Failure) 22045.210: [ParNew: 876288K->146048K(876288K), 0.1231922 secs] 4367587K->3685888K(8242560K), 0.1234838 secs] [Times: user=1.08 sys=0.01, real=0.12 secs]
22045.545: [GC (Allocation Failure) 22045.545: [ParNew: 876288K->118569K(876288K), 0.1480261 secs] 4300040K->3649444K(8242560K), 0.1483200 secs] [Times: user=1.19 sys=0.04, real=0.15 secs]
22045.901: [GC (Allocation Failure) 22045.901: [ParNew: 848809K->136830K(876288K), 0.1007517 secs] 4269203K->3557224K(8242560K), 0.1010417 secs] [Times: user=0.92 sys=0.02, real=0.10 secs]
22046.199: [GC (Allocation Failure) 22046.199: [ParNew: 867070K->146048K(876288K), 0.1262746 secs] 4186874K->3508758K(8242560K), 0.1265621 secs] [Times: user=1.15 sys=0.02, real=0.12 secs]
22046.528: [GC (Allocation Failure) 22046.528: [ParNew: 876288K->146048K(876288K), 0.1154658 secs] 4135340K->3445385K(8242560K), 0.1157458 secs] [Times: user=1.03 sys=0.01, real=0.12 secs]
22046.863: [GC (Allocation Failure) 22046.864: [ParNew: 876288K->146048K(876288K), 0.1380234 secs] 4056185K->3381568K(8242560K), 0.1383243 secs] [Times: user=1.20 sys=0.02, real=0.14 secs]
22047.238: [GC (Allocation Failure) 22047.239: [ParNew: 876288K->146048K(876288K), 0.1175596 secs] 4007589K->3321101K(8242560K), 0.1178741 secs] [Times: user=0.91 sys=0.01, real=0.11 secs]
22047.577: [GC (Allocation Failure) 22047.577: [ParNew: 876288K->146048K(876288K), 0.1095810 secs] 3979304K->3283188K(8242560K), 0.1098780 secs] [Times: user=0.98 sys=0.01, real=0.11 secs]
22047.906: [GC (Allocation Failure) 22047.906: [ParNew: 876288K->146048K(876288K), 0.1460188 secs] 3936187K->3253343K(8242560K), 0.1464933 secs] [Times: user=1.33 sys=0.02, real=0.15 secs]
22048.254: [GC (Allocation Failure) 22048.254: [ParNew: 876288K->146048K(876288K), 0.1357416 secs] 3906382K->3219026K(8242560K), 0.1360523 secs] [Times: user=1.24 sys=0.01, real=0.14 secs]
22048.604: [GC (Allocation Failure) 22048.605: [ParNew: 876288K->146048K(876288K), 0.1257560 secs] 3860409K->3178499K(8242560K), 0.1260555 secs] [Times: user=1.11 sys=0.01, real=0.13 secs]
22048.944: [GC (Allocation Failure) 22048.944: [ParNew: 876288K->146048K(876288K), 0.1227148 secs] 3848451K->3162435K(8242560K), 0.1230789 secs] [Times: user=0.93 sys=0.01, real=0.12 secs]
22049.275: [GC (Allocation Failure) 22049.275: [ParNew: 876288K->146048K(876288K), 0.1300142 secs] 3837726K->3164053K(8242560K), 0.1303308 secs] [Times: user=1.12 sys=0.02, real=0.13 secs]
22049.617: [GC (Allocation Failure) 22049.617: [ParNew: 876288K->146048K(876288K), 0.1560830 secs] 3785033K->3102054K(8242560K), 0.1564236 secs] [Times: user=1.30 sys=0.04, real=0.16 secs]
22049.829: [CMS-concurrent-sweep: 7.565/12.417 secs] [Times: user=185.39 sys=17.96, real=12.42 secs]
22049.830: [CMS-concurrent-reset-start]
22049.892: [CMS-concurrent-reset: 0.063/0.063 secs] [Times: user=1.14 sys=0.16, real=0.06 secs]
22049.984: [GC (Allocation Failure) 22049.984: [ParNew: 876288K->146048K(876288K), 0.1332560 secs] 3749726K->3073390K(8242560K), 0.1335766 secs] [Times: user=1.14 sys=0.04, real=0.13 secs]
22050.326: [GC (Allocation Failure) 22050.326: [ParNew: 876288K->146048K(876288K), 0.1500684 secs] 3803630K->3113169K(8242560K), 0.1504454 secs] [Times: user=1.24 sys=0.02, real=0.15 secs]
22050.701: [GC (Allocation Failure) 22050.701: [ParNew: 876288K->146048K(876288K), 0.1538092 secs] 3843409K->3161059K(8242560K), 0.1541364 secs] [Times: user=1.23 sys=0.03, real=0.15 secs]
22051.055: [GC (Allocation Failure) 22051.055: [ParNew: 876288K->146048K(876288K), 0.1283893 secs] 3891299K->3198929K(8242560K), 0.1287281 secs] [Times: user=1.08 sys=0.00, real=0.13 secs]
22051.401: [GC (Allocation Failure) 22051.401: [ParNew: 876288K->146048K(876288K), 0.1382001 secs] 3929169K->3244359K(8242560K), 0.1385485 secs] [Times: user=1.07 sys=0.02, real=0.14 secs]
22051.753: [GC (Allocation Failure) 22051.754: [ParNew: 876288K->146048K(876288K), 0.1212361 secs] 3974599K->3291855K(8242560K), 0.1217161 secs] [Times: user=1.01 sys=0.03, real=0.12 secs]
22052.076: [GC (Allocation Failure) 22052.077: [ParNew: 876288K->146048K(876288K), 0.1348037 secs] 4022095K->3334990K(8242560K), 0.1351310 secs] [Times: user=1.12 sys=0.01, real=0.14 secs]
22052.432: [GC (Allocation Failure) 22052.432: [ParNew: 876288K->146048K(876288K), 0.1207005 secs] 4065230K->3377339K(8242560K), 0.1210904 secs] [Times: user=1.01 sys=0.02, real=0.12 secs]
22052.757: [GC (Allocation Failure) 22052.757: [ParNew: 876288K->146048K(876288K), 0.1216388 secs] 4107579K->3427506K(8242560K), 0.1219509 secs] [Times: user=0.96 sys=0.02, real=0.13 secs]
22053.102: [GC (Allocation Failure) 22053.102: [ParNew: 876288K->146048K(876288K), 0.1176569 secs] 4157746K->3474495K(8242560K), 0.1179791 secs] [Times: user=0.94 sys=0.03, real=0.12 secs]
22053.444: [GC (Allocation Failure) 22053.444: [ParNew: 876288K->146048K(876288K), 0.1153058 secs] 4204735K->3524874K(8242560K), 0.1156208 secs] [Times: user=0.93 sys=0.02, real=0.12 secs]
22053.825: [GC (Allocation Failure) 22053.825: [ParNew: 876288K->146048K(876288K), 0.1370090 secs] 4255114K->3574406K(8242560K), 0.1376114 secs] [Times: user=1.16 sys=0.04, real=0.14 secs]
22054.226: [GC (Allocation Failure) 22054.226: [ParNew: 876288K->146048K(876288K), 0.1293793 secs] 4304646K->3603376K(8242560K), 0.1297292 secs] [Times: user=1.06 sys=0.02, real=0.13 secs]
22054.580: [GC (Allocation Failure) 22054.580: [ParNew: 876288K->146048K(876288K), 0.1383698 secs] 4333616K->3655122K(8242560K), 0.1387208 secs] [Times: user=1.10 sys=0.01, real=0.14 secs]
22054.928: [GC (Allocation Failure) 22054.928: [ParNew: 876288K->146048K(876288K), 0.1168988 secs] 4385362K->3691694K(8242560K), 0.1172026 secs] [Times: user=1.04 sys=0.01, real=0.12 secs]
22055.269: [GC (Allocation Failure) 22055.269: [ParNew: 876288K->146048K(876288K), 0.1274913 secs] 4421934K->3724706K(8242560K), 0.1278116 secs] [Times: user=1.02 sys=0.02, real=0.12 secs]
22055.626: [GC (Allocation Failure) 22055.626: [ParNew: 876288K->146048K(876288K), 0.1319667 secs] 4454946K->3769275K(8242560K), 0.1323446 secs] [Times: user=1.03 sys=0.03, real=0.14 secs]
22055.964: [GC (Allocation Failure) 22055.964: [ParNew: 876288K->146048K(876288K), 0.1282238 secs] 4499515K->3814194K(8242560K), 0.1285613 secs] [Times: user=1.02 sys=0.02, real=0.13 secs]
22056.286: [GC (Allocation Failure) 22056.286: [ParNew: 876288K->146048K(876288K), 0.1295692 secs] 4544434K->3854773K(8242560K), 0.1299219 secs] [Times: user=1.01 sys=0.02, real=0.13 secs]
22056.627: [GC (Allocation Failure) 22056.627: [ParNew: 876288K->146048K(876288K), 0.1299080 secs] 4585013K->3904723K(8242560K), 0.1302944 secs] [Times: user=1.06 sys=0.02, real=0.13 secs]
22056.962: [GC (Allocation Failure) 22056.962: [ParNew: 876288K->146048K(876288K), 0.1490967 secs] 4634963K->3964984K(8242560K), 0.1494244 secs] [Times: user=1.15 sys=0.02, real=0.15 secs]
22057.318: [GC (Allocation Failure) 22057.318: [ParNew: 876288K->146048K(876288K), 0.1499307 secs] 4695224K->4010986K(8242560K), 0.1503423 secs] [Times: user=1.23 sys=0.03, real=0.16 secs]
22057.680: [GC (Allocation Failure) 22057.680: [ParNew: 876288K->146048K(876288K), 0.1232765 secs] 4741226K->4057267K(8242560K), 0.1235796 secs] [Times: user=0.99 sys=0.04, real=0.12 secs]
22058.018: [GC (Allocation Failure) 22058.018: [ParNew: 876288K->146048K(876288K), 0.1574910 secs] 4787507K->4107281K(8242560K), 0.1577976 secs] [Times: user=1.32 sys=0.03, real=0.16 secs]

22045.210: [GC (Allocation Failure) 22045.210: [ParNew: 876288K->146048K(876288K), 0.1231922 secs] 4367587K->3685888K(8242560K), 0.1234838 secs] [Times: user=1.08 sys=0.01, real=0.12 secs]

  • ParNew: 876288K->146048K(876288K), 0.1231922 secs

GC 前年轻代 876288KB(876MB), GC后该内存区域使用容量 146048KB; GC 耗时 0.1231922 秒

  • 4367587K->3685888K(8242560K), 0.1234838 secs

堆区垃圾回收前的大小 4367587K(4GB),堆区垃圾回收后的大小 3685888K(3.6GB),堆区总大小 8242560K(8GB); GC 耗时 0.1234838 秒

  • [Times: user=1.08 sys=0.01, real=0.12 secs]

分别表示用户态耗时 1.08 秒,内核态耗时 0.01 秒,总耗时 0.12 秒

分析下可以得出结论:

  • 该次 GC 新生代减少了 876288K - 146048K = 730240K (730MB)
  • Heap 区总共减少了 4367587K - 3685888K = 681699K (681MB)

730240K – 681699K = 48541K (48MB),说明该次共有 48MB 内存从年轻代移到了老年代,可以看出年轻代频繁达到最大(876MB)并触法 ParNew, 该收集器采用复制算法回收内存,期间会停止其他工作线程,即 Stop The World