本文主要分析一個頻繁產生對象形成gc時間過長的case。java
gc時間過長,平均gc pause的時間要將近4秒,有13%的gc超過10秒,太可怕了,部分gc日誌以下:spring
[PSYoungGen: 457878K->126656K(489472K)] 1746043K->1453131K(1887744K), 12.1965757 secs] [Times: user=5.59 sys=0.52, real=12.19 secs] 154415.774: [GC (Allocation Failure) Desired survivor size 212860928 bytes, new threshold 1 (max 15) [PSYoungGen: 376192K->65968K(484864K)] 1702667K->1392499K(1883136K), 0.1665513 secs] [Times: user=0.10 sys=0.00, real=0.17 secs] 154416.838: [GC (Allocation Failure) Desired survivor size 235929600 bytes, new threshold 1 (max 15) [PSYoungGen: 341424K->196182K(445952K)] 1667955K->1523034K(1844224K), 1.7996294 secs] [Times: user=0.89 sys=0.03, real=1.80 secs] 154419.456: [GC (Allocation Failure) Desired survivor size 225968128 bytes, new threshold 1 (max 15) [PSYoungGen: 434262K->121776K(468480K)] 1761114K->1486938K(1866752K), 23.0844304 secs] [Times: user=10.75 sys=0.81, real=23.09 secs] 154442.541: [Full GC (Ergonomics) [PSYoungGen: 121776K->0K(468480K)] [ParOldGen: 1365162K->208108K(1398272K)] 1486938K->208108K(1866752K), [Metaspace: 93615K->93615K(1132544K)], 23.5955214 secs] [Times: user=3.76 sys=5.30, real=23.59 secs] 154504.670: [GC (Allocation Failure) Desired survivor size 217579520 bytes, new threshold 1 (max 15) [PSYoungGen: 200553K->106368K(486400K)] 408662K->314476K(1884672K), 1.0664613 secs] [Times: user=0.39 sys=0.13, real=1.06 secs] 154507.542: [GC (Allocation Failure) Desired survivor size 218103808 bytes, new threshold 1 (max 15) [PSYoungGen: 372096K->144182K(478208K)] 580204K->425927K(1876480K), 5.7999561 secs] [Times: user=1.12 sys=1.55, real=5.80 secs] 154514.037: [GC (Allocation Failure) Desired survivor size 213909504 bytes, new threshold 1 (max 15) [PSYoungGen: 409910K->87920K(489984K)] 691655K->407999K(1888256K), 10.1020217 secs] [Times: user=4.46 sys=0.61, real=10.11 secs] 154563.240: [GC (Allocation Failure) Desired survivor size 213385216 bytes, new threshold 1 (max 15) [PSYoungGen: 328380K->65952K(485888K)] 648460K->386087K(1884160K), 0.0918412 secs] [Times: user=0.04 sys=0.01, real=0.09 secs] 154564.037: [GC (Allocation Failure) Desired survivor size 219676672 bytes, new threshold 1 (max 15) [PSYoungGen: 342944K->153558K(478208K)] 663079K->474022K(1876480K), 3.1948641 secs] [Times: user=0.72 sys=0.69, real=3.19 secs] 154568.135: [GC (Allocation Failure) Desired survivor size 212336640 bytes, new threshold 1 (max 15) [PSYoungGen: 423382K->98528K(484352K)] 743846K->457302K(1882624K), 13.4085860 secs] [Times: user=6.04 sys=0.69, real=13.41 secs]
經過jmap dump下內存以後,使用mat分享,查看thread_overview
json
能夠看到ElasticsearchJestHealthIndicator.doHealthCheck持有了不少對象沒釋放app
spring-context-4.3.7.RELEASE-sources.jar!/org/springframework/jmx/export/SpringModelMBean.javaelasticsearch
/** * Switches the {@link Thread#getContextClassLoader() context ClassLoader} for the * managed resources {@link ClassLoader} before allowing the invocation to occur. * @see javax.management.modelmbean.ModelMBean#invoke */ @Override public Object invoke(String opName, Object[] opArgs, String[] sig) throws MBeanException, ReflectionException { ClassLoader currentClassLoader = Thread.currentThread().getContextClassLoader(); try { Thread.currentThread().setContextClassLoader(this.managedResourceClassLoader); return super.invoke(opName, opArgs, sig); } finally { Thread.currentThread().setContextClassLoader(currentClassLoader); } }
spring-boot-actuator-1.4.5.RELEASE-sources.jar!/org/springframework/boot/actuate/endpoint/jmx/DataEndpointMBean.javaide
@ManagedResource public class DataEndpointMBean extends EndpointMBean { /** * Create a new {@link DataEndpointMBean} instance. * @param beanName the bean name * @param endpoint the endpoint to wrap * @param objectMapper the {@link ObjectMapper} used to convert the payload */ public DataEndpointMBean(String beanName, Endpoint<?> endpoint, ObjectMapper objectMapper) { super(beanName, endpoint, objectMapper); } @ManagedAttribute(description = "Invoke the underlying endpoint") public Object getData() { return convert(getEndpoint().invoke()); } }
系本地idea開啓了Enable JMX Agent才能夠復現
spring-boot-admin-server-1.4.6-sources.jar!/de/codecentric/boot/admin/registry/StatusUpdateApplicationListener.javaspring-boot
private long updatePeriod = 10_000L; public void startStatusUpdate() { if (scheduledTask != null && !scheduledTask.isDone()) { return; } scheduledTask = taskScheduler.scheduleAtFixedRate(new Runnable() { @Override public void run() { statusUpdater.updateStatusForAllApplications(); } }, updatePeriod); LOGGER.debug("Scheduled status-updater task for every {}ms", updatePeriod); }
spring-boot-admin-server-1.4.6-sources.jar!/de/codecentric/boot/admin/registry/StatusUpdater.javaui
public void updateStatusForAllApplications() { long now = System.currentTimeMillis(); for (Application application : store.findAll()) { if (now - statusLifetime > application.getStatusInfo().getTimestamp()) { updateStatus(application); } } } public void updateStatus(Application application) { StatusInfo oldStatus = application.getStatusInfo(); StatusInfo newStatus = queryStatus(application); Application newState = Application.create(application).withStatusInfo(newStatus).build(); store.save(newState); if (!newStatus.equals(oldStatus)) { publisher.publishEvent( new ClientApplicationStatusChangedEvent(newState, oldStatus, newStatus)); } } protected StatusInfo queryStatus(Application application) { LOGGER.trace("Updating status for {}", application); try { @SuppressWarnings("unchecked") ResponseEntity<Map<String, Object>> response = restTemplate.getForEntity( application.getHealthUrl(), (Class<Map<String, Object>>) (Class<?>) Map.class); LOGGER.debug("/health for {} responded with {}", application, response); if (response.hasBody() && response.getBody().get("status") instanceof String) { return StatusInfo.valueOf((String) response.getBody().get("status")); } else if (response.getStatusCode().is2xxSuccessful()) { return StatusInfo.ofUp(); } else { return StatusInfo.ofDown(); } } catch (Exception ex) { if ("OFFLINE".equals(application.getStatusInfo().getStatus())) { LOGGER.debug("Couldn't retrieve status for {}", application, ex); } else { LOGGER.warn("Couldn't retrieve status for {}", application, ex); } return StatusInfo.ofOffline(); } }
能夠看到這個admin-server註冊了個定時任務,定時調用/health
spring-boot-actuator-1.4.5.RELEASE-sources.jar!/org/springframework/boot/actuate/autoconfigure/ElasticsearchHealthIndicatorConfiguration.javathis
class ElasticsearchHealthIndicatorConfiguration { @Configuration @ConditionalOnBean(Client.class) @ConditionalOnEnabledHealthIndicator("elasticsearch") @EnableConfigurationProperties(ElasticsearchHealthIndicatorProperties.class) static class ElasticsearchClientHealthIndicatorConfiguration extends CompositeHealthIndicatorConfiguration<ElasticsearchHealthIndicator, Client> { private final Map<String, Client> clients; private final ElasticsearchHealthIndicatorProperties properties; ElasticsearchClientHealthIndicatorConfiguration(Map<String, Client> clients, ElasticsearchHealthIndicatorProperties properties) { this.clients = clients; this.properties = properties; } @Bean @ConditionalOnMissingBean(name = "elasticsearchHealthIndicator") public HealthIndicator elasticsearchHealthIndicator() { return createHealthIndicator(this.clients); } @Override protected ElasticsearchHealthIndicator createHealthIndicator(Client client) { return new ElasticsearchHealthIndicator(client, this.properties); } } @Configuration @ConditionalOnBean(JestClient.class) @ConditionalOnEnabledHealthIndicator("elasticsearch") static class ElasticsearchJestHealthIndicatorConfiguration extends CompositeHealthIndicatorConfiguration<ElasticsearchJestHealthIndicator, JestClient> { private final Map<String, JestClient> clients; ElasticsearchJestHealthIndicatorConfiguration(Map<String, JestClient> clients) { this.clients = clients; } @Bean @ConditionalOnMissingBean(name = "elasticsearchHealthIndicator") public HealthIndicator elasticsearchHealthIndicator() { return createHealthIndicator(this.clients); } @Override protected ElasticsearchJestHealthIndicator createHealthIndicator( JestClient client) { return new ElasticsearchJestHealthIndicator(client); } } }
spring-boot-actuator-1.4.5.RELEASE-sources.jar!/org/springframework/boot/actuate/health/ElasticsearchHealthIndicator.javaidea
@Override protected void doHealthCheck(Health.Builder builder) throws Exception { List<String> indices = this.properties.getIndices(); ClusterHealthResponse response = this.client.admin().cluster() .health(Requests.clusterHealthRequest(indices.isEmpty() ? allIndices : indices.toArray(new String[indices.size()]))) .actionGet(this.properties.getResponseTimeout()); switch (response.getStatus()) { case GREEN: case YELLOW: builder.up(); break; case RED: default: builder.down(); break; } builder.withDetail("clusterName", response.getClusterName()); builder.withDetail("numberOfNodes", response.getNumberOfNodes()); builder.withDetail("numberOfDataNodes", response.getNumberOfDataNodes()); builder.withDetail("activePrimaryShards", response.getActivePrimaryShards()); builder.withDetail("activeShards", response.getActiveShards()); builder.withDetail("relocatingShards", response.getRelocatingShards()); builder.withDetail("initializingShards", response.getInitializingShards()); builder.withDetail("unassignedShards", response.getUnassignedShards()); }
spring-boot-actuator-1.4.5.RELEASE-sources.jar!/org/springframework/boot/actuate/health/ElasticsearchHealthIndicatorProperties.java
@ConfigurationProperties(prefix = "management.health.elasticsearch", ignoreUnknownFields = false) public class ElasticsearchHealthIndicatorProperties { /** * Comma-separated index names. */ private List<String> indices = new ArrayList<String>(); /** * Time, in milliseconds, to wait for a response from the cluster. */ private long responseTimeout = 100L; //...... }
若是有指定indices,則會查詢他們的健康狀況,好比/index1,index2/_stats,若是沒有則查詢全部的indices,這個就是個潛在的坑。
spring-boot-actuator-1.4.5.RELEASE-sources.jar!/org/springframework/boot/actuate/health/ElasticsearchJestHealthIndicator.java
@Override protected void doHealthCheck(Health.Builder builder) throws Exception { JestResult aliases = this.jestClient.execute(new Stats.Builder().build()); JsonElement root = this.jsonParser.parse(aliases.getJsonString()); JsonObject shards = root.getAsJsonObject().get("_shards").getAsJsonObject(); int failedShards = shards.get("failed").getAsInt(); if (failedShards != 0) { builder.outOfService(); } else { builder.up(); } }
jest最後發出的http請求是/_all/_stats,這個就是問題所在,查詢全部_all的統計數據,對於一個大的elasticsearch平臺來講,返回的數據是巨大的,將近5000條數據,返回的json純文件都要20多M,這個再加上定時任務/health查詢,致使新生對象不斷產生,ygc很是頻繁,形成內存泄露的現象。
定時監控應用health是個好東東,可是得注意頻率,另外還得關注具體實現,像elasticsearch的這個稍不注意就被坑了,至關於定時產生必定量的垃圾,頻率超過垃圾回收的速度,相似內存泄露,給應用gc帶來很大的負擔。