ES相关诊断及内存优化总结



ES相关诊断

文章通过es提供的jvm/stats内存占用统计发现内存占比最大的部分,针对这部分不断进行排查和尝试,最终进行优化和总结。

ES机器查看内存:

我们查看了机器上的es设置内存:

http://127.0.0.1:9200/_cat/nodes?h=heap.max

为1.9Gb。

http://127.0.0.1:9200/_cat/shards?pretty 查看分片

http://127.0.0.1:9200/_cluster/health?pretty 健康状态为yellow 应为单点导致

查看每个index占用内存情况:

http://127.0.0.1:9200/_cat/indices?v&h=i,tm,sm&s=tm:desc

内存占用情况:

i                                tm      sm
nfvofcaps                    11.9mb  11.9mb
nfvocatalog                   1.9mb   1.9mb
nfvosysmgnt                   1.9mb   1.9mb
nfvonslcm                     1.1mb   1.1mb
nfvowindriver                   1mb     1mb
nfvoemanagerbe              963.1kb 918.6kb
nfvoemspm                   580.3kb 453.6kb
nfvohosts                   417.4kb  80.6kb
nfvovms                     293.7kb 267.8kb
nfvomultivimbroker          150.2kb 150.2kb
nfvoemanagerbesafe          146.1kb 146.1kb
nfvopolicy                   88.7kb  88.7kb
nfvodsmspoolsvolumes         62.2kb  62.2kb
netelement                   59.7kb  59.7kb
nfvosysmgntkpi               59.3kb  59.3kb
nfvodsmspools                52.1kb  52.1kb
nfvoserversethports          51.4kb  51.4kb
nfvomysql                      49kb    49kb
nfvoservers                  47.9kb  47.1kb
nfvodsms                     44.8kb    44kb
mysqlsysmanager              11.7kb  11.7kb
.kibana                       8.5kb   8.5kb
nfvorouters                      0b      0b
nfvoddos                         0b      0b
jbossmq-httpil                   0b      0b
nfvodiskarraysdisk               0b      0b
nfvoswitches                     0b      0b
nfvowafs                         0b      0b
nfvodiskarraysluns               0b      0b
nfvois                           0b      0b
nfvodiskarraysstoragepools       0b      0b
nfvodiskarrays                   0b      0b
invoker                          0b      0b
nfvofirewalls                    0b      0b
nfvodiskarrayscontrollers        0b      0b
nfvoroutersports                 0b      0b
nfvodiskarraysethernetports      0b      0b
nfvolbsports                     0b      0b
nfvoswitchesports                0b      0b
nfvolbs                          0b      0b
nfvouser                         0b      0b

es节点报错信息:

2019-10-23T01:43:09,074][INFO ][o.e.c.r.a.DiskThresholdMonitor] [node-1] low disk watermark [85%] exceeded on [b2udS94bS8mY971eqSholA][node-1][/opt/cmcc/es_data_backup/data/nodes/0] free: 25.1gb[12.9%], replicas will not be assigned to this node

这个是磁盘空间超过es的高水位配置阈值,清理es磁盘无用文件,节省es磁盘空间。

es jvm参数配置:

-Xms2g -Xmx2g

-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly

-XX:+DisableExplicitGC 堆外不让回收

查看es进程占用物理内存情况

Name:	java
State:	S (sleeping)
Tgid:	1629
Ngid:	0
Pid:	1629
PPid:	1531
TracerPid:	0
Uid:	1000	1000	1000	1000
Gid:	1000	1000	1000	1000
FDSize:	1024
Groups:	1000 
NStgid:	1629	35
NSpid:	1629	35
NSpgid:	1628	34
NSsid:	1628	34
VmPeak:	30758876 kB 进程所使用的虚拟内存的峰值
VmSize:	30753964 kB  进程当前使用的虚拟内存的大小
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	 5195512 kB 进程所使用的物理内存的峰值 5GB
VmRSS:	 5193688 kB 进程当前使用的物理内存的大小
VmData:	14763200 kB 进程占用的数据段大小
VmStk:	     132 kB
VmExe:	       4 kB
VmLib:	   18308 kB
VmPTE:	   13324 kB
VmPMD:	     132 kB
VmSwap:	       0 kB
HugetlbPages:	       0 kB
Threads:	157
SigQ:	0/257554
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000000000003
SigCgt:	2000000181005ccc
CapInh:	00000000a80425fb
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	00000000a80425fb
CapAmb:	0000000000000000
Seccomp:	2
Speculation_Store_Bypass:	vulnerable
Cpus_allowed:	ffff
Cpus_allowed_list:	0-15
Mems_allowed:	00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	17
nonvoluntary_ctxt_switches:	4

很明显es java进程所占用的虚拟内存较多。

查看es jvm内存占用情况

jvm内存监控占用

http://127.0.0.1:9200/_nodes/stats/jvm?pretty jvm统计信息

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "es-cluster",
  "nodes" : {
    "b2udS94bS8mY971eqSholA" : {
      "timestamp" : 1571801549425,
      "name" : "node-1",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "jvm" : {
        "timestamp" : 1571801549425,
        "uptime_in_millis" : 1206006,
        "mem" : {
          "heap_used_in_bytes" : 530281400,// 已用的堆内存 ,大约500MB,不算多
          "heap_used_percent" : 25, //使用的堆内存比例只占用25%
          "heap_committed_in_bytes" : 2075918336,
          "heap_max_in_bytes" : 2075918336,
          "non_heap_used_in_bytes" : 101650240,
          "non_heap_committed_in_bytes" : 107356160,
          "pools" : {
            "young" : {  //新生代,young
              "used_in_bytes" : 26739464,
              "max_in_bytes" : 572653568,
              "peak_used_in_bytes" : 572653568,
              "peak_max_in_bytes" : 572653568
            },
            "survivor" : { //新生代survivor区
              "used_in_bytes" : 7893160,
              "max_in_bytes" : 71565312,
              "peak_used_in_bytes" : 71565312,
              "peak_max_in_bytes" : 71565312
            },
            "old" : { //老年代
              "used_in_bytes" : 495648776,
              "max_in_bytes" : 1431699456,
              "peak_used_in_bytes" : 495648776,
              "peak_max_in_bytes" : 1431699456
            }
          }
        },
        "threads" : {
          "count" : 120,
          "peak_count" : 148
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 9,
              "collection_time_in_millis" : 269
            },
            "old" : {
              "collection_count" : 1,
              "collection_time_in_millis" : 118
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 119,
            "used_in_bytes" : 337886916, //322Mb
            "total_capacity_in_bytes" : 337886915
          },
          "mapped" : {
            "count" : 628,
            "used_in_bytes" : 16095062834, //14GB ,es大部分内存占用在这里了,问题已经找到了,
            "total_capacity_in_bytes" : 16095062834
          }
        },
        "classes" : {
          "current_loaded_count" : 11014,
          "total_loaded_count" : 11014,
          "total_unloaded_count" : 0
        }
      }
    }
  }
}

es node节点监控统计

http://127.0.0.1:9200/_nodes/stats?pretty 节点信息统计

{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "es-cluster",
  "nodes" : {
    "b2udS94bS8mY971eqSholA" : {
      "timestamp" : 1571811340573,
      "name" : "node-1",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1:9300",
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "indices" : {
        "docs" : {
          "count" : 21176041, //索引文档数
          "deleted" : 0
        },
        "store" : {
          "size_in_bytes" : 16179548377, //存储大小15GB,磁盘大小
          "throttle_time_in_millis" : 0
        },
        "indexing" : {
          "index_total" : 77751, //索引总数
          "index_time_in_millis" : 25491,
          "index_current" : 0,
          "index_failed" : 0,
          "delete_total" : 0,
          "delete_time_in_millis" : 0,
          "delete_current" : 0,
          "noop_update_total" : 0,
          "is_throttled" : false,
          "throttle_time_in_millis" : 0
        },
        "get" : {
          "total" : 4369,
          "time_in_millis" : 318,
          "exists_total" : 4369,
          "exists_time_in_millis" : 318,
          "missing_total" : 0,
          "missing_time_in_millis" : 0,
          "current" : 0
        },
        "search" : {
          "open_contexts" : 0,
          "query_total" : 4532,
          "query_time_in_millis" : 11888,
          "query_current" : 0,
          "fetch_total" : 4516,
          "fetch_time_in_millis" : 321,
          "fetch_current" : 0,
          "scroll_total" : 0,
          "scroll_time_in_millis" : 0,
          "scroll_current" : 0,
          "suggest_total" : 0,
          "suggest_time_in_millis" : 0,
          "suggest_current" : 0
        },
        "merges" : {
          "current" : 0,
          "current_docs" : 0,
          "current_size_in_bytes" : 0,
          "total" : 69,
          "total_time_in_millis" : 9719,
          "total_docs" : 334463,
          "total_size_in_bytes" : 217335811,
          "total_stopped_time_in_millis" : 0,
          "total_throttled_time_in_millis" : 0,
          "total_auto_throttle_in_bytes" : 1363148800
        },
        "refresh" : {
          "total" : 734,
          "total_time_in_millis" : 7487,
          "listeners" : 0
        },
        "flush" : {
          "total" : 96,
          "total_time_in_millis" : 6427
        },
        "warmer" : {
          "current" : 0,
          "total" : 838,
          "total_time_in_millis" : 288
        },
        "query_cache" : {
          "memory_size_in_bytes" : 0,
          "total_count" : 0,
          "hit_count" : 0,
          "miss_count" : 0,
          "cache_size" : 0,
          "cache_count" : 0,
          "evictions" : 0
        },
        "fielddata" : {
          "memory_size_in_bytes" : 101328, //内存大小98kb
          "evictions" : 0
        },
        "completion" : {
          "size_in_bytes" : 0
        },
        "segments" : {
          "count" : 277,
          "memory_in_bytes" : 21673476, //segement内存大小 20Mb
          "terms_memory_in_bytes" : 16329238, //15MB
          "stored_fields_memory_in_bytes" : 4734696,
          "term_vectors_memory_in_bytes" : 0,
          "norms_memory_in_bytes" : 245888,
          "points_memory_in_bytes" : 159766,
          "doc_values_memory_in_bytes" : 203888,
          "index_writer_memory_in_bytes" : 0,
          "version_map_memory_in_bytes" : 0,
          "fixed_bit_set_memory_in_bytes" : 0,
          "max_unsafe_auto_id_timestamp" : 1571800339228,
          "file_sizes" : { }
        },
        "translog" : {
          "operations" : 3904,
          "size_in_bytes" : 3197456
        },
        "request_cache" : {
          "memory_size_in_bytes" : 0,
          "evictions" : 0,
          "hit_count" : 0,
          "miss_count" : 0
        },
        "recovery" : {
          "current_as_source" : 0,
          "current_as_target" : 0,
          "throttle_time_in_millis" : 0
        }
      },
      "os" : {
        "timestamp" : 1571811340594,
        "cpu" : {
          "percent" : 1,
          "load_average" : {
            "1m" : 0.08,
            "5m" : 0.2,
            "15m" : 0.25
          }
        },
        "mem" : {
          "total_in_bytes" : 67559206912, //整个机器节点总内存大小
          "free_in_bytes" : 1820467200,
          "used_in_bytes" : 65738739712, //整个机器节点 内存使用61GB大小
          "free_percent" : 3,
          "used_percent" : 97
        },
        "swap" : {
          "total_in_bytes" : 0,
          "free_in_bytes" : 0,
          "used_in_bytes" : 0
        }
      },
      "process" : {
        "timestamp" : 1571811340595,
        "open_file_descriptors" : 635,
        "max_file_descriptors" : 1048576,
        "cpu" : {
          "percent" : 0,
          "total_in_millis" : 324400
        },
        "mem" : {
          "total_virtual_in_bytes" : 28956041216
        }
      },
      "jvm" : {
        "timestamp" : 1571811340595,
        "uptime_in_millis" : 10997176,
        "mem" : {
          "heap_used_in_bytes" : 660027368, //629Mb左右
          "heap_used_percent" : 31,
          "heap_committed_in_bytes" : 2075918336,
          "heap_max_in_bytes" : 2075918336,
          "non_heap_used_in_bytes" : 118930880,
          "non_heap_committed_in_bytes" : 125509632,
          "pools" : {
            "young" : {
              "used_in_bytes" : 24099448,
              "max_in_bytes" : 572653568,
              "peak_used_in_bytes" : 572653568,
              "peak_max_in_bytes" : 572653568
            },
            "survivor" : {
              "used_in_bytes" : 16533280,
              "max_in_bytes" : 71565312,
              "peak_used_in_bytes" : 71565312,
              "peak_max_in_bytes" : 71565312
            },
            "old" : {
              "used_in_bytes" : 619394640,
              "max_in_bytes" : 1431699456,
              "peak_used_in_bytes" : 619394640,
              "peak_max_in_bytes" : 1431699456
            }
          }
        },
        "threads" : {
          "count" : 122,
          "peak_count" : 148
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 39,
              "collection_time_in_millis" : 751
            },
            "old" : {
              "collection_count" : 1,
              "collection_time_in_millis" : 118
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 122,
            "used_in_bytes" : 338030316, //直接内存 322MB
            "total_capacity_in_bytes" : 338030315
          },
          "mapped" : {
            "count" : 615,
            "used_in_bytes" : 16152649050, //14GB
            "total_capacity_in_bytes" : 16152649050
          }
        },
        "classes" : {
          "current_loaded_count" : 11801,
          "total_loaded_count" : 11801,
          "total_unloaded_count" : 0
        }
      },
      "thread_pool" : {
        "bulk" : {
          "threads" : 16,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 16,
          "completed" : 670
        },
        "fetch_shard_started" : {
          "threads" : 1,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 32,
          "completed" : 65
        },
        "fetch_shard_store" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "flush" : {
          "threads" : 2,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 5,
          "completed" : 192
        },
        "force_merge" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "generic" : {
          "threads" : 4,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 4,
          "completed" : 1171
        },
        "get" : {
          "threads" : 16,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 16,
          "completed" : 4369
        },
        "index" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "listener" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "management" : {
          "threads" : 3,
          "queue" : 0,
          "active" : 1,
          "rejected" : 0,
          "largest" : 3,
          "completed" : 10204
        },
        "refresh" : {
          "threads" : 6,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 6,
          "completed" : 450082
        },
        "search" : {
          "threads" : 25,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 25,
          "completed" : 9052
        },
        "snapshot" : {
          "threads" : 0,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 0,
          "completed" : 0
        },
        "warmer" : {
          "threads" : 3,
          "queue" : 0,
          "active" : 0,
          "rejected" : 0,
          "largest" : 3,
          "completed" : 818
        }
      },
      "fs" : {
        "timestamp" : 1571811340596,
        "total" : {
          "total_in_bytes" : 208112619520,
          "free_in_bytes" : 26904035328,
          "available_in_bytes" : 26887258112,
          "spins" : "true"
        },
        "data" : [
          {
            "path" : "/opt/cmcc/es_data_backup/data/nodes/0",
            "mount" : "/opt/cmcc/es_data_backup (/dev/vda1)",
            "type" : "ext4",
            "total_in_bytes" : 208112619520,
            "free_in_bytes" : 26904035328,
            "available_in_bytes" : 26887258112,
            "spins" : "true"
          }
        ],
        "io_stats" : {
          "devices" : [
            {
              "device_name" : "vda1",
              "operations" : 52502,
              "read_operations" : 3551,
              "write_operations" : 48951,
              "read_kilobytes" : 34620,
              "write_kilobytes" : 1163244
            }
          ],
          "total" : {
            "operations" : 52502,
            "read_operations" : 3551,
            "write_operations" : 48951,
            "read_kilobytes" : 34620,
            "write_kilobytes" : 1163244
          }
        }
      },
      "transport" : {
        "server_open" : 0,
        "rx_count" : 0,
        "rx_size_in_bytes" : 0,
        "tx_count" : 0,
        "tx_size_in_bytes" : 0
      },
      "http" : {
        "current_open" : 62,
        "total_opened" : 204
      },
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 1245551001,
          "limit_size" : "1.1gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 1245551001,
          "limit_size" : "1.1gb",
          "estimated_size_in_bytes" : 101328,
          "estimated_size" : "98.9kb",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 2075918336,
          "limit_size" : "1.9gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 1453142835,
          "limit_size" : "1.3gb",
          "estimated_size_in_bytes" : 101328,
          "estimated_size" : "98.9kb",
          "overhead" : 1.0,
          "tripped" : 0
        }
      },
      "script" : {
        "compilations" : 4,
        "cache_evictions" : 0
      },
      "discovery" : {
        "cluster_state_queue" : {
          "total" : 0,
          "pending" : 0,
          "committed" : 0
        }
      },
      "ingest" : {
        "total" : {
          "count" : 0,
          "time_in_millis" : 0,
          "current" : 0,
          "failed" : 0
        },
        "pipelines" : { }
      }
    }
  }
}

问题解决

ES 占用内存最大的是java 进程堆外内存对象池mapped占用了14GB左右。

ES中有这样一段话:ES对于索引的访问是通过memory mapped file来访问的,经常访问的segment,只要没有合并,再次访问时可以直接从page cache里读取。 所以索引里被经常访问的热数据片段,等同于内存读取。

1.执行命令:

curl -XPUT ‘http://127.0.0.1:9200/_cluster/settings’ -d ‘{“persistent”:{“indices.breaker.fielddata.limit”:”60%”}}’

curl -XPUT ‘http://127.0.0.1:9200/_cluster/settings’ -d ‘{“persistent”:{“indices.fielddata.cache.size”:”10%”}}’

返回:

{“acknowledged”:true,”persistent”:{“indices”:{“breaker”:{“fielddata”:{“limit”:”40%”}}}},”transient”:{}}

2.执行命令,清空缓存:

curl -XPOST ‘http://127.0.0.1:9200/_cache/clear’

没有很大的效果,内存依然维持在20GB左右。

3.执行命令,解决错误( low disk watermark [85%] exceeded on ):

curl -XPUT ‘http://127.0.0.1:9200/_cluster/settings’ -d ‘{“transient”:{“cluster.routing.allocation.disk.threshold_enabled”:false}}’

配置文件中elasticsearch.yml添加以下配置,保证查询检索fielddata能够有淘汰,而不是无限制使用:

但是依然没有很大效果,内存依然维持在20GB左右。

indices.fielddata.cache.size: 20%

#或者

indices.fielddata.cache.size: 2gb

停掉log_es服务,内存占用情况:

root@cmcc-nfvo-baseimage-1903:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          64429       18899        7202         605       38327       44080
Swap:             0           0           0

启动log_es内存占用情况:

root@cmcc-nfvo-baseimage-1903:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          64429       24669        1453         605       38306       38309
Swap:             0           0           0

es进程本身占用内存并不是很多。并且通过排查添加以上参数可以优化缓存的使用。但是通过es的jvm stats发下mapped还是占用很多,这时候突然想到es进程关闭了,内存的占用stats还是下不来,可能会有问题。

发现和es配置有关,默认会打开nmap,也就是将es索引文件做内存映射。

查询到官网文章:https://www.elastic.co/guide/en/elasticsearch/reference/6.8/index-modules-store.html

设置参数,重启 问题解决,jvm.buffered_pools.mapped降低为0,从jvm监控指标来看问题解决:

index.store.type: hybridfs

es常见可采用的手段:

  1. 删除无用index,关闭无效index,减少倒排索引文件segement的索引内存占用。

  2. 配置增加jvm参数,增加内存回收日志gc.log打印,去掉 -XX:+DisableExplicitGC , jvm参数保证堆外内存的回收。

    -XX:+PrintGC

    -XX:+PrintGCDetails

    -XX:+PrintGCTimeStamps

    -Xloggc:filename

    -XX:+DisableExplicitGC

    有一篇文章专门关于es堆外内存回收的:https://discuss.elastic.co/t/xx-disableexplicitgc-used-in-default-config-why/1138 讨论。

  3. ES更加适用于集群部署,单节点部署写性能和查询性能不高,集群部署可以分担节点的分配,从而内存降下来,但总体使用内存不会有较大变化。

参考资料

https://blog.csdn.net/qqqq0199181/article/details/88634305 Es堆外内存溢出

https://discuss.elastic.co/t/xx-disableexplicitgc-used-in-default-config-why/1138 es堆外内存讨论

https://www.cnblogs.com/churao/p/8509649.html es磁盘空间报错 low disk watermark

https://blog.51cto.com/miaocbin/1860921 es low disk watermark产生的问题及影响

https://www.cnblogs.com/bigben0123/p/11188709.html es内存占用优化

https://elasticsearch.cn/article/32 es中文社区-内存那些事

https://discuss.elastic.co/t/memory-usage-of-the-machine-with-es-is-continuously-increasing/23537/10 es内存持续上升

重点文章:

https://blog.csdn.net/u010824591/article/details/78614505 es监控总结

https://elasticsearch.cn/question/2709 关于es缓存

https://elasticsearch.cn/question/284 es缓存fielddata设置

https://discuss.elastic.co/t/elasticsearch-5-6-very-quickly-increasing-direct-buffer-pools/119561 es direct 内存

es官网给出的解决办法:

https://stackoverflow.com/questions/55630620/elasticsearch-high-mapped-buffer-pools es mapped buffer pool占用过大

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/index-modules-store.html mapped buffer pool占用解决办法