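ES diagnostics
This article uses the memory statistics that ES exposes through its node stats API (_nodes/stats/jvm) to find the largest memory consumer, investigates that part step by step, and ends with the optimization and a summary.
Checking memory on the ES machine:
We first checked the heap size configured for ES on the machine: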
http://127.0.0.1:9200/_cat/nodes?h=heap.max
The result is 1.9gb.
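http://127.0.0.1:9200/_cat/shards?pretty lists the shards
http://127.0.0.1:9200/_cluster/health?pretty reports a yellow health status, most likely because the cluster has a single node, so replica shards cannot be assigned
Memory used by each index: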
http://127.0.0.1:9200/_cat/indices?v&h=i,tm,sm&s=tm:desc
Memory usage (i = index name, tm = memory.total, sm = segments.memory):
i tm sm
nfvofcaps 11.9mb 11.9mb
nfvocatalog 1.9mb 1.9mb
nfvosysmgnt 1.9mb 1.9mb
nfvonslcm 1.1mb 1.1mb
nfvowindriver 1mb 1mb
nfvoemanagerbe 963.1kb 918.6kb
nfvoemspm 580.3kb 453.6kb
nfvohosts 417.4kb 80.6kb
nfvovms 293.7kb 267.8kb
nfvomultivimbroker 150.2kb 150.2kb
nfvoemanagerbesafe 146.1kb 146.1kb
nfvopolicy 88.7kb 88.7kb
nfvodsmspoolsvolumes 62.2kb 62.2kb
netelement 59.7kb 59.7kb
nfvosysmgntkpi 59.3kb 59.3kb
nfvodsmspools 52.1kb 52.1kb
nfvoserversethports 51.4kb 51.4kb
nfvomysql 49kb 49kb
nfvoservers 47.9kb 47.1kb
nfvodsms 44.8kb 44kb
mysqlsysmanager 11.7kb 11.7kb
.kibana 8.5kb 8.5kb
nfvorouters 0b 0b
nfvoddos 0b 0b
jbossmq-httpil 0b 0b
nfvodiskarraysdisk 0b 0b
nfvoswitches 0b 0b
nfvowafs 0b 0b
nfvodiskarraysluns 0b 0b
nfvois 0b 0b
nfvodiskarraysstoragepools 0b 0b
nfvodiskarrays 0b 0b
invoker 0b 0b
nfvofirewalls 0b 0b
nfvodiskarrayscontrollers 0b 0b
nfvoroutersports 0b 0b
nfvodiskarraysethernetports 0b 0b
nfvolbsports 0b 0b
nfvoswitchesports 0b 0b
nfvolbs 0b 0b
nfvouser 0b 0b
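To see how little memory the indices themselves account for, the tm column can be summed. The following is a minimal Python sketch, assuming the `_cat` sizes use 1024-based units; the helper names `parse_size` and `total_memory` are made up for this illustration, and the sample contains only a few rows of the table above.

```python
import re

# Binary multipliers; assumes _cat sizes are 1024-based.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def parse_size(text):
    """Convert a _cat size string such as '11.9mb' into bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit])

def total_memory(cat_output):
    """Sum the tm column of `_cat/indices?h=i,tm,sm` output, skipping the header."""
    total = 0
    for line in cat_output.strip().splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "i":
            continue
        total += parse_size(fields[1])
    return total

# A few rows copied from the table above.
sample = """\
i tm sm
nfvofcaps 11.9mb 11.9mb
nfvocatalog 1.9mb 1.9mb
nfvosysmgnt 1.9mb 1.9mb
nfvorouters 0b 0b
"""
print(round(total_memory(sample) / 1024**2, 1), "MB")
```

Even the largest index holds only about 12 MB in memory, so the per-index memory reported here cannot explain gigabytes of usage.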
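Log message from the ES node:
[2019-10-23T01:43:09,074][INFO ][o.e.c.r.a.DiskThresholdMonitor] [node-1] low disk watermark [85%] exceeded on [b2udS94bS8mY971eqSholA][node-1][/opt/cmcc/es_data_backup/data/nodes/0] free: 25.1gb[12.9%], replicas will not be assigned to this node
This means disk usage has exceeded ES's low watermark threshold (85%): only 12.9% of the disk is free, so ES no longer assigns replica shards to this node. Clean up unneeded files on the ES disk to free space.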
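The check behind this log line can be sketched in a few lines of Python. The 85% low watermark comes from the log itself; the 90% high and 95% flood-stage values are assumed to be the ES defaults and may be configured differently on a given cluster. The function name `disk_watermark_state` is hypothetical.

```python
def disk_watermark_state(free_percent, low=85.0, high=90.0, flood_stage=95.0):
    """Classify disk usage against watermark thresholds given as used-percent.

    The thresholds are assumptions (ES defaults), not read from the cluster.
    """
    used = 100.0 - free_percent
    if used >= flood_stage:
        return "flood_stage"
    if used >= high:
        return "high"
    if used >= low:
        return "low"
    return "ok"

# 12.9% free, as reported in the log line above.
print(disk_watermark_state(12.9))
```

With 12.9% free the disk is 87.1% used, which is above the low watermark but still below the assumed high watermark.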
ES JVM parameters:
-Xms2g -Xmx2g
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+DisableExplicitGC disables explicit System.gc() calls, which can delay the reclamation of off-heap (direct) memory
Physical memory used by the ES process (contents of /proc/1629/status):
Name: java
State: S (sleeping)
Tgid: 1629
Ngid: 0
Pid: 1629
PPid: 1531
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 1024
Groups: 1000
NStgid: 1629 35
NSpid: 1629 35
NSpgid: 1628 34
NSsid: 1628 34
VmPeak: 30758876 kB peak virtual memory used by the process
VmSize: 30753964 kB virtual memory currently used by the process
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 5195512 kB peak physical memory used by the process, about 5 GB
VmRSS: 5193688 kB physical memory currently used by the process
VmData: 14763200 kB size of the data segment of the process
VmStk: 132 kB
VmExe: 4 kB
VmLib: 18308 kB
VmPTE: 13324 kB
VmPMD: 132 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
Threads: 157
SigQ: 0/257554
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000003
SigCgt: 2000000181005ccc
CapInh: 00000000a80425fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000
Seccomp: 2
Speculation_Store_Bypass: vulnerable
Cpus_allowed: ffff
Cpus_allowed_list: 0-15
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 17
nonvoluntary_ctxt_switches: 4
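The ES java process clearly uses a lot of virtual memory: about 29 GB of virtual address space (VmSize) against about 5 GB of resident physical memory (VmRSS).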
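The comparison between virtual and resident memory can be reproduced with a small parser. This is a sketch only: `parse_proc_status` and `kb_to_gib` are names invented here, and the sample text contains just a few fields copied from the output above.

```python
def parse_proc_status(text):
    """Return the kB-valued fields of /proc/<pid>/status as a dict of ints."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Keep only "<number> kB" fields; trailing annotations are ignored.
        if len(parts) >= 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def kb_to_gib(kb):
    """Convert kibibytes to gibibytes."""
    return kb / 1024**2

sample = """\
VmPeak: 30758876 kB
VmSize: 30753964 kB
VmHWM: 5195512 kB
VmRSS: 5193688 kB
Threads: 157
"""
status = parse_proc_status(sample)
for key in ("VmSize", "VmRSS"):
    print(key, round(kb_to_gib(status[key]), 2), "GiB")
```

On a live machine the text would be read from /proc/<pid>/status for the ES process instead of the inline sample.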
JVM memory usage of ES
JVM memory monitoring
http://127.0.0.1:9200/_nodes/stats/jvm?pretty JVM statistics
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "es-cluster",
"nodes" : {
"b2udS94bS8mY971eqSholA" : {
"timestamp" : 1571801549425,
"name" : "node-1",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1:9300",
"roles" : [
"master",
"data",
"ingest"
],
"jvm" : {
"timestamp" : 1571801549425,
"uptime_in_millis" : 1206006,
"mem" : {
"heap_used_in_bytes" : 530281400, // heap memory in use, about 500 MB, not much
"heap_used_percent" : 25, // only 25% of the heap is in use
"heap_committed_in_bytes" : 2075918336,
"heap_max_in_bytes" : 2075918336,
"non_heap_used_in_bytes" : 101650240,
"non_heap_committed_in_bytes" : 107356160,
"pools" : {
"young" : { // young generation
"used_in_bytes" : 26739464,
"max_in_bytes" : 572653568,
"peak_used_in_bytes" : 572653568,
"peak_max_in_bytes" : 572653568
},
"survivor" : { // survivor space of the young generation
"used_in_bytes" : 7893160,
"max_in_bytes" : 71565312,
"peak_used_in_bytes" : 71565312,
"peak_max_in_bytes" : 71565312
},
"old" : { // old generation
"used_in_bytes" : 495648776,
"max_in_bytes" : 1431699456,
"peak_used_in_bytes" : 495648776,
"peak_max_in_bytes" : 1431699456
}
}
},
"threads" : {
"count" : 120,
"peak_count" : 148
},
"gc" : {
"collectors" : {
"young" : {
"collection_count" : 9,
"collection_time_in_millis" : 269
},
"old" : {
"collection_count" : 1,
"collection_time_in_millis" : 118
}
}
},
"buffer_pools" : {
"direct" : {
"count" : 119,
"used_in_bytes" : 337886916, // 322 MB
"total_capacity_in_bytes" : 337886915
},
"mapped" : {
"count" : 628,
"used_in_bytes" : 16095062834, // about 15 GB; most of the memory attributed to ES is here, so the problem has been located
"total_capacity_in_bytes" : 16095062834
}
},
"classes" : {
"current_loaded_count" : 11014,
"total_loaded_count" : 11014,
"total_unloaded_count" : 0
}
}
}
}
}
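Ranking the memory regions from this response makes the imbalance obvious. Below is a minimal Python sketch; `memory_breakdown` is a hypothetical helper, and the dictionary holds only the values copied from the output above. In practice the dictionary would come from parsing the real API response, which does not contain the `//` annotations added here.

```python
def memory_breakdown(jvm):
    """Return (name, bytes) pairs for the main JVM memory regions, largest first."""
    regions = {
        "heap_used": jvm["mem"]["heap_used_in_bytes"],
        "non_heap_used": jvm["mem"]["non_heap_used_in_bytes"],
        "direct_buffers": jvm["buffer_pools"]["direct"]["used_in_bytes"],
        "mapped_buffers": jvm["buffer_pools"]["mapped"]["used_in_bytes"],
    }
    return sorted(regions.items(), key=lambda item: item[1], reverse=True)

# Values copied from the _nodes/stats/jvm output above.
jvm = {
    "mem": {"heap_used_in_bytes": 530281400, "non_heap_used_in_bytes": 101650240},
    "buffer_pools": {
        "direct": {"used_in_bytes": 337886916},
        "mapped": {"used_in_bytes": 16095062834},
    },
}
for name, size in memory_breakdown(jvm):
    print(f"{name:15s} {size / 1024**3:6.2f} GiB")
```

The mapped buffer pool is roughly thirty times larger than the heap in use, which is why the rest of the investigation focuses on it.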
ES node monitoring statistics
http://127.0.0.1:9200/_nodes/stats?pretty node statistics
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "es-cluster",
"nodes" : {
"b2udS94bS8mY971eqSholA" : {
"timestamp" : 1571811340573,
"name" : "node-1",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1:9300",
"roles" : [
"master",
"data",
"ingest"
],
"indices" : {
"docs" : {
"count" : 21176041, // number of indexed documents
"deleted" : 0
},
"store" : {
"size_in_bytes" : 16179548377, // store size on disk, about 15 GB
"throttle_time_in_millis" : 0
},
"indexing" : {
"index_total" : 77751, // total number of indexing operations
"index_time_in_millis" : 25491,
"index_current" : 0,
"index_failed" : 0,
"delete_total" : 0,
"delete_time_in_millis" : 0,
"delete_current" : 0,
"noop_update_total" : 0,
"is_throttled" : false,
"throttle_time_in_millis" : 0
},
"get" : {
"total" : 4369,
"time_in_millis" : 318,
"exists_total" : 4369,
"exists_time_in_millis" : 318,
"missing_total" : 0,
"missing_time_in_millis" : 0,
"current" : 0
},
"search" : {
"open_contexts" : 0,
"query_total" : 4532,
"query_time_in_millis" : 11888,
"query_current" : 0,
"fetch_total" : 4516,
"fetch_time_in_millis" : 321,
"fetch_current" : 0,
"scroll_total" : 0,
"scroll_time_in_millis" : 0,
"scroll_current" : 0,
"suggest_total" : 0,
"suggest_time_in_millis" : 0,
"suggest_current" : 0
},
"merges" : {
"current" : 0,
"current_docs" : 0,
"current_size_in_bytes" : 0,
"total" : 69,
"total_time_in_millis" : 9719,
"total_docs" : 334463,
"total_size_in_bytes" : 217335811,
"total_stopped_time_in_millis" : 0,
"total_throttled_time_in_millis" : 0,
"total_auto_throttle_in_bytes" : 1363148800
},
"refresh" : {
"total" : 734,
"total_time_in_millis" : 7487,
"listeners" : 0
},
"flush" : {
"total" : 96,
"total_time_in_millis" : 6427
},
"warmer" : {
"current" : 0,
"total" : 838,
"total_time_in_millis" : 288
},
"query_cache" : {
"memory_size_in_bytes" : 0,
"total_count" : 0,
"hit_count" : 0,
"miss_count" : 0,
"cache_size" : 0,
"cache_count" : 0,
"evictions" : 0
},
"fielddata" : {
"memory_size_in_bytes" : 101328, // fielddata memory, 98.9 KB
"evictions" : 0
},
"completion" : {
"size_in_bytes" : 0
},
"segments" : {
"count" : 277,
"memory_in_bytes" : 21673476, // segment memory, about 20 MB
"terms_memory_in_bytes" : 16329238, // about 15 MB
"stored_fields_memory_in_bytes" : 4734696,
"term_vectors_memory_in_bytes" : 0,
"norms_memory_in_bytes" : 245888,
"points_memory_in_bytes" : 159766,
"doc_values_memory_in_bytes" : 203888,
"index_writer_memory_in_bytes" : 0,
"version_map_memory_in_bytes" : 0,
"fixed_bit_set_memory_in_bytes" : 0,
"max_unsafe_auto_id_timestamp" : 1571800339228,
"file_sizes" : { }
},
"translog" : {
"operations" : 3904,
"size_in_bytes" : 3197456
},
"request_cache" : {
"memory_size_in_bytes" : 0,
"evictions" : 0,
"hit_count" : 0,
"miss_count" : 0
},
"recovery" : {
"current_as_source" : 0,
"current_as_target" : 0,
"throttle_time_in_millis" : 0
}
},
"os" : {
"timestamp" : 1571811340594,
"cpu" : {
"percent" : 1,
"load_average" : {
"1m" : 0.08,
"5m" : 0.2,
"15m" : 0.25
}
},
"mem" : {
"total_in_bytes" : 67559206912, // total memory of the whole machine
"free_in_bytes" : 1820467200,
"used_in_bytes" : 65738739712, // memory in use on the whole machine, about 61 GB
"free_percent" : 3,
"used_percent" : 97
},
"swap" : {
"total_in_bytes" : 0,
"free_in_bytes" : 0,
"used_in_bytes" : 0
}
},
"process" : {
"timestamp" : 1571811340595,
"open_file_descriptors" : 635,
"max_file_descriptors" : 1048576,
"cpu" : {
"percent" : 0,
"total_in_millis" : 324400
},
"mem" : {
"total_virtual_in_bytes" : 28956041216
}
},
"jvm" : {
"timestamp" : 1571811340595,
"uptime_in_millis" : 10997176,
"mem" : {
"heap_used_in_bytes" : 660027368, // about 629 MB
"heap_used_percent" : 31,
"heap_committed_in_bytes" : 2075918336,
"heap_max_in_bytes" : 2075918336,
"non_heap_used_in_bytes" : 118930880,
"non_heap_committed_in_bytes" : 125509632,
"pools" : {
"young" : {
"used_in_bytes" : 24099448,
"max_in_bytes" : 572653568,
"peak_used_in_bytes" : 572653568,
"peak_max_in_bytes" : 572653568
},
"survivor" : {
"used_in_bytes" : 16533280,
"max_in_bytes" : 71565312,
"peak_used_in_bytes" : 71565312,
"peak_max_in_bytes" : 71565312
},
"old" : {
"used_in_bytes" : 619394640,
"max_in_bytes" : 1431699456,
"peak_used_in_bytes" : 619394640,
"peak_max_in_bytes" : 1431699456
}
}
},
"threads" : {
"count" : 122,
"peak_count" : 148
},
"gc" : {
"collectors" : {
"young" : {
"collection_count" : 39,
"collection_time_in_millis" : 751
},
"old" : {
"collection_count" : 1,
"collection_time_in_millis" : 118
}
}
},
"buffer_pools" : {
"direct" : {
"count" : 122,
"used_in_bytes" : 338030316, // direct memory, 322 MB
"total_capacity_in_bytes" : 338030315
},
"mapped" : {
"count" : 615,
"used_in_bytes" : 16152649050, // about 15 GB
"total_capacity_in_bytes" : 16152649050
}
},
"classes" : {
"current_loaded_count" : 11801,
"total_loaded_count" : 11801,
"total_unloaded_count" : 0
}
},
"thread_pool" : {
"bulk" : {
"threads" : 16,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 16,
"completed" : 670
},
"fetch_shard_started" : {
"threads" : 1,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 32,
"completed" : 65
},
"fetch_shard_store" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"flush" : {
"threads" : 2,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 5,
"completed" : 192
},
"force_merge" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"generic" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 4,
"completed" : 1171
},
"get" : {
"threads" : 16,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 16,
"completed" : 4369
},
"index" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"listener" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"management" : {
"threads" : 3,
"queue" : 0,
"active" : 1,
"rejected" : 0,
"largest" : 3,
"completed" : 10204
},
"refresh" : {
"threads" : 6,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 6,
"completed" : 450082
},
"search" : {
"threads" : 25,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 25,
"completed" : 9052
},
"snapshot" : {
"threads" : 0,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 0,
"completed" : 0
},
"warmer" : {
"threads" : 3,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 3,
"completed" : 818
}
},
"fs" : {
"timestamp" : 1571811340596,
"total" : {
"total_in_bytes" : 208112619520,
"free_in_bytes" : 26904035328,
"available_in_bytes" : 26887258112,
"spins" : "true"
},
"data" : [
{
"path" : "/opt/cmcc/es_data_backup/data/nodes/0",
"mount" : "/opt/cmcc/es_data_backup (/dev/vda1)",
"type" : "ext4",
"total_in_bytes" : 208112619520,
"free_in_bytes" : 26904035328,
"available_in_bytes" : 26887258112,
"spins" : "true"
}
],
"io_stats" : {
"devices" : [
{
"device_name" : "vda1",
"operations" : 52502,
"read_operations" : 3551,
"write_operations" : 48951,
"read_kilobytes" : 34620,
"write_kilobytes" : 1163244
}
],
"total" : {
"operations" : 52502,
"read_operations" : 3551,
"write_operations" : 48951,
"read_kilobytes" : 34620,
"write_kilobytes" : 1163244
}
}
},
"transport" : {
"server_open" : 0,
"rx_count" : 0,
"rx_size_in_bytes" : 0,
"tx_count" : 0,
"tx_size_in_bytes" : 0
},
"http" : {
"current_open" : 62,
"total_opened" : 204
},
"breakers" : {
"request" : {
"limit_size_in_bytes" : 1245551001,
"limit_size" : "1.1gb",
"estimated_size_in_bytes" : 0,
"estimated_size" : "0b",
"overhead" : 1.0,
"tripped" : 0
},
"fielddata" : {
"limit_size_in_bytes" : 1245551001,
"limit_size" : "1.1gb",
"estimated_size_in_bytes" : 101328,
"estimated_size" : "98.9kb",
"overhead" : 1.03,
"tripped" : 0
},
"in_flight_requests" : {
"limit_size_in_bytes" : 2075918336,
"limit_size" : "1.9gb",
"estimated_size_in_bytes" : 0,
"estimated_size" : "0b",
"overhead" : 1.0,
"tripped" : 0
},
"parent" : {
"limit_size_in_bytes" : 1453142835,
"limit_size" : "1.3gb",
"estimated_size_in_bytes" : 101328,
"estimated_size" : "98.9kb",
"overhead" : 1.0,
"tripped" : 0
}
},
"script" : {
"compilations" : 4,
"cache_evictions" : 0
},
"discovery" : {
"cluster_state_queue" : {
"total" : 0,
"pending" : 0,
"committed" : 0
}
},
"ingest" : {
"total" : {
"count" : 0,
"time_in_millis" : 0,
"current" : 0,
"failed" : 0
},
"pipelines" : { }
}
}
}
}
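Solving the problem
The largest memory consumer attributed to ES is the mapped buffer pool of the java process, which lives outside the heap and accounts for about 15 GB.
There is this passage about ES: ES accesses its index files through memory-mapped files, so a frequently accessed segment, as long as it has not been merged away, can be read straight from the page cache on the next access. The frequently accessed hot parts of an index are therefore effectively read from memory.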
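The link between the mapped pool and the index files can be checked with the two numbers from the node stats output: the mapped buffer pool size and the on-disk store size. A minimal sketch follows; the function name `mapped_ratio` is made up for this illustration.

```python
def mapped_ratio(mapped_bytes, store_bytes):
    """Fraction of the on-disk index data that is currently memory-mapped."""
    if store_bytes <= 0:
        raise ValueError("store size must be positive")
    return mapped_bytes / store_bytes

# buffer_pools.mapped.used_in_bytes and indices.store.size_in_bytes
# from the _nodes/stats output above.
ratio = mapped_ratio(16152649050, 16179548377)
print(f"{ratio:.1%}")
```

The two values are nearly identical, which is consistent with ES mapping almost all of its index files. Note that the mapped size is address space: only the pages that are actually touched occupy physical memory, and they do so in the OS page cache rather than in the JVM heap.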
1. Run the commands:
curl -XPUT 'http://127.0.0.1:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"persistent":{"indices.breaker.fielddata.limit":"60%"}}'
curl -XPUT 'http://127.0.0.1:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"persistent":{"indices.fielddata.cache.size":"10%"}}'
Response:
{"acknowledged":true,"persistent":{"indices":{"breaker":{"fielddata":{"limit":"40%"}}}},"transient":{}}
2. Run the command to clear the caches:
curl -XPOST 'http://127.0.0.1:9200/_cache/clear'
This had little effect; memory stayed at about 20 GB.
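3. Run the command to address the message (low disk watermark [85%] exceeded on):
curl -XPUT 'http://127.0.0.1:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"transient":{"cluster.routing.allocation.disk.threshold_enabled":false}}'
Add the following to the configuration file elasticsearch.yml so that the fielddata used by queries is evicted instead of growing without limit:
indices.fielddata.cache.size: 20%
# or
indices.fielddata.cache.size: 2gb
This still had little effect; memory stayed at about 20 GB.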
Memory usage with the log_es service stopped:
root@cmcc-nfvo-baseimage-1903:~# free -m
total used free shared buff/cache available
Mem: 64429 18899 7202 605 38327 44080
Swap: 0 0 0
Memory usage after starting log_es:
root@cmcc-nfvo-baseimage-1903:~# free -m
total used free shared buff/cache available
Mem: 64429 24669 1453 605 38306 38309
Swap: 0 0 0
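The difference between the two `free -m` snapshots can be computed directly. This is a small Python sketch with a hypothetical helper `parse_free_mem`; the inputs are the two outputs shown above with the shell prompt removed.

```python
def parse_free_mem(output):
    """Return the Mem: row of `free -m` output as a dict keyed by column name."""
    lines = [line.strip() for line in output.strip().splitlines() if line.strip()]
    columns = lines[0].split()
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(value) for value in line.split()[1:]]
            return dict(zip(columns, values))
    raise ValueError("no Mem: row found")

stopped = parse_free_mem("""
total used free shared buff/cache available
Mem: 64429 18899 7202 605 38327 44080
""")
started = parse_free_mem("""
total used free shared buff/cache available
Mem: 64429 24669 1453 605 38306 38309
""")
print(started["used"] - stopped["used"], "MB more used with log_es running")
print(started["buff/cache"] - stopped["buff/cache"], "MB change in buff/cache")
```

Starting log_es adds about 5.6 GB of used memory, close to the VmRSS of the java process and far below the size of the mapped buffer pool, while buff/cache barely changes.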
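The ES process itself does not use much memory, and the parameters added above help to bound cache usage. However, the ES JVM stats showed that mapped was still very large. It then occurred to me that memory usage did not come down even when the ES process was shut down, which suggested the problem lay elsewhere.
It turned out to be related to the ES configuration: mmap is enabled by default, that is, ES memory-maps its index files.
I found this page on the official site: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/index-modules-store.html
After setting the parameter below and restarting, jvm.buffer_pools.mapped dropped to 0, so judging by the JVM monitoring metrics the problem is solved: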
index.store.type: hybridfs
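Besides elasticsearch.yml, the reference page linked above also describes setting the store type per index when the index is created. The sketch below only builds the JSON body for such a request and sends nothing; the helper name `index_creation_body` is hypothetical, and the list of store types is an assumption taken from the 6.8 reference page, so it may differ in other versions.

```python
import json

# Store types assumed from the 6.8 reference page; other versions may differ.
STORE_TYPES = {"fs", "simplefs", "niofs", "mmapfs", "hybridfs"}

def index_creation_body(store_type):
    """Build the JSON body for creating an index with a given store type."""
    if store_type not in STORE_TYPES:
        raise ValueError(f"unknown store type: {store_type!r}")
    return json.dumps({"settings": {"index.store.type": store_type}})

# Body that could be sent with PUT /<index-name> when creating an index.
print(index_creation_body("hybridfs"))
```

Because this is a static index setting, it takes effect for existing indices only after they are closed and reopened or the node is restarted with the new default.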
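Common measures that can be applied to ES:
- Delete unused indices and close inactive ones to reduce the memory that the inverted-index segment files take up.
- Add JVM parameters that write a garbage collection log (gc.log), and remove -XX:+DisableExplicitGC so that off-heap memory can be reclaimed.
Parameters to add:
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xloggc:filename
Parameter to remove:
-XX:+DisableExplicitGC
There is a discussion devoted to off-heap memory reclamation in ES: https://discuss.elastic.co/t/xx-disableexplicitgc-used-in-default-config-why/1138
- ES is better suited to cluster deployment. A single node has limited write and query performance; a cluster spreads the shards across nodes, which lowers memory usage per node, although total memory usage does not change much.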
References
https://blog.csdn.net/qqqq0199181/article/details/88634305 ES off-heap memory overflow
https://discuss.elastic.co/t/xx-disableexplicitgc-used-in-default-config-why/1138 discussion of ES off-heap memory
https://www.cnblogs.com/churao/p/8509649.html ES disk space message: low disk watermark
https://blog.51cto.com/miaocbin/1860921 problems caused by the ES low disk watermark and their impact
https://www.cnblogs.com/bigben0123/p/11188709.html optimizing ES memory usage
https://elasticsearch.cn/article/32 ES Chinese community: things to know about memory
https://discuss.elastic.co/t/memory-usage-of-the-machine-with-es-is-continuously-increasing/23537/10 ES memory usage keeps increasing
Key articles:
https://blog.csdn.net/u010824591/article/details/78614505 summary of ES monitoring
https://elasticsearch.cn/question/2709 about ES caches
https://elasticsearch.cn/question/284 ES fielddata cache settings
https://discuss.elastic.co/t/elasticsearch-5-6-very-quickly-increasing-direct-buffer-pools/119561 ES direct memory
Solution given on the official ES site:
https://stackoverflow.com/questions/55630620/elasticsearch-high-mapped-buffer-pools ES mapped buffer pool too large
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/index-modules-store.html how to reduce mapped buffer pool usage