Scale Your Metrics with Elasticsearch

A presentation at HighLoad++ in November 2018 in Moscow, Russia by Philipp Krenn

Slide 1

Scale Your Metrics with Elasticsearch Philipp Krenn @xeraa

Slide 2

Slide 3

$ curl http://localhost:9200
{
  "name" : "elasticsearch-hot",
  "cluster_name" : "metrics-cluster",
  "cluster_uuid" : "06nHPLLgTrmZEpYli6JW5w",
  "version" : {
    "number" : "6.5.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "c53b7d3",
    "build_date" : "2018-11-08T21:28:50.577384Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

I'm not going to use a search engine for metrics. — Too often

Slide 13

Developer

Slide 14

Questions: https://sli.do/xeraa
Answers: Live or https://twitter.com/xeraa

Slide 15

Agenda: Building Blocks, Architecture, Demo

Slide 16

Building Blocks

Slide 17

Only accept features that scale. — https://github.com/elastic/engineering/blob/master/development_constitution.md

Slide 18

Horizontal Scaling: Shards, Replication, Writes & Reads

Slide 19

Cluster, Node, Index, Shard

Slide 20

Write: Coordinating Node, Hash, Primary, Replica(s)
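
The hash step on this slide can be sketched roughly as follows. Elasticsearch documents the routing rule as hash(_routing) % number_of_primary_shards, with the document ID as the default routing value. The function name `pick_shard` is hypothetical, and CRC32 is only a dependency-free stand-in for the Murmur3 hash that Elasticsearch uses internally, so the shard numbers printed here will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a primary shard."""
    # Assumption: CRC32 stands in for Elasticsearch's internal Murmur3 hash.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # The same ID always lands on the same shard, which is why the number
    # of primary shards cannot simply be changed after index creation.
    for doc_id in ["host-1", "host-2", "host-3"]:
        print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

The coordinating node forwards the write to the primary copy of that shard, which then replicates it to the replica(s).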

Slide 21

Get & Aggregate: Coordinating Node, Hash, Shard

Slide 22

Search: Coordinating Node, Query then Fetch
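
A minimal sketch of the query-then-fetch idea, using plain dictionaries as hypothetical stand-ins for shards (this is an illustration of the two phases, not how Elasticsearch is implemented). In the query phase every shard returns only its top scores and document IDs. In the fetch phase the coordinating node loads the full documents for the global winners only.

```python
import heapq


def query_phase(shard, size):
    """Return the shard's top `size` (score, doc_id) pairs, without sources."""
    return heapq.nlargest(
        size, ((score, doc_id) for doc_id, (score, _source) in shard.items())
    )


def search(shards, size):
    """Coordinating node: merge per-shard results, then fetch the winners."""
    candidates = []
    for shard_index, shard in enumerate(shards):
        for score, doc_id in query_phase(shard, size):
            candidates.append((score, doc_id, shard_index))
    top = heapq.nlargest(size, candidates)
    # Fetch phase: only now are the full documents loaded.
    return [shards[shard_index][doc_id][1] for _score, doc_id, shard_index in top]


if __name__ == "__main__":
    # Each hypothetical shard maps doc_id -> (score, source document).
    shards = [
        {"a": (1.0, {"host": "a"}), "b": (3.0, {"host": "b"})},
        {"c": (2.0, {"host": "c"})},
    ]
    print(search(shards, size=2))
```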

Slide 23

Append-Only Optimization
IDs assigned on coordinating node
Fast add instead of the slow update

Slide 24

Lucene Segments
index.refresh_interval: 1s
7.0: index.search.idle.after

Slide 25

Storage Compression: LZ4 (default), DEFLATE (best_compression)

Slide 26

BKD Trees: Points in Lucene

Slide 27

Integer (1D 4 byte point) vs legacy IntField

Slide 28

Half & Scaled Floats

Slide 29

Slide 30

https://github.com/elastic/beats/blob/master/metricbeat/module/system/load/_meta/fields.yml
- name: load
  type: group
  description: >
    CPU load averages.
  release: ga
  fields:
    - name: "1"
      type: scaled_float
      scaling_factor: 100
      description: >
        Load average for the last minute.
    - name: "5"
      type: scaled_float
      scaling_factor: 100
      description: >
        Load average for the last 5 minutes.
    ...
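
What scaling_factor: 100 means in practice can be sketched like this. A scaled_float is backed by a long: the value is multiplied by the factor at index time and rounded to the closest integer, so a load average keeps two decimal places. The helper names are hypothetical. The half-float round trip uses Python's 16-bit `struct` format to show the precision loss of a 2-byte float.

```python
import struct


def encode_scaled_float(value: float, scaling_factor: int) -> int:
    """Store the value as an integer: multiply by the factor, then round."""
    return round(value * scaling_factor)


def decode_scaled_float(stored: int, scaling_factor: int) -> float:
    """Recover the (lossy) floating point value from the stored integer."""
    return stored / scaling_factor


def to_half_float(value: float) -> float:
    """Round-trip a value through a 16-bit IEEE 754 half-precision float."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


if __name__ == "__main__":
    stored = encode_scaled_float(1.234, 100)
    print(stored, decode_scaled_float(stored, 100))
    print(to_half_float(1.234))
```

Small integers compress much better than arbitrary floating point bit patterns, which is the point of trading away precision that a load average never needed.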

Slide 31

_all Removal
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-all-field.html

Slide 32

Doc Values Replaced Fielddata
https://www.elastic.co/guide/en/elasticsearch/guide/current/_deep_dive_on_doc_values.html

Slide 33

Architecture

Slide 34

Time Based Indices
index: "metricbeat-%{[beat.version]}-%{+yyyy.MM.dd}"
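
As a rough illustration of how that pattern expands, the sketch below builds the index name for a given day. The function name `daily_index` and the example version and date are assumptions for illustration. In a real setup Metricbeat resolves the pattern itself.

```python
from datetime import date


def daily_index(beat_version: str, day: date) -> str:
    """Expand metricbeat-%{[beat.version]}-%{+yyyy.MM.dd} for one day."""
    return f"metricbeat-{beat_version}-{day:%Y.%m.%d}"


if __name__ == "__main__":
    print(daily_index("6.5.0", date(2018, 11, 8)))
```

One index per day makes retention cheap: dropping old data means deleting a whole index instead of deleting individual documents.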

Slide 35

Rollover Indices: Condition when to switch

Slide 36

PUT /metricbeat-000001
{
  "aliases": {
    "metricbeat": {}
  }
}

# Add >1000 documents to metricbeat-000001

POST /metricbeat/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 1000,
    "max_size": "5gb"
  }
}

Slide 37

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "old_index": "metricbeat-000001",
  "new_index": "metricbeat-000002",
  "rolled_over": true,
  "dry_run": false,
  "conditions": {
    "[max_age: 1d]": false,
    "[max_docs: 1000]": true,
    "[max_size: 5gb]": false
  }
}
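
The response above shows that a single satisfied condition is enough to roll over. A minimal sketch of that decision follows. The function name, its parameters, and the "at or above the threshold" comparison are assumptions made for illustration, not the Elasticsearch implementation.

```python
def evaluate_rollover(age_days, doc_count, size_gb,
                      max_age_days=1, max_docs=1000, max_size_gb=5):
    """Return (rolled_over, per-condition results) for one index."""
    # Assumption: a condition counts as met at or above its threshold.
    results = {
        f"[max_age: {max_age_days}d]": age_days >= max_age_days,
        f"[max_docs: {max_docs}]": doc_count >= max_docs,
        f"[max_size: {max_size_gb}gb]": size_gb >= max_size_gb,
    }
    # Rollover happens as soon as any one condition is satisfied.
    return any(results.values()), results


if __name__ == "__main__":
    # A young, small index that already holds more than 1000 documents.
    rolled_over, conditions = evaluate_rollover(
        age_days=0.1, doc_count=1001, size_gb=0.2
    )
    print(rolled_over, conditions)
```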

Slide 38

Rollups

Slide 39

Slide 40

PUT _xpack/rollup/job/metricbeat
{
  "id": "metricbeat",
  "index_pattern": "metricbeat-*",
  "rollup_index": "metricbeat_rollup",
  "cron": "0 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "interval": "5m",
      "delay": "5m",
      "time_zone": "UTC",
      "field": "@timestamp"
    },

Slide 41

    "terms": {
      "fields": [ "docker.container.id" ]
    }
  },
  "metrics": [
    {
      "field": "docker.network.in.bytes",
      "metrics": [ "sum" ]
    },
    {
      "field": "docker.network.out.bytes",
      "metrics": [ "sum" ]
    }
  ]
}
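
Conceptually, the rollup job above groups documents into 5-minute buckets per container and keeps only the sums. The sketch below imitates that on an in-memory list. The flat field names and epoch-second timestamps are simplifying assumptions, and it does not model the cron schedule, delay, or paging.

```python
from collections import defaultdict

INTERVAL_SECONDS = 5 * 60  # mirrors "interval": "5m"


def roll_up(docs):
    """Sum network bytes per (5-minute bucket, container ID)."""
    buckets = defaultdict(lambda: {"in_sum": 0, "out_sum": 0})
    for doc in docs:
        # Round the timestamp down to the start of its 5-minute bucket.
        bucket_start = doc["timestamp"] - doc["timestamp"] % INTERVAL_SECONDS
        key = (bucket_start, doc["container_id"])
        buckets[key]["in_sum"] += doc["in_bytes"]
        buckets[key]["out_sum"] += doc["out_bytes"]
    return dict(buckets)


if __name__ == "__main__":
    docs = [
        {"timestamp": 0, "container_id": "a", "in_bytes": 10, "out_bytes": 1},
        {"timestamp": 100, "container_id": "a", "in_bytes": 20, "out_bytes": 2},
        {"timestamp": 300, "container_id": "a", "in_bytes": 5, "out_bytes": 3},
    ]
    for key, sums in roll_up(docs).items():
        print(key, sums)
```

Three raw documents become two rollup documents here. Over months of metrics that reduction is what keeps old data affordable to store and query.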

Slide 42

Nodes

Slide 43

$ bin/elasticsearch -Enode.attr.rack=rack1 -Enode.attr.size=hot

PUT /metricbeat/_settings
{
  "index.routing.allocation.include.size": "hot"
}
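
The effect of that include filter can be sketched as a simple match between node attributes and the index setting. The node names and the `eligible_nodes` helper are hypothetical, and only a single attribute is modeled. Elasticsearch assigns the index to nodes whose attribute has at least one of the comma-separated values.

```python
def eligible_nodes(nodes, attribute, include_values):
    """Return the nodes whose attribute matches one of the included values."""
    wanted = {value.strip() for value in include_values.split(",")}
    return [name for name, attrs in nodes.items() if attrs.get(attribute) in wanted]


if __name__ == "__main__":
    # Hypothetical cluster: attributes as set with -Enode.attr.<name>=<value>.
    nodes = {
        "node-1": {"rack": "rack1", "size": "hot"},
        "node-2": {"rack": "rack2", "size": "warm"},
    }
    print(eligible_nodes(nodes, "size", "hot"))
```

Changing the setting to another value later makes Elasticsearch move the shards of that index to the matching nodes.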

Slide 44

Cross Cluster Search
Tribe Node

Slide 45

Cross Cluster Replication

Slide 46

Demo

Slide 47

Index Lifecycle Management
Currently https://github.com/elastic/curator

Slide 48

Slide 49

Slide 50

Slide 51

Frozen Indices
Close + lazy open and release search resources

Slide 52

Conclusion

Slide 53

Agenda: Building Blocks, Architecture, Demo

Slide 54

Benchmarks: Fair, Reproducible, Close to Production

Slide 55

Slide 56

From to !

Slide 57

Questions? Philipp Krenn @xeraa