Make Your Data FABulous

A presentation at Highload fwdays in September 2018 in Kyiv, Ukraine, 02000 by Philipp Krenn

Slide 1

Slide 1

Slide 2

Slide 2

Developer

Slide 3

Slide 3

ViennaDB Papers We Love Vienna

Slide 4

Slide 4

What is the perfect datastore solution?

Slide 5

Slide 5

It depends...

Slide 6

Slide 6

Pick your tradeoffs

Slide 7

Slide 7

Slide 8

Slide 8

CAP Theorem

Slide 9

Slide 9

Slide 10

Slide 10

Consistent "[...] a total order on all operations such that each operation looks as if it were completed at a single instant."

Slide 11

Slide 11

Available "[...] every request received by a nonfailing node in the system must result in a response."

Slide 12

Slide 12

Partition Tolerant "[...] the network will be allowed to lose arbitrarily many messages sent from one node to another."

Slide 13

Slide 13

https://berb.github.io/diploma-thesis/original/061_challenge.html

Slide 14

Slide 14

Misconceptions Partition Tolerance is not a choice in a distributed system

Slide 15

Slide 15

Misconceptions Consistency in ACID is a predicate Consistency in CAP is a linear order

Slide 16

Slide 16

Robinson Crusoe

Slide 17

Slide 17

Slide 18

Slide 18

/dev/null breaks CAP: effect of write are always consistent, it's always available, and all replicas are consistent even during partitions. — https://twitter.com/ashic/status/591511683987701760

Slide 19

Slide 19

FAB Theory

Slide 20

Slide 20

Mark Harwood

Slide 21

Slide 21

Fast Near real-time instead of batch processing

Slide 22

Slide 22

Accurate Exact instead of approximate results

Slide 23

Slide 23

Big Parallelization needed to handle the data

Slide 24

Slide 24

Say Big Data one more time

Slide 25

Slide 25

Fast Big Accurate

Slide 26

Slide 26

Slide 27

Slide 27

Shard Unit of scale

Slide 28

Slide 28

Slide 29

Slide 29

"The evil wizard Mondain had attempted to gain control over Sosaria by trapping its essence in a crystal. When the Stranger at the end of Ultima I defeated Mondain and shattered the crystal, the crystal shards each held a refracted copy of Sosaria. http://www.raphkoster.com/2009/01/08/database-shardingcame-from-uo/

Slide 30

Slide 30

Terms Aggregation

Slide 31

Slide 31

Word Count Word Count Luke 64 Droid 13 R2 31 3PO 13 Alderaan 20 Princess 12 Kenobi 19 Ben 11 Obi-Wan 18 Vader 11 Droids 16 Han 10 Blast 15 Jedi 10 Imperial 15 Sandpeople 10

Slide 32

Slide 32

PUT starwars { "settings": { "number_of_shards": 5, "number_of_replicas": 0 } }

Slide 33

Slide 33

{ "index" : { "_index" { "word" : "Luke" } { "index" : { "_index" { "word" : "Luke" } { "index" : { "_index" { "word" : "Luke" } { "index" : { "_index" { "word" : "Luke" } ... : "starwars", "_type" : "_doc", "routing": "0" } } : "starwars", "_type" : "_doc", "routing": "1" } } : "starwars", "_type" : "_doc", "routing": "2" } } : "starwars", "_type" : "_doc", "routing": "3" } }

Slide 34

Slide 34

Slide 35

Slide 35

GET starwars/_search { "query": { "match": { "word": "Luke" } } }

Slide 36

Slide 36

{ "took": 6, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 64, "max_score": 3.2049506, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "0vVdy2IBkmPuaFRg659y", "_score": 3.2049506, "_routing": "1", "_source": { "word": "Luke" } }, ...

Slide 37

Slide 37

GET starwars/_search { "aggs": { "most_common": { "terms": { "field": "word.keyword", "size": 1 } } }, "size": 0 }

Slide 38

Slide 38

{ } "took": 13, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 288, "max_score": 0, "hits": [] }, "aggregations": { "most_common": { "doc_count_error_upper_bound": 10, "sum_other_doc_count": 232, "buckets": [ { "key": "Luke", "doc_count": 56 } ] } }

Slide 39

Slide 39

Slide 40

Slide 40

{ "index" : { "_index" { "word" : "Luke" } { "index" : { "_index" { "word" : "Luke" } { "index" : { "_index" { "word" : "Luke" } ... { "index" : { "_index" { "word" : "Luke" } { "index" : { "_index" { "word" : "Luke" } { "index" : { "_index" { "word" : "Luke" } { "index" : { "_index" { "word" : "Luke" } ... : "starwars", "_type" : "_doc", "routing": "0" } } : "starwars", "_type" : "_doc", "routing": "1" } } : "starwars", "_type" : "_doc", "routing": "2" } } : "starwars", "_type" : "_doc", "routing": "8" } } : "starwars", "_type" : "_doc", "routing": "9" } } : "starwars", "_type" : "_doc", "routing": "0" } } : "starwars", "_type" : "_doc", "routing": "0" } }

Slide 41

Slide 41

Routing shard# = hash(_routing) % #primary_shards

Slide 42

Slide 42

GET _cat/shards?index=starwars&v index starwars starwars starwars starwars starwars shard 3 4 2 1 0 prirep p p p p p state docs store ip node STARTED 58 6.4kb 172.19.0.2 Q88C3vO STARTED 26 5.2kb 172.19.0.2 Q88C3vO STARTED 71 6.9kb 172.19.0.2 Q88C3vO STARTED 63 6.6kb 172.19.0.2 Q88C3vO STARTED 70 6.7kb 172.19.0.2 Q88C3vO

Slide 43

Slide 43

(Sub) Results Per Shard shard_size = (size * 1.5 + 10)

Slide 44

Slide 44

How Many? Results per shard Results for aggregation

Slide 45

Slide 45

"doc_count_error_upper_bound": 10 "sum_other_doc_count": 232

Slide 46

Slide 46

GET starwars/_search { "aggs": { "most_common": { "terms": { "field": "word.keyword", "size": 1, "show_term_doc_count_error": true } } }, "size": 0 }

Slide 47

Slide 47

"aggregations": { "most_common": { "doc_count_error_upper_bound": 10, "sum_other_doc_count": 232, "buckets": [ { "key": "Luke", "doc_count": 56, "doc_count_error_upper_bound": 9 } ] } }

Slide 48

Slide 48

GET starwars/_search { "aggs": { "most_common": { "terms": { "field": "word.keyword", "size": 1, "shard_size": 20, "show_term_doc_count_error": true } } }, "size": 0 }

Slide 49

Slide 49

"aggregations": { "most_common": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 224, "buckets": [ { "key": "Luke", "doc_count": 64, "doc_count_error_upper_bound": 0 } ] } }

Slide 50

Slide 50

Cardinality Aggregation

Slide 51

Slide 51

Naive implementation: Hash set Predictable storage and performance?

Slide 52

Slide 52

Slide 53

Slide 53

[...] the maximum number of leading zeros that occur for all hash values, where intuitively hash values with more leading zeros are less likely and indicate a larger cardinality.

Slide 54

Slide 54

If the bit pattern is observed at the beginning of a hash value, then a good estimation of the size of the multiset is [...].

Slide 55

Slide 55

To reduce the large variability that such a single measurement has, a technique known as stochastic averaging is used.

Slide 56

Slide 56

Slide 57

Slide 57

GET starwars/_search { "aggs": { "type_count": { "cardinality": { "field": "word.keyword", "precision_threshold": 10 } } }, "size": 0 }

Slide 58

Slide 58

{ } "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 288, "max_score": 0, "hits": [] }, "aggregations": { "type_count": { "value": 17 } }

Slide 59

Slide 59

precision_threshold Default 3,000 Maximum 40,000

Slide 60

Slide 60

Memory precision_threshold x 8 bytes

Slide 61

Slide 61

Slide 62

Slide 62

GET starwars/_search { "aggs": { "type_count": { "cardinality": { "field": "word.keyword", "precision_threshold": 12 } } }, "size": 0 }

Slide 63

Slide 63

{ } "took": 12, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 288, "max_score": 0, "hits": [] }, "aggregations": { "type_count": { "value": 16 } }

Slide 64

Slide 64

Precompute Hashes? Client or mapper-murmur3 plugin

Slide 65

Slide 65

It Depends ! large / high-cardinality fields ! low cardinality / numeric fields

Slide 66

Slide 66

Improvement: LogLog-β https://github.com/elastic/elasticsearch/ pull/22323

Slide 67

Slide 67

Improvement? "New cardinality estimation algorithms for HyperLogLog sketches" https://arxiv.org/abs/1702.01284

Slide 68

Slide 68

Inverse Document Frequency

Slide 69

Slide 69

GET starwars/_search { "query": { "match": { "word": "Luke" } } }

Slide 70

Slide 70

... { "_index": "starwars", "_type": "_doc", "_id": "0vVdy2IBkmPuaFRg659y", "_score": 3.2049506, "_routing": "1", "_source": { "word": "Luke" } }, { "_index": "starwars", "_type": "_doc", "_id": "2PVdy2IBkmPuaFRg659y", "_score": 3.2049506, "_routing": "7", "_source": { "word": "Luke" } }, { "_index": "starwars", "_type": "_doc", "_id": "0_Vdy2IBkmPuaFRg659y", "_score": 3.1994843, "_routing": "2", "_source": { "word": "Luke" } }, ...

Slide 71

Slide 71

Slide 72

Slide 72

Term Frequency / Inverse Document Frequency (TF/IDF)

Slide 73

Slide 73

BM25 Default in Elasticsearch 5.0

Slide 74

Slide 74

Term Frequency

Slide 75

Slide 75

Slide 76

Slide 76

Inverse Document Frequency

Slide 77

Slide 77

Slide 78

Slide 78

Field-Length Norm

Slide 79

Slide 79

Query Then Fetch

Slide 80

Slide 80

Query

Slide 81

Slide 81

Fetch

Slide 82

Slide 82

DFS Query Then Fetch Distributed Frequency Search

Slide 83

Slide 83

GET starwars/_search?search_type=dfs_query_then_fetch { "query": { "match": { "word": "Luke" } } }

Slide 84

Slide 84

{ "_index": "starwars", "_type": "_doc", "_id": "0fVdy2IBkmPuaFRg659y", "_score": 1.5367417, "_routing": "0", "_source": { "word": "Luke" } }, { "_index": "starwars", "_type": "_doc", "_id": "2_Vdy2IBkmPuaFRg659y", "_score": 1.5367417, "_routing": "0", "_source": { "word": "Luke" } }, { "_index": "starwars", "_type": "_doc", "_id": "3PVdy2IBkmPuaFRg659y", "_score": 1.5367417, "_routing": "0", "_source": { "word": "Luke" } }, ...

Slide 85

Slide 85

Slide 86

Slide 86

Slide 87

Slide 87

Don’t use dfs_query_then_fetch in production. It really isn’t required. — https://www.elastic.co/guide/en/elasticsearch/ guide/current/relevance-is-broken.html

Slide 88

Slide 88

Single Shard Default in 7.0

Slide 89

Slide 89

Simon Says Use a single shard until it blows up

Slide 90

Slide 90

PUT starwars/_settings { "settings": { "index.blocks.write": true } }

Slide 91

Slide 91

POST starwars/_shrink/starletwars?copy_settings=true { "settings": { "number_of_shards": 1, "number_of_replicas": 0 } }

Slide 92

Slide 92

GET starletwars/_search { "query": { "match": { "word": "Luke" } }, "_source": false }

Slide 93

Slide 93

{ "_index": "starletwars", "_type": "_doc", "_id": "0fVdy2IBkmPuaFRg659y", "_score": 1.5367417, "_routing": "0" }, { "_index": "starletwars", "_type": "_doc", "_id": "2_Vdy2IBkmPuaFRg659y", "_score": 1.5367417, "_routing": "0" }, { }, "_index": "starletwars", "_type": "_doc", "_id": "3PVdy2IBkmPuaFRg659y", "_score": 1.5367417, "_routing": "0"

Slide 94

Slide 94

GET starletwars/_search { "aggs": { "most_common": { "terms": { "field": "word.keyword", "size": 1 } } }, "size": 0 }

Slide 95

Slide 95

{ } "took": 1, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": 288, "max_score": 0, "hits": [] }, "aggregations": { "most_common": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 224, "buckets": [ { "key": "Luke", "doc_count": 64 } ] } }

Slide 96

Slide 96

Change for the Cardinality Count?

Slide 97

Slide 97

Slide 98

Slide 98

Conclusion

Slide 99

Slide 99

Tradeoffs...

Slide 100

Slide 100

Consistent Available Partition Tolerant Fast Accurate Big

Slide 101

Slide 101

Questions? Philipp Krenn PS: Stickers @xeraa