/dev/null breaks CAP: effect of write are always consistent, it's always available, and all replicas are consistent even during partitions.
—
https://twitter.com/ashic/status/591511683987701760
Slide 16
FAB Theory
Slide 17
Fast
Near real-time instead of batch processing
Slide 18
Accurate
Exact instead of approximate results
Slide 19
Big
Parallel computing tools are needed to handle the data
Slide 20
The 42 V's of Big Data and Data Science
https://www.elderresearch.com/company/blog/42-v-of-big-data
Slide 21
Slide 22
Slide 23
Fast ✅
Big ✅
Accurate ❔
Slide 24
Slide 25
Shard
Unit of scale
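Because the shard is the unit of scale, every document needs one deterministic home shard. A minimal sketch, assuming a simplified scheme of `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. Elasticsearch itself uses a Murmur3 hash plus an additional routing-shards factor, so `zlib.crc32` and the helper `pick_shard` below are hypothetical, illustrative stand-ins.

```python
import zlib

def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Simplified routing: hash the routing value, then take it modulo the shard count.

    zlib.crc32 is only a stand-in; Elasticsearch uses Murmur3 for this hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

# The same routing value always maps to the same shard. Changing the number
# of primary shards would remap documents, which is why that count is fixed
# at index creation (and changed via shrink, split, or reindex instead).
for doc_id in ["0fVdy2IBkmPuaFRg659y", "2_Vdy2IBkmPuaFRg659y", "3PVdy2IBkmPuaFRg659y"]:
    print(doc_id, "->", pick_shard(doc_id, 5))
```

Passing an explicit routing value replaces the `_id` as the hash input, so all documents sharing that value end up on the same shard.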
Slide 26
Slide 27
The evil wizard Mondain had attempted to gain control over Sosaria by trapping its essence in a crystal. When the Stranger at the end...
Slide 28
...of Ultima I defeated Mondain and shattered the crystal, the crystal shards each held a refracted copy of Sosaria.
—
http://www.raphkoster.com/2009/01/08/database-sharding-came-from-uo/
Term Frequency / Inverse Document Frequency (TF/IDF)
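The three ingredients of classic TF/IDF scoring (term frequency, inverse document frequency, field-length norm) can be written down directly. A minimal sketch, assuming the formulas from Lucene's classic similarity as described in the Elasticsearch Definitive Guide: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / sqrt(numTerms). Multiplying them gives the weight of one term in one document; the real practical scoring function adds query-time factors such as boosts and query normalization, and Lucene stores the norm lossily. All function names are hypothetical.

```python
import math

def tf(freq: int) -> float:
    # Term frequency: more occurrences help, but with diminishing returns.
    return math.sqrt(freq)

def idf(num_docs: int, doc_freq: int) -> float:
    # Inverse document frequency: rare terms weigh more than common ones.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_norm(num_terms: int) -> float:
    # Field-length norm: a match in a short field counts more than in a long one.
    return 1.0 / math.sqrt(num_terms)

def term_weight(freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    # Combine the three factors into the weight of one term in one document.
    return tf(freq) * idf(num_docs, doc_freq) * field_norm(num_terms)

# Hypothetical index: 100 documents; compare a rare and a common term
# in a one-word field.
print(term_weight(1, 100, 9, 1))
print(term_weight(1, 100, 49, 1))
```

Note that idf depends on `num_docs` and `doc_freq`, which are statistics of whatever set of documents the score is computed over; on a sharded index, that is each shard by default.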
Slide 57
BM25
Default in Elasticsearch 5.0
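BM25 keeps the same building blocks but makes term frequency saturate and makes length normalization tunable. A minimal sketch of the textbook BM25 term score with Lucene's default parameters k1 = 1.2 and b = 0.75; Lucene versions differ in constant factors (for example whether the `(k1 + 1)` numerator is kept), so treat this as an illustration rather than the exact Elasticsearch implementation. Function names are hypothetical.

```python
import math

def bm25_idf(num_docs: int, doc_freq: int) -> float:
    # Smoothed IDF; the "1 +" inside the log keeps it positive for very common terms.
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_term_score(freq, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    # b controls how strongly the field length (relative to the average) is normalized.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term frequency saturates: however large freq gets, tf_part stays below k1 + 1.
    tf_part = freq * (k1 + 1) / (freq + k1 * length_norm)
    return bm25_idf(num_docs, doc_freq) * tf_part

# Hypothetical numbers: 1,000 documents, 10 contain the term, average length 10.
for freq in (1, 2, 10, 100):
    print(freq, bm25_term_score(freq, 10, 10, 1000, 10))
```

Compared with classic TF/IDF, the 100th occurrence of a term adds almost nothing, which makes scores harder to game by repetition.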
Slide 58
Term Frequency
Slide 59
Slide 60
Inverse Document
Frequency
Slide 61
Slide 62
Field-Length Norm
Slide 63
Query Then Fetch
Slide 64
Query
Slide 65
Fetch
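The two phases can be simulated in memory. A minimal sketch, assuming the per-shard scores are already computed: in the query phase every shard returns only lightweight `(score, doc_id)` pairs for its local top hits, the coordinating node merges them, and only the global winners are loaded in the fetch phase. The function names and data are hypothetical; there is no networking here.

```python
import heapq

def query_phase(shard_scores, size):
    """Query phase on one shard: return only (score, doc_id) pairs for its local top hits."""
    return heapq.nlargest(size, ((score, doc_id) for doc_id, score in shard_scores.items()))

def query_then_fetch(shards, documents, size):
    # Every shard must send its own top `size`: any of them might make the global top.
    candidates = []
    for shard_scores in shards:
        candidates.extend(query_phase(shard_scores, size))
    # The coordinating node merges the lightweight results and keeps the global top.
    top = heapq.nlargest(size, candidates)
    # Fetch phase: load the full documents only for the winners.
    return [(doc_id, score, documents[doc_id]) for score, doc_id in top]

# Hypothetical scores, already computed locally on three shards.
shards = [{"a": 1.2, "b": 0.4}, {"c": 2.0, "d": 0.9}, {"e": 0.7}]
documents = {doc_id: {"word": "Luke"} for doc_id in "abcde"}

for doc_id, score, source in query_then_fetch(shards, documents, size=2):
    print(doc_id, score, source)
```

The catch is in the first line of the sketch: each shard scored its documents with its own local statistics, and the merge compares those scores as if they were on one scale.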
Slide 66
DFS Query Then Fetch
Distributed Frequency Search
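The extra DFS round trip exists to replace per-shard term statistics with global ones. A minimal sketch, assuming a BM25-style IDF and a hypothetical two-shard index where "Luke" is common on one shard and rare on the other: without DFS each shard computes its own IDF, with DFS the coordinating node first sums document counts and document frequencies from all shards and every shard then scores with the same global IDF. Helper names and data are hypothetical.

```python
import math

def idf(num_docs, doc_freq):
    # BM25-style IDF (an assumption for this sketch).
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def term_stats(shard, term):
    # Per-shard statistics: document count and number of documents containing the term.
    return len(shard), sum(1 for doc in shard if term in doc.split())

# Hypothetical index: "Luke" is common on shard 0 and rare on shard 1.
shards = [
    ["Luke", "Luke", "Luke", "Leia"],
    ["Han", "Chewie", "Yoda", "Luke"],
]

# Default query_then_fetch: every shard uses its own local IDF.
local = [idf(*term_stats(shard, "Luke")) for shard in shards]

# DFS phase: collect the statistics from all shards first and sum them...
total_docs = sum(term_stats(shard, "Luke")[0] for shard in shards)
total_df = sum(term_stats(shard, "Luke")[1] for shard in shards)
# ...so every shard scores with the same, global IDF.
global_idf = idf(total_docs, total_df)

print("local IDF per shard:", local)
print("global IDF:", global_idf)
```

With local statistics, an identical one-word "Luke" document scores higher on shard 1 than on shard 0; with the global IDF it scores the same wherever it lives.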
Slide 67
GET starwars/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "match": {
      "word": "Luke"
    }
  }
}
Slide 68
{
  "_index": "starwars",
  "_type": "_doc",
  "_id": "0fVdy2IBkmPuaFRg659y",
  "_score": 1.5367417,
  "_routing": "0",
  "_source": {
    "word": "Luke"
  }
},
{
  "_index": "starwars",
  "_type": "_doc",
  "_id": "2_Vdy2IBkmPuaFRg659y",
  "_score": 1.5367417,
  "_routing": "0",
  "_source": {
    "word": "Luke"
  }
},
{
  "_index": "starwars",
  "_type": "_doc",
  "_id": "3PVdy2IBkmPuaFRg659y",
  "_score": 1.5367417,
  "_routing": "0",
  "_source": {
    "word": "Luke"
  }
},
...
Slide 69
Slide 70
Slide 71
Often a Non-Issue or a Minor Issue
Lots of documents
Even distribution
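Why lots of documents and an even distribution make local statistics good enough can be checked numerically. A minimal sketch, assuming a BM25-style IDF and hypothetical counts of 3,000 documents (300 containing the term) on three shards: when the term is spread evenly, each shard's local IDF is almost identical to the global one; when all matching documents sit on one shard, the local IDFs drift far apart. The helper `max_idf_skew` is hypothetical.

```python
import math

def idf(num_docs, doc_freq):
    # BM25-style IDF (an assumption for this sketch).
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def max_idf_skew(shards):
    """Largest gap between any shard's local IDF and the global IDF.

    `shards` is a list of (num_docs, doc_freq) pairs for a single term.
    """
    total_docs = sum(n for n, _ in shards)
    total_df = sum(df for _, df in shards)
    global_idf = idf(total_docs, total_df)
    return max(abs(idf(n, df) - global_idf) for n, df in shards)

# Hypothetical counts: 3,000 documents, 300 of them contain the term.
even = [(1000, 100), (1000, 100), (1000, 100)]
skewed = [(1000, 300), (1000, 0), (1000, 0)]

print("even:  ", max_idf_skew(even))
print("skewed:", max_idf_skew(skewed))
```

Small test indices with a handful of documents, or custom routing that concentrates one kind of document on one shard, are exactly the skewed case.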
Slide 72
Don’t use dfs_query_then_fetch in production. It really isn’t required.
—
https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html
Slide 73
Single Shard
Default in 7.0
Slide 74
Simon Says
Use a single shard until it blows up
Slide 75
PUT starwars/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}
Slide 76
POST starwars/_shrink/starwars_single
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
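Shrinking is restricted in which shard counts it can produce: the target number of primary shards must be a factor of the source count, which is why shrinking down to a single shard always works. The source index also has to be read-only (the write block set on the previous slide) and a copy of every shard must sit on the same node. A minimal sketch of the factor rule only; the helper `valid_shrink_targets` is hypothetical.

```python
def valid_shrink_targets(source_shards: int) -> list[int]:
    """Primary-shard counts a shrink could target: proper factors of the source count."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]

print(valid_shrink_targets(5))   # 5 primaries was the default before 7.0
print(valid_shrink_targets(12))
```

Since 1 divides every shard count, any index can be shrunk to the single shard that this talk recommends.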
Slide 77
GET starwars_single/_search
{
  "query": {
    "match": {
      "word": "Luke"
    }
  },
  "_source": false
}
Slide 78
{
  "_index": "starwars_single",
  "_type": "_doc",
  "_id": "0fVdy2IBkmPuaFRg659y",
  "_score": 1.5367417,
  "_routing": "0"
},
{
  "_index": "starwars_single",
  "_type": "_doc",
  "_id": "2_Vdy2IBkmPuaFRg659y",
  "_score": 1.5367417,
  "_routing": "0"
},
{
  "_index": "starwars_single",
  "_type": "_doc",
  "_id": "3PVdy2IBkmPuaFRg659y",
  "_score": 1.5367417,
  "_routing": "0"
},
Slide 79
GET starwars_single/_search
{
  "aggs": {
    "most_common": {
      "terms": {
        "field": "word.keyword",
        "size": 1
      }
    }
  },
  "size": 0
}
Slide 80
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 288,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "most_common": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 224,
      "buckets": [
        {
          "key": "Luke",
          "doc_count": 64
        }
      ]
    }
  }
}
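The `doc_count_error_upper_bound` of 0 is what a single shard buys for aggregations as well. A minimal sketch, assuming a simplified terms aggregation: each shard reports only its own top `shard_size` terms, the coordinating node sums those partial lists, and the error bound is the sum of the last reported count from every shard that returned a full list, roughly how Elasticsearch derives it. Elasticsearch over-fetches by default (`shard_size` larger than `size`), which reduces but does not remove the problem. The helper names and counts are hypothetical.

```python
from collections import Counter

def shard_top_terms(counts: Counter, shard_size: int):
    # Each shard only reports its own top `shard_size` terms.
    return counts.most_common(shard_size)

def distributed_terms(shards, size, shard_size):
    if len(shards) == 1:
        # With a single shard nothing is merged, so the result is exact.
        return shards[0].most_common(size), 0
    merged = Counter()
    error_upper_bound = 0
    for counts in shards:
        top = shard_top_terms(counts, shard_size)
        merged.update(dict(top))
        # A term missing from this shard's list can have at most as many
        # documents there as the last term the shard did report.
        if len(top) == shard_size:
            error_upper_bound += top[-1][1]
    return merged.most_common(size), error_upper_bound

# Hypothetical counts: "Leia" is the overall winner but never tops a shard.
shards = [
    Counter({"Luke": 40, "Leia": 35, "Han": 5}),
    Counter({"Han": 30, "Leia": 28, "Luke": 2}),
    Counter({"Han": 25, "Leia": 24}),
]

print(distributed_terms(shards, size=1, shard_size=1))
print(distributed_terms([sum(shards, Counter())], size=1, shard_size=1))
```

Across three shards the sketch reports the wrong winner with an undercounted total and a large error bound; the same data on one shard gives the exact answer with an error bound of 0.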
Slide 81
Slide 82
Conclusion
Slide 83
Tradeoffs...
Slide 84
Consistent ✔
Available ✔
Partition Tolerant
Fast ✔
Accurate ✔
Big
Slide 85
Questions?
Philipp Krenn
@xeraa
PS: Stickers