Elastic Full-Text Search in Liferay

A presentation at Liferay Digital Solutions Forum in October 2018 in London, UK by Philipp Krenn

Slide 1

Elastic Full-Text Search in Liferay Philipp Krenn @xeraa

Slide 2

Developer

Slide 3

Slide 4

Slide 5

Store

Slide 6

Apache Lucene Elasticsearch

Slide 7

Slide 8

Example These are <em>not</em> the droids you are looking for.

Slide 9

html_strip Char Filter These are not the droids you are looking for.

Slide 10

standard Tokenizer These are not the droids you are looking for

Slide 11

lowercase Token Filter these are not the droids you are looking for

Slide 12

stop Token Filter droids you looking

Slide 13

snowball Token Filter droid you look
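
The analysis chain on slides 9 to 13 can be approximated in a few lines of Python. This is only a minimal sketch and not how Elasticsearch implements it: the regular expressions stand in for the html_strip char filter and the standard tokenizer, and `toy_stem` is a hypothetical suffix stripper that merely imitates the snowball filter for this one sentence.

```python
import re

# Lucene's default English stop words, as listed on the "Stop Words" slide.
STOP_WORDS = set(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def toy_stem(token: str) -> str:
    """Hypothetical stand-in for the snowball filter: strips a few suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list:
    text = re.sub(r"<[^>]+>", "", text)                   # char filter: drop HTML tags
    tokens = re.findall(r"[A-Za-z0-9]+", text)            # tokenizer: split on non-alphanumerics
    tokens = [t.lower() for t in tokens]                  # lowercase token filter
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop token filter
    return [toy_stem(t) for t in tokens]                  # stemming token filter


print(analyze("These are <em>not</em> the droids you are looking for."))
# ['droid', 'you', 'look']
```

The order matters: the stop filter runs after lowercasing, so "These" is removed even though the stop list only contains "these".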

Slide 14

Analyze

Slide 15

GET /_analyze { "analyzer": "english", "text": "These are not the droids you are looking for." }

Slide 16

{ "tokens": [ { "token": "droid", "start_offset": 18, "end_offset": 24, "type": "<ALPHANUM>", "position": 4 }, { "token": "you", "start_offset": 25, "end_offset": 28, "type": "<ALPHANUM>", "position": 5 }, ... ] }

Slide 17

GET /_analyze { "char_filter": [ "html_strip" ], "tokenizer": "standard", "filter": [ "lowercase", "stop", "snowball" ], "text": "These are <em>not</em> the droids you are looking for." }

Slide 18

{ "tokens": [ { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 }, { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 }, ... ] }

Slide 19

Stop Words a an and are as at be but by for if in into is it no not of on or such that the their then there these they this to was will with https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java#L44-L50

Slide 20

Always Use Stop Words?

Slide 21

To be, or not to be.

Slide 22

Languages Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, CJK, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Latvian, Lithuanian, Norwegian, Persian, Portuguese, Romanian, Russian, Sorani, Spanish, Swedish, Turkish, Thai

Slide 23

More Language Plugins Core: ICU (Asian languages), Kuromoji (advanced Japanese), Phonetic, SmartCN, Stempel (better Polish stemming), Ukrainian (stemming) Community: Hebrew, Vietnamese, Network Address Analysis, String2Integer,...

Slide 24

Language Rules English: Philipp's → philipp French: l'église → eglis German: äußerst → ausserst

Slide 25

Another Example Obi-Wan never told you what happened to your father.

Slide 26

Another Example obi wan never told you what happen your father

Slide 27

Another Example <b>No</b>. I am your father.

Slide 28

Another Example i am your father

Slide 29

Inverted Index
Term    ID 1   ID 2   ID 3
am      0      0      1[2]
droid   1[4]   0      0
father  0      1[9]   1[4]
happen  0      1[6]   0
i       0      0      1[1]
look    1[7]   0      0
never   0      1[2]   0
obi     0      1[0]   0
told    0      1[3]   0
wan     0      1[1]   0
what    0      1[5]   0
you     1[5]   1[4]   0
your    0      1[8]   1[3]
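
The inverted index above, including the positions in square brackets, can be rebuilt with a short Python sketch. The regular-expression tokenizer and the `toy_stem` helper are hypothetical simplifications of the real analyzer; the point is only that removed stop words still consume a position, which is why "look" sits at position 7 in document 1.

```python
import re
from collections import defaultdict

STOP_WORDS = set(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def toy_stem(token):
    """Hypothetical stand-in for the snowball filter."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze_with_positions(text):
    """Return (position, term) pairs; stop words are dropped but keep their slot."""
    text = re.sub(r"<[^>]+>", "", text)
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]
    return [(pos, toy_stem(t)) for pos, t in enumerate(tokens) if t not in STOP_WORDS]


def build_index(docs):
    """Map each term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in analyze_with_positions(text):
            index[term].setdefault(doc_id, []).append(pos)
    return index


DOCS = {
    1: "These are <em>not</em> the droids you are looking for.",
    2: "Obi-Wan never told you what happened to your father.",
    3: "<b>No</b>. I am your father.",
}
INDEX = build_index(DOCS)
for term in sorted(INDEX):
    print(term, INDEX[term])
```

A lookup such as `INDEX["father"]` then returns the documents and positions directly, without scanning any text.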

Slide 30

To / The Index

Slide 31

PUT /starwars { "settings": { "number_of_shards": 1, "analysis": { "filter": { "my_synonym_filter": { "type": "synonym", "synonyms": [ "father,dad", "droid => droid,machine" ] } },

Slide 32

"analyzer": { "my_analyzer": { "char_filter": [ "html_strip" ], "tokenizer": "standard", "filter": [ "lowercase", "stop", "snowball", "my_synonym_filter" ] } } } },

Slide 33

"mappings": { "_doc": { "properties": { "quote": { "type": "text", "analyzer": "my_analyzer" } } } } }
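
The two synonym rules in the index settings behave differently: "father,dad" is a group of equivalents, while "droid => droid,machine" is a one-way mapping. The sketch below illustrates that difference in Python. `parse_synonyms` and `expand` are hypothetical helpers written for this example and are not part of any Elasticsearch client.

```python
def parse_synonyms(rules):
    """Parse synonym rules into a term -> replacement-list mapping."""
    mapping = {}
    for rule in rules:
        if "=>" in rule:
            # Explicit mapping: each left-hand term is replaced by the right-hand side.
            left, right = rule.split("=>")
            targets = [t.strip() for t in right.split(",")]
            for source in left.split(","):
                mapping[source.strip()] = targets
        else:
            # Equivalent synonyms: every term expands to the whole group.
            group = [t.strip() for t in rule.split(",")]
            for term in group:
                mapping[term] = group
    return mapping


def expand(tokens, mapping):
    """Replace each token by its synonym list (or by itself if it has none)."""
    return [mapping.get(token, [token]) for token in tokens]


SYNONYMS = parse_synonyms(["father,dad", "droid => droid,machine"])
print(expand(["droid", "you", "look"], SYNONYMS))  # [['droid', 'machine'], ['you'], ['look']]
print(expand(["dad"], SYNONYMS))                   # [['father', 'dad']]
print(expand(["machine"], SYNONYMS))               # [['machine']]
```

Because the filter sits after snowball in `my_analyzer`, it sees stemmed tokens, so the rules are written with "droid" rather than "droids".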

Slide 34

GET /starwars/_mapping GET /starwars/_settings

Slide 35

PUT /starwars/_doc/1 { "quote": "These are <em>not</em> the droids you are looking for." } PUT /starwars/_doc/2 { "quote": "Obi-Wan never told you what happened to your father." } PUT /starwars/_doc/3 { "quote": "<b>No</b>. I am your father." }

Slide 36

GET /starwars/_doc/1 GET /starwars/_doc/1/_source

Slide 37

Search

Slide 38

POST /starwars/_search { "query": { "match_all": { } } }

Slide 39

GET vs POST

Slide 40

{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 1, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 1, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, ...

Slide 41

POST /starwars/_search { "query": { "match": { "quote": "droid" } } }

Slide 42

{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.39556286, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "1", "_score": 0.39556286, "_source": { "quote": "These are <em>not</em> the droids you are looking for." } } ] } }

Slide 43

POST /starwars/_search { "query": { "match": { "quote": "dad" } } }

Slide 44

... "hits": { "total": 2, "max_score": 0.41913947, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.41913947, "_source": { "quote": "<b>No</b>. I am your father." } }, { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 0.39291072, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] } }

Slide 45

POST /starwars/_doc/0/_explain { "query": { "match": { "quote": "dad" } } }

Slide 46

{ "_index": "starwars", "_type": "_doc", "_id": "0", "matched": false }

Slide 47

POST /starwars/_doc/1/_explain { "query": { "match": { "quote": "dad" } } }

Slide 48

{ "_index": "starwars", "_type": "_doc", "_id": "1", "matched": false, "explanation": { "value": 0, "description": "no matching term", "details": [] } }

Slide 49

POST /starwars/_doc/2/_explain { "query": { "match": { "quote": "dad" } } }

Slide 50

{ "_index": "starwars", "_type": "_doc", "_id": "2", "matched": true, "explanation": { ...

Slide 51

POST /starwars/_search { "query": { "match": { "quote": "machine" } } }

Slide 52

{ "took": 2, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 1.2499592, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "1", "_score": 1.2499592, "_source": { "quote": "These are <em>not</em> the droids you are looking for." } } ] } }

Slide 53

POST /starwars/_search { "query": { "match_phrase": { "quote": "I am your father" } } }

Slide 54

{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 1.5665855, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 1.5665855, "_source": { "quote": "<b>No</b>. I am your father." } } ] } }

Slide 55

POST /starwars/_search { "query": { "match_phrase": { "quote": { "query": "I am father", "slop": 1 } } } }

Slide 56

{ "took": 16, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.8327639, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.8327639, "_source": { "quote": "<b>No</b>. I am your father." } } ] } }

Slide 57

POST /starwars/_search { "query": { "match_phrase": { "quote": { "query": "I am not your father", "slop": 1 } } } }

Slide 58

{ "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 1.0409548, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 1.0409548, "_source": { "quote": "<b>No</b>. I am your father." } } ] } }
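
Why do "I am father" and "I am not your father" both match document 3 with a slop of 1? The sketch below checks phrases against the term positions from the inverted index slide. It is a simplified model and not Lucene's actual sloppy phrase algorithm: a phrase is accepted when the offsets between query positions and document positions differ by at most `slop`. Query positions are assumed to keep the gap left by a removed stop word such as "not".

```python
from itertools import product

# Positions of the analyzed terms of document 3 ("<b>No</b>. I am your father."),
# taken from the inverted index slide.
DOC_3 = {"i": [1], "am": [2], "your": [3], "father": [4]}


def phrase_matches(query, doc_positions, slop=0):
    """query: list of (term, position_in_query) pairs, stop words already removed."""
    if any(term not in doc_positions for term, _ in query):
        return False
    candidates = [doc_positions[term] for term, _ in query]
    for combo in product(*candidates):
        # Offset of each term relative to where the query expects it.
        shifts = [doc_pos - query_pos for doc_pos, (_, query_pos) in zip(combo, query)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


exact = [("i", 0), ("am", 1), ("your", 2), ("father", 3)]
missing_word = [("i", 0), ("am", 1), ("father", 2)]
extra_stop_word = [("i", 0), ("am", 1), ("your", 3), ("father", 4)]  # "not" removed at position 2

print(phrase_matches(exact, DOC_3))                    # True
print(phrase_matches(missing_word, DOC_3))             # False
print(phrase_matches(missing_word, DOC_3, slop=1))     # True
print(phrase_matches(extra_stop_word, DOC_3, slop=1))  # True
```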

Slide 59

POST /starwars/_search { "query": { "match": { "quote": { "query": "van", "fuzziness": "AUTO" } } } }

Slide 60

{ "took": 14, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.18155496, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 0.18155496, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] } }

Slide 61

POST /starwars/_search { "query": { "match": { "quote": { "query": "ovi-van", "fuzziness": 1 } } } }

Slide 62

{ "took": 109, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.3798467, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 0.3798467, "_source": { "quote": "Obi-Wan never told you what happened to your father." } } ] } }

Slide 63

FuzzyQuery History http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html Before: Brute force Now: Levenshtein Automaton

Slide 64

http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata

Slide 65

SELECT * FROM starwars WHERE quote LIKE '_an' OR quote LIKE 'V_n' OR quote LIKE 'Va_'
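
The fuzzy queries above match because "van" is one edit away from "wan" and "ovi" is one edit away from "obi". The following sketch computes that distance with the textbook dynamic-programming algorithm, which is the brute-force approach and not the Levenshtein automaton Lucene uses. `auto_fuzziness` mirrors the documented default thresholds of "fuzziness": "AUTO" (exact match up to 2 characters, one edit for 3 to 5, two edits beyond that). Transpositions, which Elasticsearch counts as a single edit by default, are left out here for brevity.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Maximum edit distance allowed by AUTO with its default thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


for query, indexed in (("van", "wan"), ("ovi", "obi")):
    distance = levenshtein(query, indexed)
    print(query, indexed, distance, distance <= auto_fuzziness(query))
```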

Slide 66

Scoring

Slide 67

Term Frequency / Inverse Document Frequency (TF/IDF) Search one term

Slide 68

BM25 Default in Elasticsearch 5.0 https://speakerdeck.com/elastic/improved-text-scoring-with-bm25

Slide 69

Term Frequency

Slide 70

Slide 71

Inverse Document Frequency

Slide 72

Slide 73

Field-Length Norm

Slide 74

POST /starwars/_search?explain=true { "query": { "match": { "quote": "father" } } }

Slide 75

... "_explanation": { "value": 0.41913947, "description": "weight(Synonym(quote:dad quote:father) in 0) [PerFieldSimilarity], result of:", "details": [ { "value": 0.41913947, "description": "score(doc=0,freq=2.0 = termFreq=2.0\n), product of:", "details": [ { "value": 0.2876821, "description": "idf(docFreq=1, docCount=1)", "details": [] }, { "value": 1.4569536, "description": "tfNorm, computed from:", "details": [ { "value": 2, "description": "termFreq=2.0", "details": [] }, ...

Slide 76

Score
0.41913947: i am your father
0.39291072: obi wan never told you what happen your father
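
The numbers in the explain output can be checked by hand. The sketch below uses the BM25 formulas as I understand them from Lucene's BM25Similarity of that era, with the default parameters k1 = 1.2 and b = 0.75. The explain output on the slide is truncated before the field lengths, so `field_length=4` and `avg_field_length=5` are assumptions, chosen because their ratio reproduces the tfNorm shown.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """Inverse document frequency: rarer terms weigh more."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """Saturating term frequency, normalized by relative field length."""
    length_norm = 1 - b + b * field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * length_norm)


idf = bm25_idf(doc_freq=1, doc_count=1)
# Assumed lengths (not shown on the slide): a ratio of 4 / 5 yields the slide's tfNorm.
tf_norm = bm25_tf_norm(freq=2.0, field_length=4, avg_field_length=5)

print(round(idf, 7))      # 0.2876821
print(round(tf_norm, 7))  # 1.4569536
print(idf * tf_norm)      # close to the 0.41913947 reported for document 3
```

The term frequency is 2 because "father" and its synonym "dad" are indexed at the same position and are scored together as `Synonym(quote:dad quote:father)`.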

Slide 77

Vector Space Model Search multiple terms

Slide 78

Search your father

Slide 79

Slide 80

Coordination Factor Reward multiple terms

Slide 81

Search for 3 terms 1 term: 2 terms: 3 terms:

Slide 82

Practical Scoring Function Putting it all together

Slide 83

score(q,d) = queryNorm(q) · coord(q,d) · ∑_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
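
The practical scoring function can be spelled out as a small Python sketch. The slide does not define the individual factors, so the definitions below are assumptions taken from Lucene's classic TF/IDF similarity as I recall it: tf is the square root of the term frequency, idf is 1 + ln(numDocs / (docFreq + 1)), the field-length norm is 1 / sqrt(number of terms), and every boost is 1.

```python
import math


def tf(freq):
    return math.sqrt(freq)


def idf(num_docs, doc_freq):
    return 1 + math.log(num_docs / (doc_freq + 1))


def norm(num_terms):
    return 1 / math.sqrt(num_terms)


def coord(matched_terms, query_terms):
    """Coordination factor: rewards documents matching more of the query terms."""
    return matched_terms / query_terms


def query_norm(idfs):
    return 1 / math.sqrt(sum(weight * weight for weight in idfs))


def score(query_terms, doc_term_freqs, doc_length, num_docs, doc_freqs):
    idfs = {t: idf(num_docs, doc_freqs.get(t, 0)) for t in query_terms}
    matched = [t for t in query_terms if doc_term_freqs.get(t, 0) > 0]
    total = sum(
        tf(doc_term_freqs[t]) * idfs[t] ** 2 * norm(doc_length)  # boost is 1
        for t in matched
    )
    return query_norm(idfs.values()) * coord(len(matched), len(query_terms)) * total


# "your father" against document 3 ("i am your father"): both terms occur in 2 of 3 documents.
doc_3 = {"i": 1, "am": 1, "your": 1, "father": 1}
print(score(["your", "father"], doc_3, doc_length=4, num_docs=3,
            doc_freqs={"your": 2, "father": 2}))
```

For a three-term query, `coord` evaluates to 1/3, 2/3 and 3/3 for documents matching one, two and three terms respectively.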

Slide 84

Function Score Script, weight, random, field value, decay (geo or date)

Slide 85

POST /starwars/_search { "query": { "function_score": { "query": { "match": { "quote": "father" } }, "random_score": {} } } }

Slide 86

Compare Scores "100% perfect" vs a "50%" match

Slide 87

Don't do this. Seriously. Stop trying to think about your problem this way, it's not going to end well. (https://wiki.apache.org/lucene-java/ScoresAsPercentages)

Slide 88

GET /starwars/_analyze { "analyzer" : "my_analyzer", "text": "These are my father's machines." }

Slide 89

{ "tokens": [ { "token": "my", "start_offset": 10, "end_offset": 12, "type": "<ALPHANUM>", "position": 2 }, { "token": "father", "start_offset": 13, "end_offset": 21, "type": "<ALPHANUM>", "position": 3 }, { "token": "dad", "start_offset": 13, "end_offset": 21, "type": "SYNONYM", "position": 3 }, { "token": "machin", "start_offset": 22, "end_offset": 30, "type": "<ALPHANUM>", "position": 4 } ] }

Slide 90

PUT /starwars/_doc/4 { "quote": "These are my father's machines." }

Slide 91

POST /starwars/_search { "query": { "match": { "quote": "my father machine" } } }

Slide 92

"hits": { "total": 4, "max_score": 2.92523, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "4", "_score": 2.92523, "_source": { "quote": "These are my father's machines." } }, { "_index": "starwars", "_type": "_doc", "_id": "1", "_score": 0.8617505, "_source": { "quote": "These are <em>not</em> the droids you are looking for." } }, ...

Slide 93

2.92523 == 100%

Slide 94

DELETE /starwars/_doc/4 POST /starwars/_search { "query": { "match": { "quote": "my father machine" } } }

Slide 95

"hits": { "total": 3, "max_score": 1.2499592, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "1", "_score": 1.2499592, "_source": { "quote": "These are <em>not</em> the droids you are looking for." } }, ...

Slide 96

1.2499592 == 43% or 100%?

Slide 97

PUT /starwars/_doc/4 { "quote": "These droids are my father's father's machines." } POST /starwars/_search { "query": { "match": { "quote": "my father machine" } } }

Slide 98

"hits": { "total": 4, "max_score": 3.0068164, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "4", "_score": 3.0068164, "_source": { "quote": "These droids are my father's father's machines." } }, { "_index": "starwars", "_type": "_doc", "_id": "1", "_score": 0.89701396, "_source": { "quote": "These are <em>not</em> the droids you are looking for." } }, ...

Slide 99

3.0068164 == 103%?

Slide 100

Slide 101

Performance

Slide 102

Slide 103

Slide 104

Conclusion

Slide 105

Indexing Formatting Tokenize Lowercase, Stop Words, Stemming Synonyms

Slide 106

Scoring Term Frequency Inverse Document Frequency Field-Length Norm Vector Space Model

Slide 107

Advanced Queries Highlighting NGrams & Edge Grams Multiple Analyzers Reindex & Alias

Slide 108

There is more Elastic Stack

Slide 109

Slide 110

Thank You! Questions? Philipp Krenn PS: Stickers @xeraa

Slide 111

More

Slide 112

POST /starwars/_search { "query": { "match": { "quote": "father" } }, "highlight": { "type": "unified", "pre_tags": [ "<tag>" ], "post_tags": [ "</tag>" ], "fields": { "quote": {} } } }

Slide 113

... "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.41913947, "_source": { "quote": "<b>No</b>. I am your father." }, "highlight": { "quote": [ "<b>No</b>. I am your <tag>father</tag>." ] } }, ...

Slide 114

Boolean Queries must must_not should filter

Slide 115

POST /starwars/_search { "query": { "bool": { "must": { "match": { "quote": "father" } }, "should": [ { "match": { "quote": "your" } }, { "match": { "quote": "obi" } } ] } } }

Slide 116

... "hits": { "total": 2, "max_score": 0.96268076, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 0.96268076, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.73245656, "_source": { "quote": "<b>No</b>. I am your father." } } ] } }

Slide 117

POST /starwars/_search { "query": { "bool": { "filter": { "match": { "quote": "father" } }, "should": [ { "match": { "quote": "your" } }, { "match": { "quote": "obi" } } ] } } }

Slide 118

... "hits": { "total": 2, "max_score": 0.56977004, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 0.56977004, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.31331712, "_source": { "quote": "<b>No</b>. I am your father." } } ] } }

Slide 119

Named Queries & minimum_should_match

Slide 120

POST /starwars/_search { "query": { "bool": { "must": { "match": { "quote": "father" } }, "should": [ { "match": { "quote": { "query": "your", "_name": "quote-your" } } }, { "match": { "quote": { "query": "obi", "_name": "quote-obi" } } }, { "match": { "quote": { "query": "droid", "_name": "quote-droid" } } } ], "minimum_should_match": 2 } } }

Slide 121

... "hits": { "total": 1, "max_score": 1.8154771, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 1.8154771, "_source": { "quote": "Obi-Wan never told you what happened to your father." }, "matched_queries": [ "quote-obi", "quote-your" ] } ] } }
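
The matching logic of this bool query, without any scoring, can be sketched in Python. The term sets come from the inverted index slide (synonyms omitted), and `bool_query` is a hypothetical helper written for this example. It shows why only document 2 survives `minimum_should_match: 2` and where the `matched_queries` names come from.

```python
# Analyzed terms per document, taken from the inverted index slide.
DOC_TERMS = {
    1: {"droid", "you", "look"},
    2: {"obi", "wan", "never", "told", "you", "what", "happen", "your", "father"},
    3: {"i", "am", "your", "father"},
}


def bool_query(doc_terms, must, should, minimum_should_match=0):
    """Return {doc_id: names of the matching should clauses} for matching documents."""
    results = {}
    for doc_id, terms in doc_terms.items():
        if not all(term in terms for term in must):
            continue  # every must clause is required
        matched = [name for name, term in should.items() if term in terms]
        if len(matched) >= minimum_should_match:
            results[doc_id] = matched
    return results


should = {"quote-your": "your", "quote-obi": "obi", "quote-droid": "droid"}
print(bool_query(DOC_TERMS, must=["father"], should=should, minimum_should_match=2))
# {2: ['quote-your', 'quote-obi']}
```

Document 3 contains "father" and "your" but only satisfies one should clause, so it drops out once two are required.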

Slide 122

Boosting >1 increase, <1 decrease, <0 punish

Slide 123

POST /starwars/_search { "query": { "bool": { "must": { "match": { "quote": "father" } }, "should": [ { "match": { "quote": "your" } }, { "match": { "quote": { "query": "obi", "boost": 3 } } } ] } } }

Slide 124

... "hits": { "total": 2, "max_score": 1.5324509, "hits": [ { "_index": "starwars", "_type": "_doc", "_id": "2", "_score": 1.5324509, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars", "_type": "_doc", "_id": "3", "_score": 0.73245656, "_source": { "quote": "<b>No</b>. I am your father." } } ] } }

Slide 125

Suggestion Suggest a similar text _search end point _suggest deprecated since 5.0

Slide 126

POST /starwars/_search { "query": { "match": { "quote": "drui" } }, "suggest": { "my_suggestion" : { "text" : "drui", "term" : { "field" : "quote" } } } }

Slide 127

... "hits": { "total": 0, "max_score": null, "hits": [] }, "suggest": { "my_suggestion": [ { "text": "drui", "offset": 0, "length": 4, "options": [ { "text": "droid", "score": 0.5, "freq": 1 } ] } ] } }

Slide 128

NGram Partial matches Trigram Edge Gram

Slide 129

GET /_analyze { "char_filter": [ "html_strip" ], "tokenizer": { "type": "ngram", "min_gram": "3", "max_gram": "3", "token_chars": [ "letter" ] }, "filter": [ "lowercase" ], "text": "These are <em>not</em> the droids you are looking for." }

Slide 130

{ "tokens": [ { "token": "the", "start_offset": 0, "end_offset": 3, "type": "word", "position": 0 }, { "token": "hes", "start_offset": 1, "end_offset": 4, "type": "word", "position": 1 }, { "token": "ese", "start_offset": 2, "end_offset": 5, "type": "word", "position": 2 }, { "token": "are", "start_offset": 6, "end_offset": 9, "type": "word", "position": 3 }, ...

Slide 131

GET /_analyze { "char_filter": [ "html_strip" ], "tokenizer": { "type": "edge_ngram", "min_gram": "1", "max_gram": "3", "token_chars": [ "letter" ] }, "filter": [ "lowercase" ], "text": "These are <em>not</em> the droids you are looking for." }

Slide 132

{ "tokens": [ { "token": "t", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 }, { "token": "th", "start_offset": 0, "end_offset": 2, "type": "word", "position": 1 }, { "token": "the", "start_offset": 0, "end_offset": 3, "type": "word", "position": 2 }, { "token": "a", "start_offset": 6, "end_offset": 7, "type": "word", "position": 3 }, { "token": "ar", "start_offset": 6, "end_offset": 8, "type": "word", "position": 4 }, ...
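
Both tokenizers are easy to imitate, which makes the difference between them visible. In this sketch the regular expressions approximate html_strip and `"token_chars": [ "letter" ]`; the function names are made up for this example.

```python
import re


def ngrams(word, min_gram, max_gram):
    """All substrings of length min_gram..max_gram, ordered by start offset."""
    return [
        word[start:start + size]
        for start in range(len(word))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(word)
    ]


def edge_ngrams(word, min_gram, max_gram):
    """Only the prefixes of the word, from min_gram up to max_gram characters."""
    return [word[:size] for size in range(min_gram, min(max_gram, len(word)) + 1)]


def tokenize(text, gram_function, min_gram, max_gram):
    text = re.sub(r"<[^>]+>", "", text)    # html_strip
    words = re.findall(r"[A-Za-z]+", text)  # token_chars: letter
    return [
        gram.lower()                        # lowercase filter
        for word in words
        for gram in gram_function(word, min_gram, max_gram)
    ]


quote = "These are <em>not</em> the droids you are looking for."
print(tokenize(quote, ngrams, 3, 3)[:4])       # ['the', 'hes', 'ese', 'are']
print(tokenize(quote, edge_ngrams, 1, 3)[:5])  # ['t', 'th', 'the', 'a', 'ar']
```

Trigrams allow matches anywhere inside a word, while edge n-grams only support prefix matches, which is the usual choice for search-as-you-type.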

Slide 133

Combining Analyzers Reindex Store multiple times Combine scores

Slide 134

PUT /starwars_v42 { "settings": { "number_of_shards": 1, "analysis": { "filter": { "my_synonym_filter": { "type": "synonym", "synonyms": [ "droid,machine", "father,dad" ] }, "my_ngram_filter": { "type": "ngram", "min_gram": "3", "max_gram": "3", "token_chars": [ "letter" ] } },

Slide 135

"analyzer": { "my_lowercase_analyzer": { "char_filter": [ "html_strip" ], "tokenizer": "whitespace", "filter": [ "lowercase" ] }, "my_full_analyzer": { "char_filter": [ "html_strip" ], "tokenizer": "standard", "filter": [ "lowercase", "stop", "snowball", "my_synonym_filter" ] },

Slide 136

"my_ngram_analyzer": { "char_filter": [ "html_strip" ], "tokenizer": "whitespace", "filter": [ "lowercase", "stop", "my_ngram_filter" ] } } } },

Slide 137

"mappings": { "_doc": { "properties": { "quote": { "type": "text", "fields": { "lowercase": { "type": "text", "analyzer": "my_lowercase_analyzer" }, "full": { "type": "text", "analyzer": "my_full_analyzer" }, "ngram": { "type": "text", "analyzer": "my_ngram_analyzer" } } } } } } }

Slide 138

POST /_reindex { "source": { "index": "starwars" }, "dest": { "index": "starwars_v42" } }

Slide 139

PUT _alias { "actions": [ { "add": { "index": "starwars_v42", "alias": "starwars_extended" } } ] }

Slide 140

Aliases Atomic remove and add Point to multiple indices (read-only)

Slide 141

POST /starwars_extended/_search?explain=true { "query": { "multi_match": { "query": "obiwan", "fields": [ "quote", "quote.lowercase", "quote.full", "quote.ngram" ], "type": "most_fields" } } }

Slide 142

... "hits": { "total": 1, "max_score": 0.4912064, "hits": [ { "_shard": "[starwars_v42][2]", "_node": "BCDwzJ4WSw2dyoGLTzwlqw", "_index": "starwars_v42", "_type": "_doc", "_id": "2", "_score": 0.4912064, "_source": { "quote": "Obi-Wan never told you what happened to your father." }, ...

Slide 143

Whitespace Tokenizer "weight( Synonym(quote.ngram:biw quote.ngram:iwa quote.ngram:obi quote.ngram:wan) in 0) [PerFieldSimilarity], result of:"

Slide 144

POST /starwars_extended/_search { "query": { "multi_match": { "query": "you", "fields": [ "quote", "quote.lowercase", "quote.full^5", "quote.ngram" ], "type": "best_fields" } } }

Slide 145

"hits": [ { "_index": "starwars_v42", "_type": "_doc", "_id": "1", "_score": 1.6022799, "_source": { "quote": "These are <em>not</em> the droids you are looking for." } }, { "_index": "starwars_v42", "_type": "_doc", "_id": "2", "_score": 1.4997643, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars_v42", "_type": "_doc", "_id": "3", "_score": 0.38650417, "_source": { "quote": "<b>No</b>. I am your father." } } ]

Slide 146

Multi Match Type
best_fields: Score of the best field (default)
cross_fields: All terms in at least one field
most_fields: Score sum of all fields
phrase
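
How best_fields and most_fields combine per-field scores can be shown with two one-liners. This is a rough sketch: the per-field scores are hypothetical values and not taken from the slides, and `tie_breaker` follows my understanding of the documented option that lets the non-best fields contribute a fraction of their scores.

```python
def best_fields(field_scores, tie_breaker=0.0):
    """Score of the best field, plus tie_breaker times the other fields' scores."""
    best = max(field_scores.values())
    others = sum(field_scores.values()) - best
    return best + tie_breaker * others


def most_fields(field_scores):
    """Sum of the scores of all matching fields."""
    return sum(field_scores.values())


# Hypothetical per-field scores for one document.
scores = {"quote": 0.5, "quote.lowercase": 0.5, "quote.full": 1.5, "quote.ngram": 0.25}
print(best_fields(scores))  # 1.5
print(most_fields(scores))  # 2.75
```

With most_fields, a document that matches in several differently analyzed sub-fields is rewarded, which suits the multi-field mapping of `starwars_v42`.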

Slide 147

Different Analyzers for Indexing and Searching Per query In the mapping

Slide 148

POST /starwars_extended/_search { "query": { "match": { "quote.ngram": { "query": "the", "analyzer": "standard" } } } }

Slide 149

... "hits": [ { "_index": "starwars_extended", "_type": "_doc", "_id": "2", "_score": 0.38254172, "_source": { "quote": "Obi-Wan never told you what happened to your father." } }, { "_index": "starwars_extended", "_type": "_doc", "_id": "3", "_score": 0.36165747, "_source": { "quote": "<b>No</b>. I am your father." } } ] ...

Slide 150

Edge Gram vs Trigram Extending a mapping Testing a custom mapping

Slide 151

POST /starwars_extended/_close PUT /starwars_extended/_settings { "analysis": { "filter": { "my_edgegram_filter": { "type": "edge_ngram", "min_gram": 3, "max_gram": 10 } }, "analyzer": { "my_edgegram_analyzer": { "char_filter": [ "html_strip" ], "tokenizer": "standard", "filter": [ "lowercase", "my_edgegram_filter" ] } } } } POST /starwars_extended/_open

Slide 152

GET starwars_extended/_analyze { "text": "Father", "analyzer": "my_edgegram_analyzer" }

Slide 153

{ "tokens": [ { "token": "fat", "start_offset": 0, "end_offset": 6, "type": "<ALPHANUM>", "position": 0 }, { "token": "fath", "start_offset": 0, "end_offset": 6, "type": "<ALPHANUM>", "position": 0 }, { "token": "fathe", "start_offset": 0, "end_offset": 6, "type": "<ALPHANUM>", "position": 0 }, { "token": "father", "start_offset": 0, "end_offset": 6, "type": "<ALPHANUM>", "position": 0 } ] }

Slide 154

PUT /starwars_extended/_doc/_mapping { "properties": { "quote": { "type": "text", "fields": { "edgegram": { "type": "text", "analyzer": "my_edgegram_analyzer", "search_analyzer": "standard" } } } } }

Slide 155

PUT /starwars_extended/_doc/4 { "quote": "I find your lack of faith disturbing." } PUT /starwars_extended/_doc/5 { "quote": "That... is your failure." }

Slide 156

GET /starwars_extended/_doc/4/_termvectors { "fields": [ "quote.edgegram" ], "offsets": true, "payloads": true, "positions": true, "term_statistics": true, "field_statistics": true }

Slide 157

{ "_index": "starwars_v42", "_type": "_doc", "_id": "4", "_version": 1, "found": true, "took": 3, "term_vectors": { "quote.edgegram": { "field_statistics": { "sum_doc_freq": 26, "doc_count": 2, "sum_ttf": 26 }, "terms": { "dis": { "doc_freq": 1, "ttf": 1, "term_freq": 1, "tokens": [ { "position": 6, "start_offset": 26, "end_offset": 36 } ] }, "dist": { "doc_freq": 1, "ttf": 1, ...

Slide 158

POST /starwars_extended/_search { "query": { "match": { "quote": "fail" } } }

Slide 159

POST /starwars_extended/_search { "query": { "match": { "quote.lowercase": "fail" } } }

Slide 160

POST /starwars_extended/_search { "query": { "match": { "quote.full": "fail" } } }

Slide 161

POST /starwars_extended/_search { "query": { "match": { "quote.ngram": "fail" } } }

Slide 162

... "hits": { "total": 2, "max_score": 1.0135446, "hits": [ { "_index": "starwars_v42", "_type": "_doc", "_id": "4", "_score": 1.0135446, "_source": { "quote": "I find your lack of faith disturbing." } }, { "_index": "starwars_v42", "_type": "_doc", "_id": "5", "_score": 0.50476736, "_source": { "quote": "That... is your failure." } } ] ...

Slide 163

POST /starwars_extended/_search { "query": { "match": { "quote.edgegram": "fail" } } }

Slide 164

... "hits": { "total": 1, "max_score": 0.39556286, "hits": [ { "_index": "starwars_v42", "_type": "_doc", "_id": "5", "_score": 0.39556286, "_source": { "quote": "That... is your failure." } } ] ...
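
Why does "fail" find both "faith" and "failure" on quote.ngram but only "failure" on quote.edgegram? The sketch below compares the two fields for just these two words. It assumes, as in the mapping shown earlier, that the trigram field analyzes the query the same way as the indexed text, while the edge gram field uses the standard analyzer at search time and therefore leaves "fail" untouched.

```python
def trigrams(word):
    """Set of all 3-character substrings."""
    return {word[i:i + 3] for i in range(len(word) - 2)}


def edge_grams(word, min_gram=3, max_gram=10):
    """Set of prefixes, matching the my_edgegram_filter settings."""
    return {word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)}


query = "fail"
for word in ("faith", "failure"):
    # quote.ngram: trigrams at index time and at search time, one shared gram is a hit.
    ngram_hit = bool(trigrams(word) & trigrams(query))
    # quote.edgegram: edge n-grams at index time, the whole term at search time.
    edgegram_hit = query in edge_grams(word)
    print(word, ngram_hit, edgegram_hit)
# faith True False
# failure True True
```

"faith" and "fail" share the trigram "fai", which is enough for a hit on the trigram field, whereas no prefix of "faith" equals "fail".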

Slide 165

The End