Stop Words a an and are as at be but by for if in into is it no not of on or such that the their then there these they this to was will with https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java#L44-L50
Language Rules English: Philipp's → philipp French: l'église → eglis German: äußerst → ausserst
Slide 25
Another Example Obi-Wan never told you what happened to your father.
Slide 26
Another Example obi wan never told you what happen your father
Slide 27
Another Example <b>No</b>. I am your father.
Slide 28
Another Example i am your father
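The analysis steps in these examples (strip HTML, tokenize, lowercase, drop stop words, stem) can be approximated in a few lines of Python. This is only a rough sketch: `toy_stem` and `analyze` are hypothetical helpers, and the suffix stripping is a crude stand-in for a real stemmer, not what Lucene actually runs. The stop word list is the one from the Stop Words slide.

```python
import re

# Lucene's default English stop words, as listed on the Stop Words slide.
STOP_WORDS = set(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def toy_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    if token.endswith("'s"):
        token = token[:-2]  # possessive: philipp's -> philipp
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Return (position, term) pairs after filtering and stemming."""
    text = re.sub(r"<[^>]+>", " ", text)  # strip HTML tags
    tokens = re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())
    # Removed stop words still consume a position, so gaps remain.
    return [
        (position, toy_stem(token))
        for position, token in enumerate(tokens)
        if token not in STOP_WORDS
    ]


if __name__ == "__main__":
    for quote in (
        "These are <em>not</em> the droids you are looking for.",
        "Obi-Wan never told you what happened to your father.",
        "<b>No</b>. I am your father.",
    ):
        print(analyze(quote))
```

For the three quotes, the surviving terms match the slides: "droid you look", "obi wan never told you what happen your father", and "i am your father".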
Slide 29
Inverted Index
      am      droid   father  happen  i       look    never   obi     told    wan     what    you     your
ID 1  0       1[4]    0       0       0       1[7]    0       0       0       0       0       1[5]    0
ID 2  0       0       1[9]    1[6]    0       0       1[2]    1[0]    1[3]    1[1]    1[5]    1[4]    1[8]
ID 3  1[2]    0       1[4]    0       1[1]    0       0       0       0       0       0       0       1[3]
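As a data structure, the inverted index slide boils down to a map from each term to the documents that contain it and the positions where it occurs. The sketch below rebuilds it from the (position, term) pairs shown on the slide. `build_inverted_index` and `lookup` are hypothetical names, and a real Lucene index stores far more than this.

```python
from collections import defaultdict

# (position, term) pairs per document ID, copied from the inverted index slide.
ANALYZED_DOCS = {
    1: [(4, "droid"), (5, "you"), (7, "look")],
    2: [(0, "obi"), (1, "wan"), (2, "never"), (3, "told"), (4, "you"),
        (5, "what"), (6, "happen"), (8, "your"), (9, "father")],
    3: [(1, "i"), (2, "am"), (3, "your"), (4, "father")],
}


def build_inverted_index(analyzed_docs):
    """Map each term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, pairs in analyzed_docs.items():
        for position, term in pairs:
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


INDEX = build_inverted_index(ANALYZED_DOCS)


def lookup(term):
    """Return the sorted IDs of the documents that contain the term."""
    return sorted(INDEX.get(term, {}))


if __name__ == "__main__":
    print(lookup("father"))  # documents containing "father"
    print(INDEX["you"])      # positions of "you" per document
```

A search for a term is then a dictionary lookup instead of a scan over every document, which is the point of inverting the index.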
PUT /starwars/_doc/1
{ "quote": "These are <em>not</em> the droids you are looking for." }
PUT /starwars/_doc/2
{ "quote": "Obi-Wan never told you what happened to your father." }
PUT /starwars/_doc/3
{ "quote": "<b>No</b>. I am your father." }
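Outside the console, the same three PUT requests could be built with Python's standard library. This is a sketch under one big assumption: a cluster reachable at `http://localhost:9200`, which the slides do not state. `build_index_request` is a hypothetical helper, and the requests are only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: a local cluster on the default HTTP port; adjust as needed.
BASE_URL = "http://localhost:9200"

QUOTES = {
    1: "These are <em>not</em> the droids you are looking for.",
    2: "Obi-Wan never told you what happened to your father.",
    3: "<b>No</b>. I am your father.",
}


def build_index_request(index, doc_id, quote):
    """Build (but do not send) the PUT request for one document."""
    body = json.dumps({"quote": quote}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_doc/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


REQUESTS = [build_index_request("starwars", i, q) for i, q in QUOTES.items()]

# To send them against a running cluster:
# for request in REQUESTS:
#     with urllib.request.urlopen(request) as response:
#         print(response.status, response.read().decode("utf-8"))
```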
Slide 36
GET /starwars/_doc/1
GET /starwars/_doc/1/_source
Slide 37
Search
Slide 38
POST /starwars/_search
{ "query": { "match_all": { } } }
Slide 39
GET vs POST
Slide 40
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "starwars",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": { "quote": "Obi-Wan never told you what happened to your father." }
      },
      ...
"Don't do this. Seriously. Stop trying to think about your problem this way, it's not going to end well." (https://wiki.apache.org/lucene-java/ScoresAsPercentages)
Slide 88
GET /starwars/_analyze
{ "analyzer": "my_analyzer", "text": "These are my father's machines." }
PUT /starwars/_doc/4
{ "quote": "These are my father's machines." }
Slide 91
POST /starwars/_search
{ "query": { "match": { "quote": "my father machine" } } }
Slide 92
"hits": {
  "total": 4,
  "max_score": 2.92523,
  "hits": [
    {
      "_index": "starwars",
      "_type": "_doc",
      "_id": "4",
      "_score": 2.92523,
      "_source": { "quote": "These are my father's machines." }
    },
    {
      "_index": "starwars",
      "_type": "_doc",
      "_id": "1",
      "_score": 0.8617505,
      "_source": { "quote": "These are <em>not</em> the droids you are looking for." }
    },
    ...
Slide 93
2.92523 == 100%
Slide 94
DELETE /starwars/_doc/4
POST /starwars/_search
{ "query": { "match": { "quote": "my father machine" } } }
Slide 95
"hits": {
  "total": 3,
  "max_score": 1.2499592,
  "hits": [
    {
      "_index": "starwars",
      "_type": "_doc",
      "_id": "1",
      "_score": 1.2499592,
      "_source": { "quote": "These are <em>not</em> the droids you are looking for." }
    },
    ...
Slide 96
1.2499592 == 43% or 100%?
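The arithmetic behind that question, as a small sketch using only the scores from the two responses above. `as_percentage` is a hypothetical helper. The point is that the same top score maps to two very different percentages depending on which maximum it is measured against.

```python
def as_percentage(score, reference):
    """Express a score as a percentage of a reference score."""
    return 100.0 * score / reference


# Top score while "These are my father's machines." was still indexed.
FIRST_MAX = 2.92523
# Top score for the same query after that document was deleted.
SECOND_MAX = 1.2499592
# Score of document 1 in the first result set.
DOC1_FIRST = 0.8617505

against_old_max = as_percentage(SECOND_MAX, FIRST_MAX)
against_new_max = as_percentage(SECOND_MAX, SECOND_MAX)
doc1_before_delete = as_percentage(DOC1_FIRST, FIRST_MAX)

if __name__ == "__main__":
    print(f"new top hit vs. old maximum: {against_old_max:.1f}%")
    print(f"new top hit vs. new maximum: {against_new_max:.1f}%")
    print(f"document 1 before the delete: {doc1_before_delete:.1f}%")
```

Measured against the old maximum, the new top hit is roughly 43%; measured against its own result set it is 100%. Neither number says how good the match is, and document 1's raw score also changed between the two searches even though the document and the query did not.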
Slide 97
PUT /starwars/_doc/4
{ "quote": "These droids are my father's father's machines." }
POST /starwars/_search
{ "query": { "match": { "quote": "my father machine" } } }
Slide 98
"hits": {
  "total": 4,
  "max_score": 3.0068164,
  "hits": [
    {
      "_index": "starwars",
      "_type": "_doc",
      "_id": "4",
      "_score": 3.0068164,
      "_source": { "quote": "These droids are my father's father's machines." }
    },
    {
      "_index": "starwars",
      "_type": "_doc",
      "_id": "1",
      "_score": 0.89701396,
      "_source": { "quote": "These are <em>not</em> the droids you are looking for." }
    },
    ...
PUT /starwars_extended/_doc/4
{ "quote": "I find your lack of faith disturbing." }
PUT /starwars_extended/_doc/5
{ "quote": "That... is your failure." }