Leon's Blogging

Coding blogging for hackers.

Advanced Elasticsearch

Bool Query

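  • must - clauses that must match; they contribute to _score (equivalent to AND)
  • filter - clauses that must match, but no _score is computed (they contribute nothing to scoring and are used only for filtering)
  • should - each matching clause increases _score, and a non-matching clause has no effect. If the query has no must or filter clause, at least one should clause must match (equivalent to OR)
  • must_not - clauses that must not match (equivalent to NOT)
  • boost - weight
  • minimum_should_match - the minimum number of should clauses that must match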

Example 1:

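user must be kimchy, and results are filtered to documents whose tag is "tech" (the filter has no effect on the score). Documents with an age from 10 to 20 are excluded. A document whose tag includes wow or elasticsearch scores higher, and one with both scores higher still. Because minimum_should_match is 1, at least one of those two should clauses has to match.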

{
    "bool" : {
        "must" : {
            "term" : { "user" : "kimchy" }
        },
        "filter": {
            "term" : { "tag" : "tech" }
        },
        "must_not" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        },
        "should" : [
            {
                "term" : { "tag" : "wow" }
            },
            {
                "term" : { "tag" : "elasticsearch" }
            }
        ],
        "minimum_should_match" : 1,
        "boost" : 1.0
    }
}
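
The matching side of this query (not the scoring) can be sketched in plain Python. This is a toy model rather than Elasticsearch's implementation: term, bool_matches and example_1 are hypothetical helpers, clauses are modelled as predicates over a dict, and the default for minimum_should_match is assumed to follow the rule described in the list above (1 when there is no must or filter, otherwise 0).

```python
def term(field, value):
    """Hypothetical stand-in for a term clause: exact match on a field."""
    def clause(doc):
        actual = doc.get(field)
        if isinstance(actual, list):      # multi-valued field such as tag
            return value in actual
        return actual == value
    return clause


def bool_matches(doc, must=(), filter_=(), should=(), must_not=(),
                 minimum_should_match=None):
    """Toy model of whether a bool query matches doc (scoring is ignored)."""
    if not all(c(doc) for c in must):
        return False
    if not all(c(doc) for c in filter_):
        return False
    if any(c(doc) for c in must_not):
        return False
    if minimum_should_match is None:
        # Assumed default: should is optional once must or filter is present.
        minimum_should_match = 0 if (must or filter_) else min(1, len(should))
    return sum(1 for c in should if c(doc)) >= minimum_should_match


def example_1(doc):
    """The Example 1 query, expressed with the toy helpers above."""
    return bool_matches(
        doc,
        must=[term("user", "kimchy")],
        filter_=[term("tag", "tech")],
        must_not=[lambda d: 10 <= d.get("age", 0) <= 20],
        should=[term("tag", "wow"), term("tag", "elasticsearch")],
        minimum_should_match=1,
    )


hit = {"user": "kimchy", "tag": ["tech", "wow"], "age": 30}
excluded_by_age = {"user": "kimchy", "tag": ["tech", "wow"], "age": 15}
no_should_match = {"user": "kimchy", "tag": ["tech"], "age": 30}
print(example_1(hit), example_1(excluded_by_age), example_1(no_should_match))
```

Only the first document matches: the second falls inside the excluded age range, and the third satisfies must and filter but none of the should clauses.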

Example 2:

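Nesting a bool inside filter likewise skips scoring.

Find documents whose title field matches how to make millions and that are not tagged spam. Documents tagged starred, or dated 2014 or later, rank higher than the rest, and documents that satisfy both rank higher still. The results are also filtered so that price is at most 29.99 and category is not ebooks; these two conditions do not affect ranking.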

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }},
            { "range": { "date": { "gte": "2014-01-01" }}}
        ],
        "filter": {
          "bool": {
              "must": [
                  { "range": { "price": { "lte": 29.99 }}}
              ],
              "must_not": [
                  { "term": { "category": "ebooks" }}
              ]
          }
        }
    }
}

Example 3: constant_score

It applies the same constant score to every matching document, and is a more concise replacement for a bool that contains only a filter.

{
    "constant_score":   {
        "filter": {
            "term": { "category": "ebooks" }
        }
    }
}

Example 4: boost (weight)

The default boost is 1.

{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "content": {
                        "query":    "full text search",
                        "operator": "and"
                    }
                }
            },
            "should": [
                { "match": {
                    "content": {
                        "query": "Elasticsearch",
                        "boost": 3
                    }
                }},
                { "match": {
                    "content": {
                        "query": "Lucene",
                        "boost": 2
                    }
                }}
            ]
        }
    }
}

Example 5: bool equivalents of match

OR

The following two queries are equivalent:

{
    "match": { "title": "brown fox"}
}
{
  "bool": {
    "should": [
      { "term": { "title": "brown" }},
      { "term": { "title": "fox"   }}
    ]
  }
}

AND

The following two queries are equivalent:

{
    "match": {
        "title": {
            "query":    "brown fox",
            "operator": "and"
        }
    }
}
{
  "bool": {
    "must": [
      { "term": { "title": "brown" }},
      { "term": { "title": "fox"   }}
    ]
  }
}

minimum_should_match

The following two queries are equivalent:

{
    "match": {
        "title": {
            "query":                "quick brown fox",
            "minimum_should_match": "75%"
        }
    }
}
{
  "bool": {
    "should": [
      { "term": { "title": "brown" }},
      { "term": { "title": "fox"   }},
      { "term": { "title": "quick" }}
    ],
    "minimum_should_match": 2
  }
}
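
The "75%" in the match query corresponds to 2 in the bool query because the percentage is applied to the number of optional clauses and then rounded down. A minimal sketch of that arithmetic, covering only the plain integer and positive percentage forms (required_should_matches is a hypothetical helper, not an Elasticsearch API):

```python
def required_should_matches(spec, optional_clauses):
    """Number of should clauses that must match for a given setting.

    Handles only a plain integer (2) or a positive percentage ("75%");
    other forms that Elasticsearch accepts are deliberately left out.
    """
    if isinstance(spec, str) and spec.endswith("%"):
        percent = int(spec[:-1])
        # Integer division rounds the result down.
        return optional_clauses * percent // 100
    return int(spec)


# "quick brown fox" has three terms: 75% of 3 is 2.25, rounded down to 2.
print(required_should_matches("75%", 3))
```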

Exact Values Search

Numeric queries

SELECT document
FROM   products
WHERE  price = 20

An exact-value lookup usually does not need a score, so wrap it in constant_score:

{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "price" : 20
                }
            }
        }
    }
}

Text queries

SELECT product
FROM   products
WHERE  productID = "XHDK-A-1293-#fJ3"

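There is a problem here: when the field is analyzed, the analyzer splits XHDK-A-1293-#fJ3 into the separate tokens xhdk, a, 1293 and fj3 (lowercased, with the hyphens and the # dropped), so the term query below finds nothing.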

GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "productID" : "XHDK-A-1293-#fJ3"
                }
            }
        }
    }
}
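
To see why the term query above comes back empty, it helps to look at what ends up in the index. The snippet below is only a rough approximation of the default analyzer (lowercase, then split on anything that is not a letter or digit); rough_standard_analyze is a hypothetical helper, and the real analyzer follows Unicode segmentation rules that this regex does not reproduce.

```python
import re


def rough_standard_analyze(text):
    """Crude approximation: lowercase and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


product_id = "XHDK-A-1293-#fJ3"
indexed_tokens = rough_standard_analyze(product_id)
print(indexed_tokens)

# A term query looks for the exact, unanalyzed value among the indexed tokens.
print(product_id in indexed_tokens)
```

The original string is not among the indexed tokens, which is why an exact term lookup cannot match it.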

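productID has to be remapped so that it is not analyzed. Remember to delete the existing index before applying the new mapping. (The string / not_analyzed mapping shown here is the pre-5.0 syntax; from Elasticsearch 5.0 on, the equivalent is "type": "keyword".)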

{
    "mappings" : {
        "products" : {
            "properties" : {
                "productID" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                }
            }
        }
    }
}

Combining Filters

Example 1:

SELECT product
FROM   products
WHERE  (price = 20 OR productID = "XHDK-A-1293-#fJ3")
  AND  (price != 30)
{
   "query" : {
      "bool" : {
         "filter" : {
            "bool" : {
              "should" : [
                 { "term" : {"price" : 20}},
                 { "term" : {"productID" : "XHDK-A-1293-#fJ3"}}
              ],
              "must_not" : {
                 "term" : {"price" : 30}
              }
           }
         }
      }
   }
}

Example 2:

SELECT document
FROM   products
WHERE  productID      = "KDKE-B-9947-#kL5"
  OR (     productID = "JODL-X-1937-#pV7"
       AND price     = 30 )
{
   "query" : {
      "bool" : {
         "filter" : {
            "bool" : {
              "should" : [
                { "term" : {"productID" : "KDKE-B-9947-#kL5"}},
                { "bool" : {
                  "must" : [
                    { "term" : {"productID" : "JODL-X-1937-#pV7"}},
                    { "term" : {"price" : 30}}
                  ]
                }}
              ]
           }
         }
      }
   }
}

Example 3:

Match messages that satisfy either condition:

  • In the inbox and unread
  • Not in the inbox, but marked important
{
  "query": {
      "constant_score": {
          "filter": {
              "bool": {
                 "should": [
                    { "bool": {
                          "must": [
                             { "term": { "folder": "inbox" }},
                             { "term": { "read": false }}
                          ]
                    }},
                    { "bool": {
                          "must_not": {
                             "term": { "folder": "inbox" }
                          },
                          "must": {
                             "term": { "important": true }
                          }
                    }}
                 ]
              }
            }
        }
    }
}
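
As a sanity check on the nesting, the same condition can be written as an ordinary Python predicate: the outer should is an or, and each inner bool is an and of its must and must_not clauses. The function name wanted and the sample messages are made up for illustration.

```python
def wanted(mail):
    """Same logic as the nested bool filter above, as a plain predicate."""
    in_inbox = mail["folder"] == "inbox"
    unread_in_inbox = in_inbox and not mail["read"]
    important_elsewhere = (not in_inbox) and mail["important"]
    return unread_in_inbox or important_elsewhere


mails = [
    {"folder": "inbox",   "read": False, "important": False},  # unread in inbox
    {"folder": "inbox",   "read": True,  "important": True},   # read, in inbox
    {"folder": "archive", "read": True,  "important": True},   # important elsewhere
    {"folder": "archive", "read": False, "important": False},  # neither
]
print([wanted(m) for m in mails])
```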

Disjunction Max Query (best fields)

Index two documents, each with a title and a body field:

PUT /my_index/my_type/1
{
    "title": "Quick brown rabbits",
    "body":  "Brown rabbits are commonly seen."
}

PUT /my_index/my_type/2
{
    "title": "Keeping pets healthy",
    "body":  "My quick brown fox eats rabbits on a regular basis."
}

bool

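With a plain bool query, document 1 gets the higher score, mainly because both of its fields contain Brown. What we actually want is document 2, the more precise match, because its body contains both brown and fox.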

{
    "query": {
        "bool": {
            "should": [
                { "match": { "title": "Brown fox" }},
                { "match": { "body":  "Brown fox" }}
            ]
        }
    }
}

dis_max

dis_max returns every document that matches any of the queries, but uses only the score of the best-matching query as the document's score.

{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Brown fox" }},
                { "match": { "body":  "Brown fox" }}
            ]
        }
    }
}

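If dis_max produces the same best-match score for two documents, add tie_breaker to tune the result: the scores of the other matching clauses are multiplied by this factor, which ranges from 0 to 1, and added to the best score.

tie_breaker is a floating-point value between 0 and 1, where 0 means plain dis_max behavior (only the best-matching clause counts) and 1 means all matching clauses count equally. The best value has to be found by tuning against your own data and queries, but a reasonable value is close to zero (in the range 0.1 to 0.4), so that it does not overturn the best-match nature of dis_max.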

{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ],
            "tie_breaker": 0.3
        }
    }
}
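
The difference between bool, dis_max and tie_breaker comes down to how the per-clause scores are combined. The sketch below uses made-up clause scores, not values Elasticsearch would return, and treats a bool should as a plain sum, which is a simplification. The function names are hypothetical.

```python
def bool_should_score(clause_scores):
    """Simplified bool/should: add up the scores of the matching clauses."""
    matching = [s for s in clause_scores if s is not None]
    return sum(matching) if matching else None


def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score, plus tie_breaker times the other matching scores."""
    matching = [s for s in clause_scores if s is not None]
    if not matching:
        return None
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical [title, body] scores; None means the clause did not match.
doc_1 = [0.5, 0.4]   # a weaker match in both fields
doc_2 = [None, 0.8]  # one strong match in body only

print(bool_should_score(doc_1), bool_should_score(doc_2))    # doc_1 ranks first
print(dis_max_score(doc_1), dis_max_score(doc_2))            # doc_2 ranks first
print(dis_max_score(doc_1, tie_breaker=0.3))                 # 0.5 + 0.3 * 0.4
```

With tie_breaker set to 1 the result is the same as the plain sum, which is why values near zero are preferred when the best field should dominate.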

post_filter

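A post filter is applied to the query results as a final filtering step, and it does not affect aggregations.

A typical use case: an aggregation lists the available categories. When the user picks one category, the category list should stay as it is, and only the hits should be filtered.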

{
    "size" : 0,
    "query": {
        "match": {
            "make": "ford"
        }
    },
    "post_filter": {
        "term" : {
            "color" : "green"
        }
    },
    "aggs" : {
        "all_colors": {
            "terms" : { "field" : "color" }
        }
    }
}
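
The order of operations can be modelled in a few lines of Python: aggregations are computed over everything the query matched, and the post filter then narrows only the hits. This is a toy simulation with invented data; search and cars are hypothetical names, not part of any Elasticsearch client.

```python
from collections import Counter

# Invented sample data for illustration.
cars = [
    {"make": "ford",  "color": "green"},
    {"make": "ford",  "color": "red"},
    {"make": "ford",  "color": "green"},
    {"make": "honda", "color": "green"},
]


def search(docs, query, post_filter, agg_field):
    """Toy model: aggregate over the query matches, then filter the hits."""
    matched = [d for d in docs if query(d)]
    buckets = dict(Counter(d[agg_field] for d in matched))  # sees all matches
    hits = [d for d in matched if post_filter(d)]           # narrowed afterwards
    return hits, buckets


hits, all_colors = search(
    cars,
    query=lambda d: d["make"] == "ford",
    post_filter=lambda d: d["color"] == "green",
    agg_field="color",
)
print(len(hits), all_colors)
```

The hits contain only the green Fords, while the color buckets still include red, so the list of colors offered to the user does not shrink after a selection.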

Function Score Query
