Using Elasticsearch: Match


2024-07-11 18:50 | Source: compiled from the web

Introduction

Official Elasticsearch documentation

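Characteristics of match_phrase:

1. Term matching: every term produced by analyzing the query must exactly match a term produced by analyzing the indexed text, and the terms must appear at the same relative positions.

2. The relative positions after analysis must match precisely as well; the slop parameter relaxes this requirement.

3. When slop is used, the closer the terms are to each other, the higher the score.

4. Phrase queries and proximity queries are both more expensive than a simple match query. A match query only checks whether the terms exist in the inverted index, whereas a match_phrase query must calculate and compare the positions of multiple, possibly repeated, terms.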
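The position rule above can be illustrated with a minimal Python sketch. It is not how Elasticsearch or Lucene is implemented; it assumes a toy analyzer that emits one token per character (which is how the standard analyzer splits the Chinese sample text used later in this article), and the function names `analyze`, `build_positions`, and `match_phrase` are hypothetical helpers invented for this illustration.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer (assumption): one token per character, position = index."""
    return [(ch, pos) for pos, ch in enumerate(text)]


def build_positions(text):
    """Map each token to the list of positions where it occurs in the text."""
    positions = defaultdict(list)
    for token, pos in analyze(text):
        positions[token].append(pos)
    return positions


def match_phrase(doc_text, query_text):
    """Return True if the query tokens occur in the document with the
    same relative positions (exact phrase match, no slop)."""
    doc_positions = build_positions(doc_text)
    query_tokens = analyze(query_text)
    if not query_tokens:
        return False
    first_token, first_qpos = query_tokens[0]
    # Try every occurrence of the first query token as an alignment point.
    for start in doc_positions.get(first_token, []):
        base = start - first_qpos
        if all((base + qpos) in doc_positions.get(tok, [])
               for tok, qpos in query_tokens):
            return True
    return False


doc = "我爱北京天安门"
for query in ["我", "我爱", "我北", "爱北京"]:
    print(query, match_phrase(doc, query))
```

In this sketch the absolute position of a term does not matter, only the offsets between terms: "爱北京" matches even though "爱" sits at position 0 in the query and position 1 in the document.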

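Summary:

1. For phrase queries, the default Elasticsearch standard analyzer (which splits text at a fine granularity) works best, because it gives the terms produced from the query the greatest chance of matching the terms produced at index time.

2. Phrase queries are particularly well suited to matching combinations of words that are not contiguous within a body of text (for example articles, descriptions, long text...).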
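To make the non-contiguous case concrete, here is a rough Python sketch of how slop can be thought of. It is a simplification, not Lucene's actual sloppy phrase matcher: it assumes one token per character, computes for each query term the shift between its document position and its query position, and treats the spread of those shifts as the distance that slop must cover. Lucene handles repeated terms more carefully. The names `min_phrase_distance` and `match_phrase_slop` are hypothetical.

```python
from itertools import product


def tokenize(text):
    """Toy tokenizer (assumption): one token per character."""
    return list(text)


def min_phrase_distance(doc_text, query_text):
    """Smallest spread of (doc position - query position) over all ways of
    picking one document occurrence per query term. Returns None when a
    query term is missing from the document or the query is empty."""
    doc_tokens = tokenize(doc_text)
    query_tokens = tokenize(query_text)
    if not query_tokens:
        return None
    candidates = []
    for qpos, tok in enumerate(query_tokens):
        shifts = [dpos - qpos
                  for dpos, dtok in enumerate(doc_tokens) if dtok == tok]
        if not shifts:
            return None
        candidates.append(shifts)
    return min(max(combo) - min(combo) for combo in product(*candidates))


def match_phrase_slop(doc_text, query_text, slop=0):
    """True if the terms can be aligned within `slop` position moves."""
    distance = min_phrase_distance(doc_text, query_text)
    return distance is not None and distance <= slop


doc = "我爱北京天安门"
print(min_phrase_distance(doc, "我北"))         # distance needed for "我北"
print(match_phrase_slop(doc, "我北", slop=0))
print(match_phrase_slop(doc, "我北", slop=1))
```

A smaller distance means the terms sit closer to the phrase as written, which is the intuition behind closer matches scoring higher when slop is used.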

Preparing the data

Create the index:

    PUT test_phrase

Set the index mapping:

    PUT /test_phrase/_mapping/_doc
    {
      "properties": {
        "name": { "type": "text" }
      }
    }

Result:

    {
      "mapping": {
        "_doc": {
          "properties": {
            "name": { "type": "text" }
          }
        }
      }
    }

Insert a document:

    PUT test_phrase/_doc/2
    { "name": "我爱北京天安门" }

Query the data:

    POST test_phrase/_search
    { "query": { "match_all": {} } }

Result:

    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : 2,
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "test_phrase",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 1.0,
            "_source" : { "name" : "我爱北京天安门" }
          }
        ]
      }
    }

Inspect the analyzed terms:

    POST test_phrase/_analyze
    { "field": "name", "text": "我爱北京天安门" }

Result:

    {
      "tokens" : [
        { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
        { "token" : "爱", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 },
        { "token" : "北", "start_offset" : 2, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 2 },
        { "token" : "京", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 3 },
        { "token" : "天", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 4 },
        { "token" : "安", "start_offset" : 5, "end_offset" : 6, "type" : "<IDEOGRAPHIC>", "position" : 5 },
        { "token" : "门", "start_offset" : 6, "end_offset" : 7, "type" : "<IDEOGRAPHIC>", "position" : 6 }
      ]
    }

Demonstration

Keyword "我"

    POST test_phrase/_search
    {
      "query": {
        "match_phrase": {
          "name": { "query": "我" }
        }
      }
    }

Result:

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : 1,
        "max_score" : 0.2876821,
        "hits" : [
          {
            "_index" : "test_phrase",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 0.2876821,
            "_source" : { "name" : "我爱北京天安门" }
          }
        ]
      }
    }

Analysis:

    POST test_phrase/_analyze
    { "field": "name", "text": "我" }

    {
      "tokens" : [
        { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 }
      ]
    }

The query term "我" has position 0. The indexed terms of the document "我爱北京天安门" contain "我", also at position 0. With a single term there is no relative position to violate, so the phrase query's requirements are met and the document is returned.

Keyword "我爱"

    POST test_phrase/_search
    {
      "query": {
        "match_phrase": {
          "name": { "query": "我爱" }
        }
      }
    }

Result:

    {
      "took" : 4,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : 1,
        "max_score" : 0.5753642,
        "hits" : [
          {
            "_index" : "test_phrase",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 0.5753642,
            "_source" : { "name" : "我爱北京天安门" }
          }
        ]
      }
    }

Analysis:

    POST test_phrase/_analyze
    { "field": "name", "text": "我爱" }

    {
      "tokens" : [
        { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
        { "token" : "爱", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 }
      ]
    }

In the query, "我" is at position 0 and "爱" at position 1. The index contains both terms, and there too "我" is at position 0 and "爱" at position 1, so the relative positions agree and the document is returned.

Keyword "我北"

    POST test_phrase/_search
    {
      "query": {
        "match_phrase": {
          "name": { "query": "我北" }
        }
      }
    }

Result:

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : 0,
        "max_score" : null,
        "hits" : [ ]
      }
    }

Analysis:

    POST test_phrase/_analyze
    { "field": "name", "text": "我北" }

    {
      "tokens" : [
        { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
        { "token" : "北", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 }
      ]
    }

In the query, "我" is at position 0 and "北" at position 1. In the index, "我" is at position 0 and "北" at position 2. Both query terms exist in the index, but they are one position apart in the query and two positions apart in the document, so the relative positions do not match and nothing is returned.

Fix: add "slop": 1

    POST test_phrase/_search
    {
      "query": {
        "match_phrase": {
          "name": { "query": "我北", "slop": 1 }
        }
      }
    }

    {
      "took" : 5,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : 1,
        "max_score" : 0.37229446,
        "hits" : [
          {
            "_index" : "test_phrase",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 0.37229446,
            "_source" : { "name" : "我爱北京天安门" }
          }
        ]
      }
    }

Keyword "爱北京"

    POST test_phrase/_search
    {
      "query": {
        "match_phrase": {
          "name": { "query": "爱北京" }
        }
      }
    }

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
      "hits" : {
        "total" : 1,
        "max_score" : 0.8630463,
        "hits" : [
          {
            "_index" : "test_phrase",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 0.8630463,
            "_source" : { "name" : "我爱北京天安门" }
          }
        ]
      }
    }

In the query, "爱" is at position 0, "北" at position 1, and "京" at position 2. In the index, "爱" is at position 1, "北" at position 2, and "京" at position 3. All query terms match indexed terms and their relative positions agree, so the search succeeds.

Improving relevance

Using proximity to improve relevance

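We can use a simple match query as a must clause. This query decides which documents are included in the result set, and its minimum_should_match parameter lets us trim the long tail of weakly matching documents. We can then add more specific queries as should clauses. Each should clause that matches increases the relevance score of the matching document.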

    GET /my_index/my_type/_search
    {
      "query": {
        "bool": {
          "must": {
            "match": {
              "title": {
                "query": "quick brown fox",
                "minimum_should_match": "30%"
              }
            }
          },
          "should": {
            "match_phrase": {
              "title": {
                "query": "quick brown fox",
                "slop": 50
              }
            }
          }
        }
      }
    }

The must clause includes documents in, or excludes them from, the result set. The should clause raises the relevance score of the documents that match it.
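When the request is sent from application code rather than from a console, the same body can be assembled as a plain Python dictionary. The sketch below only builds and prints the JSON; it does not call any Elasticsearch client, and the helper name `build_proximity_boost_query` and its parameter names are hypothetical, chosen for this illustration.

```python
import json


def build_proximity_boost_query(field, text, min_match="30%", slop=50):
    """Build a bool query: a `match` in `must` selects the documents,
    and a sloppy `match_phrase` in `should` boosts those whose terms
    appear close together. (Hypothetical helper, not a library API.)"""
    return {
        "query": {
            "bool": {
                "must": {
                    "match": {
                        field: {
                            "query": text,
                            "minimum_should_match": min_match,
                        }
                    }
                },
                "should": {
                    "match_phrase": {
                        field: {"query": text, "slop": slop}
                    }
                },
            }
        }
    }


body = build_proximity_boost_query("title", "quick brown fox")
print(json.dumps(body, indent=2))
```

Keeping the same query text in both clauses means the should clause never changes which documents are returned; it only reorders them by how closely the terms appear together.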

