How to Use the Elastic Stack

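If you haven't heard of the Elastic Stack, you have almost certainly heard of ELK. ELK is an acronym for three products: Elasticsearch, Logstash, and Kibana. Later, Beats joined the family, and the whole set became the Elastic Stack. In other words, ELK is the old name and Elastic Stack is the new one.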

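Indices

Creating an index

Creating an index with default settings

API endpoint: 127.0.0.1:9200/articles?pretty (creates the articles index)

Method: PUT

Check the status of the newly created articles index (method: GET):

127.0.0.1:9200/articles/?pretty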

{
  "articles": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        },
        "number_of_shards": "1",
        "provided_name": "articles",
        "creation_date": "1625205220230",
        "number_of_replicas": "1",
        "uuid": "LoE8nBAVRnaLA4GPJfiBuA",
        "version": {
          "created": "7130299"
        }
      }
    }
  }
}
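
The settings in this response come back as strings ("1" rather than 1). Below is a minimal Python sketch that assumes the response body above has already been fetched as text. It pulls out the shard and replica counts as integers. The function name `shard_settings` is hypothetical.

```python
import json


def shard_settings(response_text: str, index: str) -> tuple:
    """Return (number_of_shards, number_of_replicas) from a GET /<index> response."""
    settings = json.loads(response_text)[index]["settings"]["index"]
    # Elasticsearch returns these values as strings such as "1", so convert them.
    return int(settings["number_of_shards"]), int(settings["number_of_replicas"])


# Trimmed copy of the response shown above (only the fields used here).
sample = '''
{"articles": {"aliases": {}, "mappings": {},
  "settings": {"index": {"number_of_shards": "1", "number_of_replicas": "1",
                         "provided_name": "articles"}}}}
'''
print(shard_settings(sample, "articles"))  # (1, 1)
```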

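number_of_shards is the number of primary shards the index is split into. It can only be set when the index is created and cannot be changed in place afterwards (only by reindexing or with the split/shrink APIs). If not specified, it defaults to 1.
number_of_replicas is the number of replicas of each primary shard. It can be changed dynamically at any time. If not specified, it defaults to 1.

primary shard: every document is stored in exactly one primary shard. When you index a document, it is written to its primary shard first and then copied to that shard's replicas. Before Elasticsearch 7.0 an index had 5 primary shards by default. Since 7.0 the default is 1, as the output above shows. Choose the number of primary shards up front, because it cannot be changed in place once the index exists.

replica shard: each primary shard has zero or more replicas. A replica is a copy of its primary shard. Replicas provide high availability and can increase search throughput.
By default each primary shard has one replica, and the number of replicas can be changed dynamically later.
A replica is never allocated on the same node as its primary shard. On a single-node cluster the replicas therefore stay unassigned, and the index health is yellow.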
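
The shard count is fixed because of routing. Elasticsearch picks a document's primary shard with a formula of the form shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. The Python sketch below is a simplified model of that idea and of the replica rules above. It uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch actually uses, and it ignores custom routing and index.number_of_routing_shards. Its shard numbers therefore will not match a real cluster's. All function names are hypothetical.

```python
import zlib


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Toy routing: hash(_routing) % number_of_primary_shards, with _routing = _id.

    zlib.crc32 is only a stand-in; Elasticsearch itself uses Murmur3.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    # Each primary shard exists once, plus number_of_replicas copies of it.
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_all_replicas(number_of_replicas: int) -> int:
    # A primary and its replicas must all sit on different nodes.
    return number_of_replicas + 1


# Changing the shard count can move an existing ID to a different shard,
# which would break lookups; this is why number_of_shards is fixed.
for shards in (2, 3):
    print(shards, [pick_shard(str(i), shards) for i in (442, 450, 466)])
print(total_shard_copies(2, 1), min_nodes_for_all_replicas(1))  # 4 2
```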

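Creating an index with shard settings

API endpoint: 127.0.0.1:9200/articles?pretty (creates the articles index). If articles already exists from the previous step, delete it first with DELETE 127.0.0.1:9200/articles. Otherwise the request fails because the index already exists.

Method: PUT

Request body:

{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1
  }
}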
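
Below is a minimal Python sketch of the same PUT request, using only the standard library (urllib). It builds the request but does not send it. The function name `create_index_request` is hypothetical, and the host and port are the ones used in this section. Elasticsearch expects a Content-Type: application/json header on requests that carry a JSON body.

```python
import json
import urllib.request


def create_index_request(host: str, index: str, shards: int = 1,
                         replicas: int = 1) -> urllib.request.Request:
    """Build (but do not send) PUT /<index> with shard and replica settings."""
    body = {
        "settings": {
            "index.number_of_shards": shards,
            "index.number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"http://{host}/{index}?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("127.0.0.1:9200", "articles", shards=2, replicas=1)
print(req.get_method(), req.full_url)
# Sending it requires a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```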

Check the index status:

{
  "articles": {
    "aliases": {},
    "mappings": {},
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        },
        "number_of_shards": "2",
        "provided_name": "articles",
        "creation_date": "1625200722765",
        "number_of_replicas": "1",
        "uuid": "7cl2XAMLSWegRYMXFmI87Q",
        "version": {
          "created": "7130299"
        }
      }
    }
  }
}

Creating an index with a mapping

API endpoint for creating the users index: 127.0.0.1:9200/users?pretty

Method: PUT

Request body:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "age": {
        "type": "integer"
      },
      "createtime": {
        "type": "date"
      },
      "position": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "url": {
        "type": "keyword",
        "index": false,
        "doc_values": false
      }
    }
  }
}

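type: the field's data type.

analyzer: the analyzer applied to the field. This example uses the IK Chinese analyzer (ik_max_word), a third-party plugin that must be installed separately. If no analyzer is set, the built-in standard analyzer is used, which splits Chinese text into single characters.

index: setting it to false disables indexing for the field, so the field cannot be searched. It can still be used for aggregations and sorting as long as its doc values are enabled.

doc_values: the on-disk columnar data that Elasticsearch uses to sort on a field, aggregate on a field, run some filters (such as geo-location filters) and field-based scripts, and return field values through docvalue_fields. Setting it to false saves disk space but disables those operations. The url field sets both index and doc_values to false, so it can be neither searched nor aggregated. Its value is kept only in _source.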
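
To make the index/doc_values rules concrete, here is a rough Python model that classifies each field of the users mapping above. It is not Elasticsearch's own logic, and it assumes the defaults described here. Non-text fields have index and doc_values enabled unless set to false. Text fields have no doc values at all, and sorting or aggregating on them would require enabling fielddata, which this sketch ignores. `field_capabilities` is a hypothetical helper.

```python
# The users mapping from above, as a Python dict.
USERS_MAPPING = {
    "properties": {
        "name": {"type": "text", "analyzer": "ik_max_word"},
        "age": {"type": "integer"},
        "createtime": {"type": "date"},
        "position": {"type": "text", "analyzer": "ik_max_word"},
        "url": {"type": "keyword", "index": False, "doc_values": False},
    }
}


def field_capabilities(mapping: dict) -> dict:
    """Simplified summary of what each field supports, based on index/doc_values.

    Assumptions: text fields have no doc values (fielddata is ignored);
    every other type defaults to index=True and doc_values=True.
    """
    result = {}
    for name, spec in mapping["properties"].items():
        searchable = spec.get("index", True)
        has_doc_values = spec["type"] != "text" and spec.get("doc_values", True)
        result[name] = {"searchable": searchable, "sort_and_aggregate": has_doc_values}
    return result


for field, caps in field_capabilities(USERS_MAPPING).items():
    print(field, caps)
```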

Searching

Searching all documents

Syntax: <elasticsearch host>/<index>/_search

Optional parameters:

_source: return only the listed _source fields, much like naming columns in a SQL query instead of using SELECT * (separate multiple fields with commas)

size: the number of hits to return, default 10

from: the number of hits to skip, default 0

Example: fetch 5 articles, skipping the first 10 hits (so starting at the 11th), and show only id and title

  1. Using GET with URL parameters

    Method: GET

    URL: http://127.0.0.1:9903/articles/_search?_source=title,id&size=5&from=10

    Response:

    {
      "took": 5,
      "timed_out": false,
      "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 2222,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "442",
            "_score": 1.0,
            "_source": {
              "id": 442,
              "title": "深入学习HTML5的history API"
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "450",
            "_score": 1.0,
            "_source": {
              "id": 450,
              "title": "想让百度删除不想收录的域名或快照的最快解决方法"
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "466",
            "_score": 1.0,
            "_source": {
              "id": 466,
              "title": "PHP采集远程图片保存本地"
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "490",
            "_score": 1.0,
            "_source": {
              "id": 490,
              "title": "8个最佳Web开发资源推荐"
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "530",
            "_score": 1.0,
            "_source": {
              "id": 530,
              "title": "前方高能反应!设计师最常见的五个设计误区"
            }
          }
        ]
      }
    }

  2. Using GET/POST with a request body

    Method: POST/GET

    URL: http://127.0.0.1:9903/articles/_search

    Request body:

    Match all documents, return id and title through "fields" instead of _source, and fetch 5 hits, skipping the first 10.

    {
      "query": {
        "match_all": {}
      },
      "fields": [
        "id",
        "title"
      ],
      "_source": false,
      "size": 5,
      "from": 10
    }

    Response:

    {
      "took": 4,
      "timed_out": false,
      "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 2222,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "442",
            "_score": 1.0,
            "fields": {
              "title": [
                "深入学习HTML5的history API"
              ],
              "id": [
                442
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "450",
            "_score": 1.0,
            "fields": {
              "title": [
                "想让百度删除不想收录的域名或快照的最快解决方法"
              ],
              "id": [
                450
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "466",
            "_score": 1.0,
            "fields": {
              "title": [
                "PHP采集远程图片保存本地"
              ],
              "id": [
                466
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "490",
            "_score": 1.0,
            "fields": {
              "title": [
                "8个最佳Web开发资源推荐"
              ],
              "id": [
                490
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "530",
            "_score": 1.0,
            "fields": {
              "title": [
                "前方高能反应!设计师最常见的五个设计误区"
              ],
              "id": [
                530
              ]
            }
          }
        ]
      }
    }
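
Both forms above use the same three parameters. Below is a minimal Python sketch that builds the URL-parameter form and the request-body form, plus the usual conversion from a page number to from. It uses only the standard library and sends nothing. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def search_url(host: str, index: str, fields: list, size: int = 10, frm: int = 0) -> str:
    """URI-search form: GET /<index>/_search?_source=...&size=...&from=..."""
    params = {"_source": ",".join(fields), "size": size, "from": frm}
    return f"http://{host}/{index}/_search?{urlencode(params)}"


def search_body(fields: list, size: int = 10, frm: int = 0) -> dict:
    """Request-body form: match_all, return fields via "fields", no _source."""
    return {
        "query": {"match_all": {}},
        "fields": fields,
        "_source": False,
        "size": size,
        "from": frm,
    }


def page_to_from(page: int, size: int) -> int:
    # Pages start at 1; from is the number of hits to skip.
    return (page - 1) * size


frm = page_to_from(page=3, size=5)  # skips 10 hits, so results start at the 11th
print(search_url("127.0.0.1:9903", "articles", ["title", "id"], size=5, frm=frm))
print(json.dumps(search_body(["id", "title"], size=5, frm=frm)))
```

urlencode percent-encodes the comma in _source as %2C, and Elasticsearch decodes it back to a comma. Also note that from + size is capped by the index.max_result_window setting (10,000 by default), so deep paging needs a different mechanism such as search_after.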

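Getting a document by ID

GET <index>/_doc/<_id>      returns the document with the given ID
HEAD <index>/_doc/<_id>     only checks whether the document exists; HEAD returns less data and is cheaper, which suits cases where existence is all you need
GET <index>/_source/<_id>   returns only the _source of the document
HEAD <index>/_source/<_id>  only checks whether the document's _source exists; same use case as above

Syntax: GET <elasticsearch host>/<index>/_doc/<document id>

Optional parameters:

_source: return only the listed _source fields, much like naming columns in SQL instead of using SELECT * (separate multiple fields with commas). By default all fields are returned. Set it to false to return no _source at all.

Example: get the document with ID 530, showing only id and title

  1. Using _doc, which returns the document metadata together with _source

    Method: GET

    URL: http://127.0.0.1:9903/articles/_doc/530?_source=title,id

    Response:

    {
      "_index": "articles",
      "_type": "_doc",
      "_id": "530",
      "_version": 1,
      "_seq_no": 205,
      "_primary_term": 4,
      "found": true,
      "_source": {
        "id": 530,
        "title": "前方高能反应!设计师最常见的五个设计误区"
      }
    }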

  2. Using _source, which returns only the source

    Method: GET

    URL: http://127.0.0.1:9903/articles/_source/530?_source=title,id or http://127.0.0.1:9903/articles/_source/530?_source_includes=title,id

    Response:

    {
      "id": 530,
      "title": "前方高能反应!设计师最常见的五个设计误区"
    }
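
Below is a Python sketch of the get-by-ID variants, using only the standard library. `doc_url` and `doc_exists` are hypothetical helper names. `doc_exists` relies on HEAD returning 200 when the document exists and 404 when it does not. It needs a running cluster, so it is defined here but not called.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def doc_url(host: str, index: str, doc_id: str, endpoint: str = "_doc",
            fields: Optional[list] = None) -> str:
    """URL for GET/HEAD <index>/_doc/<id> or <index>/_source/<id>."""
    if endpoint not in ("_doc", "_source"):
        raise ValueError("endpoint must be '_doc' or '_source'")
    url = f"http://{host}/{index}/{endpoint}/{doc_id}"
    if fields:
        url += "?" + urlencode({"_source": ",".join(fields)})
    return url


def doc_exists(host: str, index: str, doc_id: str) -> bool:
    """HEAD <index>/_doc/<id>: 200 means the document exists, 404 means it does not."""
    req = urllib.request.Request(doc_url(host, index, doc_id), method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other status codes (401, 500, ...) are real errors


print(doc_url("127.0.0.1:9903", "articles", "530", "_source", ["title", "id"]))
```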

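Batch retrieval

Multi get: Elasticsearch can also fetch several documents in a single request through the _mget API.

Example: get the documents with IDs 466 and 490

The full documents are long, so only id and title are returned here.

Method: GET/POST

URL: http://127.0.0.1:9903/articles/_mget?_source=title,id

Request body:

{
  "docs": [
    {
      "_id": "466"
    },
    {
      "_id": "490"
    }
  ]
}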

Response:

{
  "docs": [
    {
      "_index": "articles",
      "_type": "_doc",
      "_id": "466",
      "_version": 1,
      "_seq_no": 195,
      "_primary_term": 4,
      "found": true,
      "_source": {
        "id": 466,
        "title": "PHP采集远程图片保存本地"
      }
    },
    {
      "_index": "articles",
      "_type": "_doc",
      "_id": "490",
      "_version": 1,
      "_seq_no": 198,
      "_primary_term": 4,
      "found": true,
      "_source": {
        "id": 490,
        "title": "8个最佳Web开发资源推荐"
      }
    }
  ]
}
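
Below is a minimal Python sketch that builds the _mget body from a list of IDs and collects the documents that were found. The helper names are hypothetical.

```python
import json


def mget_body(ids: list) -> dict:
    """Body for POST /<index>/_mget: one {"_id": ...} entry per document."""
    return {"docs": [{"_id": str(doc_id)} for doc_id in ids]}


def found_sources(response: dict) -> dict:
    """Map _id -> _source for the documents whose "found" flag is true."""
    return {d["_id"]: d["_source"] for d in response["docs"] if d.get("found")}


print(json.dumps(mget_body([466, 490])))
```

When the index is already part of the URL, _mget also accepts the shorter body {"ids": ["466", "490"]}.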

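Query DSL

The Query DSL covers full-text queries, compound queries, term-level (structured) queries, and more.

Query vs. filter

  1. Query

    Determines whether a document matches and also how well it matches. It returns a relevance _score that can be used to rank the results.

  2. Filter

    Only determines whether a document matches, without saying how well; no score is computed. Filters are especially useful in aggregation queries. Because no score is computed, Elasticsearch processes filters faster, which improves overall response time. In addition, frequently used filters can be cached, whereas scored queries are not.

When to use which

Use queries for full-text search and for anything that depends on relevance scoring. For all other cases, filters are recommended.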
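
The difference can be shown by putting the same term condition in query context (bool.must) and in filter context (bool.filter). Both versions return the same documents, but in the filter-only version every hit gets a score of 0. Below is a small Python sketch of the two request bodies. The helper names are hypothetical, and the cat field comes from the articles examples in this document.

```python
import json


def query_context(field: str, value) -> dict:
    """Query context: the term clause is scored and contributes to _score."""
    return {"query": {"bool": {"must": [{"term": {field: value}}]}}}


def filter_context(field: str, value) -> dict:
    """Filter context: same clause, no scoring, eligible for caching."""
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


print(json.dumps(query_context("cat", 4)))
print(json.dumps(filter_context("cat", 4)))
```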

Compound queries

Boolean query

A bool query is made up of must, filter, should, and must_not clauses.

must: clauses must match and contribute to the score (a document has to match them to be included).

filter: clauses must match, but they run in non-scoring filter context. They only include or exclude documents and do not contribute to the score.

should: roughly equivalent to OR in a SQL query. A matching should clause increases _score, and a non-matching one has no effect, so should clauses are mainly used to tune each document's relevance score. Special case: if a bool query contains only should clauses (no must or filter), a document must match at least one of them to be returned.

must_not: clauses must not match, similar to "not equal" (documents matching them are excluded). Like filter, must_not does not affect scoring.

  • Find documents whose author is 2 and category (cat) is 4, and whose views are not between 2000 and 8000

    Method: GET/POST

    URL: http://127.0.0.1:9903/articles/_search?_source=title,id,author,views,cat

    Request body:

    {
      "query": {
        "bool": {
          "must": {
            "term": {
              "author": 2
            }
          },
          "filter": {
            "term": {
              "cat": 4
            }
          },
          "must_not": [
            {
              "range": {
                "views": {
                  "gte": 2000,
                  "lte": 8000
                }
              }
            }
          ]
        }
      }
    }

    Response:

    {
      "took": 7,
      "timed_out": false,
      "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 7,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "2047",
            "_score": 1.0,
            "_source": {
              "author": 2,
              "cat": 4,
              "id": 2047,
              "title": "虚拟机使用lvm管理新增磁盘",
              "views": 8848
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "927",
            "_score": 1.0,
            "_source": {
              "author": 2,
              "cat": 4,
              "id": 927,
              "title": "为什么说编程是有史以来最好的工作",
              "views": 1237
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "457",
            "_score": 1.0,
            "_source": {
              "author": 2,
              "cat": 4,
              "id": 457,
              "title": "Flex 布局语法教程",
              "views": 1168
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "597",
            "_score": 1.0,
            "_source": {
              "author": 2,
              "cat": 4,
              "id": 597,
              "title": "从浏览器多进程到JS单线程,JS运行机制最全面的一次梳理",
              "views": 8824
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "2028",
            "_score": 1.0,
            "_source": {
              "author": 2,
              "cat": 4,
              "id": 2028,
              "title": "Docker 入门教程03 使用容器工作",
              "views": 9479
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "246",
            "_score": 1.0,
            "_source": {
              "author": 2,
              "cat": 4,
              "id": 246,
              "title": "Unicode与JavaScript详解",
              "views": 42
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1541",
            "_score": 1.0,
            "_source": {
              "author": 2,
              "cat": 4,
              "id": 1541,
              "title": " 你还在用 os.path?快来感受一下 pathlib 给你带来的便捷吧! ",
              "views": 9638
            }
          }
        ]
      }
    }
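
The bool example above can also be assembled programmatically. Below is a minimal Python sketch that drops empty clauses and produces a body equivalent to the request shown earlier. The `bool_query` helper is hypothetical, and it uses arrays for every clause, which bool accepts.

```python
import json
from typing import Optional


def bool_query(must: Optional[list] = None, filters: Optional[list] = None,
               should: Optional[list] = None, must_not: Optional[list] = None,
               minimum_should_match: Optional[int] = None) -> dict:
    """Assemble a bool query body, leaving out clauses that are empty."""
    clauses = {"must": must, "filter": filters, "should": should, "must_not": must_not}
    body = {name: items for name, items in clauses.items() if items}
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": body}}


# author is 2, cat is 4, views not in [2000, 8000] (same conditions as the example above)
query = bool_query(
    must=[{"term": {"author": 2}}],
    filters=[{"term": {"cat": 4}}],
    must_not=[{"range": {"views": {"gte": 2000, "lte": 8000}}}],
)
print(json.dumps(query, indent=2))
```

The author condition is an exact match whose score is not needed. Moving it from must into filter would follow the query vs. filter advice above without changing which documents match.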

Deleting

Deleting all documents

Request: POST /<index name>/_delete_by_query

Request body:

{
  "query": {
    "match_all": {}
  }
}
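
Below is a minimal Python sketch of this request using urllib. The helper name is hypothetical. It only builds the request. Sending it removes every matching document, so double-check the index name and query first.

```python
import json
import urllib.request


def delete_by_query_request(host: str, index: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) POST /<index>/_delete_by_query."""
    return urllib.request.Request(
        url=f"http://{host}/{index}/_delete_by_query",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# match_all deletes EVERY document in the index once this request is sent.
req = delete_by_query_request("127.0.0.1:9903", "articles", {"match_all": {}})
print(req.get_method(), req.full_url)
```

_delete_by_query removes documents but keeps the index and its settings and mappings. To drop the index itself, use DELETE /<index name> instead.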

Search examples
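The examples below all call /articles/_search, either with a q= query string in the URL or with a JSON request body. Below is a minimal Python sketch of both forms, using only the standard library. The helper names are hypothetical. `send_search` needs a running cluster, so it is defined but not called. The q builder percent-encodes the whole value, including the space. The sample value uses the query-string grouping title:(python web) so that both terms are matched against the title field.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def uri_search_url(host: str, index: str, q: str, fields: list, size: int = 10) -> str:
    """GET /<index>/_search?q=...; spaces become %20 instead of staying raw."""
    params = {"_source": ",".join(fields), "size": size, "q": q}
    return f"http://{host}/{index}/_search?{urlencode(params, quote_via=quote)}"


def match_body(field: str, text: str, size: int = 10, highlight: bool = False) -> dict:
    """Full-text match query, optionally highlighting the same field."""
    body = {"size": size, "query": {"match": {field: text}}}
    if highlight:
        body["highlight"] = {"fields": {field: {}}}
    return body


def send_search(host: str, index: str, body: dict) -> dict:
    """POST the body to /<index>/_search and decode the JSON response."""
    req = urllib.request.Request(
        f"http://{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(uri_search_url("127.0.0.1:9903", "articles", "title:(python web)", ["id", "title"], 5))
print(json.dumps(match_body("title", "python web 编程", highlight=True), ensure_ascii=False))
```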

  • Search for documents whose title contains python web

    Request: GET http://127.0.0.1:9903/articles/_search?_source=title,id,author,views,cat&size=5&q=title:python web

    Note: clients that do not encode URLs automatically must send the space in the q value as %20. Also, in query-string syntax the title: prefix applies only to python. The term web is matched against the default fields (all fields, *), which is why titles such as "web前端规范" appear below. To restrict both terms to the title, use q=title:(python web).

    Response:

    {
      "took": 11,
      "timed_out": false,
      "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 816,
          "relation": "eq"
        },
        "max_score": 4.800988,
        "hits": [
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1385",
            "_score": 4.800988,
            "_source": {
              "author": 13,
              "cat": 2,
              "id": 1385,
              "title": " Python爬虫利器四之PhantomJS的用法 ",
              "views": 7790
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1392",
            "_score": 4.795393,
            "_source": {
              "author": 20,
              "cat": 2,
              "id": 1392,
              "title": " Python爬虫进阶一之爬虫框架概述 ",
              "views": 5728
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "459",
            "_score": 4.630121,
            "_source": {
              "author": 18,
              "cat": 7,
              "id": 459,
              "title": "web前端规范",
              "views": 1731
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1329",
            "_score": 4.488963,
            "_source": {
              "author": 19,
              "cat": 3,
              "id": 1329,
              "title": " [Python3网络爬虫开发实战] 1.6-Web库的安装 ",
              "views": 1492
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "342",
            "_score": 4.456541,
            "_source": {
              "author": 5,
              "cat": 7,
              "id": 342,
              "title": "浅谈大型web系统架构",
              "views": 4317
            }
          }
        ]
      }
    }

  • Search for python web in all fields

    Request: GET http://127.0.0.1:9903/articles/_search?_source=title,id,author,views,cat&size=5&q=_all:python web

    Note: the _all field was removed in Elasticsearch 7.0, and this cluster runs 7.13 (version.created 7130299). _all:python therefore most likely matches nothing here, and the hits come only from web matched against all fields. The identical score of "web前端规范" (4.630121) in both responses is consistent with this. To search every field, leave out the field prefix, e.g. q=python web.

    Response:

    {
      "took": 7,
      "timed_out": false,
      "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 576,
          "relation": "eq"
        },
        "max_score": 4.630121,
        "hits": [
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "459",
            "_score": 4.630121,
            "_source": {
              "author": 18,
              "cat": 7,
              "id": 459,
              "title": "web前端规范",
              "views": 1731
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "342",
            "_score": 4.456541,
            "_source": {
              "author": 5,
              "cat": 7,
              "id": 342,
              "title": "浅谈大型web系统架构",
              "views": 4317
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "812",
            "_score": 4.456541,
            "_source": {
              "author": 1,
              "cat": 3,
              "id": 812,
              "title": "想做web开发 就学JavaScript",
              "views": 8885
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "410",
            "_score": 4.3550134,
            "_source": {
              "author": 16,
              "cat": 4,
              "id": 410,
              "title": "Web开发初学指南",
              "views": 4300
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "578",
            "_score": 4.3550134,
            "_source": {
              "author": 10,
              "cat": 1,
              "id": 578,
              "title": "Web Worker 使用教程",
              "views": 2814
            }
          }
        ]
      }
    }

  • Full-text search for titles containing python or web, using a request body (match combines terms with OR by default)

    Request: GET/POST http://127.0.0.1:9903/articles/_search

    Request body:

    {
      "from": 0,
      "size": 5,
      "_source": [
        "id",
        "title"
      ],
      "query": {
        "match": {
          "title": "python web"
        }
      }
    }

  • Search for documents whose title contains the phrase python爬虫 ("Python crawler"), using match_phrase

    Request: GET/POST http://127.0.0.1:9903/articles/_search

    Request body:

    {
      "from": 0,
      "size": 10,
      "_source": [
        "id",
        "title"
      ],
      "query": {
        "match_phrase": {
          "title": "python爬虫"
        }
      }
    }

    Response:

    {
      "took": 9,
      "timed_out": false,
      "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 28,
          "relation": "eq"
        },
        "max_score": 6.0065117,
        "hits": [
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "281",
            "_score": 6.0065117,
            "_source": {
              "id": 281,
              "title": "什么是Python爬虫?"
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1206",
            "_score": 5.663828,
            "_source": {
              "id": 1206,
              "title": "什么是Python爬虫?"
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1361",
            "_score": 5.119743,
            "_source": {
              "id": 1361,
              "title": " 自建免费PYTHON爬虫代理IP池 "
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1567",
            "_score": 5.084609,
            "_source": {
              "id": 1567,
              "title": " Python爬虫入门一之综述 "
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1387",
            "_score": 4.8739586,
            "_source": {
              "id": 1387,
              "title": " Python爬虫利器五之Selenium的用法 "
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1569",
            "_score": 4.863277,
            "_source": {
              "id": 1569,
              "title": " Python爬虫入门二之爬虫基础了解 "
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1385",
            "_score": 4.863277,
            "_source": {
              "id": 1385,
              "title": " Python爬虫利器四之PhantomJS的用法 "
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1386",
            "_score": 4.863277,
            "_source": {
              "id": 1386,
              "title": " Python爬虫利器六之PyQuery的用法 "
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1537",
            "_score": 4.837265,
            "_source": {
              "id": 1537,
              "title": " Python 爬虫利器之 Pyppeteer 的用法 "
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1381",
            "_score": 4.642378,
            "_source": {
              "id": 1381,
              "title": " Python爬虫利器一之Requests库的用法 "
            }
          }
        ]
      }
    }

  • Exact lookup with term: find the document whose id field is 299

    Request: GET/POST http://127.0.0.1:9903/articles/_search

    Request body:

    {
      "_source": [
        "id",
        "title"
      ],
      "query": {
        "term": {
          "id": 299
        }
      }
    }

    Response:

    {
      "took": 3,
      "timed_out": false,
      "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 1,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "299",
            "_score": 1.0,
            "_source": {
              "id": 299,
              "title": "为什么我要说 JavaScript 对象字面量很酷?"
            }
          }
        ]
      }
    }

  • Range lookup with range: find documents with between 9500 and 10000 views (inclusive)

    Request: GET/POST http://127.0.0.1:9903/articles/_search

    Request body:

    {
      "_source": [
        "id",
        "title",
        "views"
      ],
      "query": {
        "range": {
          "views": {
            "gte": 9500,
            "lte": 10000
          }
        }
      }
    }

    Response:

    {
      "took": 5,
      "timed_out": false,
      "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 107,
          "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "230",
            "_score": 1.0,
            "_source": {
              "id": 230,
              "title": "理解矩阵乘法",
              "views": 9746
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "326",
            "_score": 1.0,
            "_source": {
              "id": 326,
              "title": "jQuery+JSONP通过调用虾米接口实现类似点点网发布音乐的功能",
              "views": 9713
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "390",
            "_score": 1.0,
            "_source": {
              "id": 390,
              "title": "大型网站的 HTTPS 实践(二):HTTPS 对性能的影响",
              "views": 9660
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "711",
            "_score": 1.0,
            "_source": {
              "id": 711,
              "title": "HTML特殊符号对照表大全",
              "views": 9736
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "890",
            "_score": 1.0,
            "_source": {
              "id": 890,
              "title": "JavaScript易错知识点整理",
              "views": 9808
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "930",
            "_score": 1.0,
            "_source": {
              "id": 930,
              "title": "调试 CSS 的方法",
              "views": 9824
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "92",
            "_score": 1.0,
            "_source": {
              "id": 92,
              "title": "如何使用 Issue 管理软件项目?",
              "views": 9834
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1894",
            "_score": 1.0,
            "_source": {
              "id": 1894,
              "title": "[静下心来看python]-[11]-[__seq__]",
              "views": 9597
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1916",
            "_score": 1.0,
            "_source": {
              "id": 1916,
              "title": "python 冒泡排序",
              "views": 9805
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1931",
            "_score": 1.0,
            "_source": {
              "id": 1931,
              "title": "yun update 如何更新 安全补丁",
              "views": 9825
            }
          }
        ]
      }
    }

  • Highlighting with highlight: search for titles containing python web 编程 ("programming") and highlight the matched keywords (wrapped in <em> tags by default)

    Request: GET/POST http://127.0.0.1:9903/articles/_search

    Request body:

    {
      "_source": [
        "id",
        "title",
        "views"
      ],
      "query": {
        "match": {
          "title": "python web 编程"
        }
      },
      "highlight": {
        "fields": {
          "title": {}
        }
      }
    }

    Response:

    {
      "took": 91,
      "timed_out": false,
      "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 409,
          "relation": "eq"
        },
        "max_score": 6.9713564,
        "hits": [
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1183",
            "_score": 6.9713564,
            "_source": {
              "id": 1183,
              "title": "如何自学python编程入门?",
              "views": 9077
            },
            "highlight": {
              "title": [
                "如何自学<em>python</em><em>编程</em>入门?"
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1203",
            "_score": 6.8202724,
            "_source": {
              "id": 1203,
              "title": "Python编程游戏有哪些?",
              "views": 3612
            },
            "highlight": {
              "title": [
                "<em>Python</em><em>编程</em>游戏有哪些?"
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "977",
            "_score": 6.6650963,
            "_source": {
              "id": 977,
              "title": "搞懂了这几点,你就学会了Web编程",
              "views": 1732
            },
            "highlight": {
              "title": [
                "搞懂了这几点,你就学会了<em>Web</em><em>编程</em>"
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1190",
            "_score": 6.639513,
            "_source": {
              "id": 1190,
              "title": "Python函数式编程实例详解",
              "views": 3830
            },
            "highlight": {
              "title": [
                "<em>Python</em>函数式<em>编程</em>实例详解"
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1186",
            "_score": 6.614889,
            "_source": {
              "id": 1186,
              "title": "Python编程100例—海绵宝宝",
              "views": 9480
            },
            "highlight": {
              "title": [
                "<em>Python</em><em>编程</em>100例—海绵宝宝"
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1187",
            "_score": 6.3046937,
            "_source": {
              "id": 1187,
              "title": "Python编程应该下载什么软件?",
              "views": 2521
            },
            "highlight": {
              "title": [
                "<em>Python</em><em>编程</em>应该下载什么软件?"
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1189",
            "_score": 5.788736,
            "_source": {
              "id": 1189,
              "title": "Python编程例子:使用Python编写简单的画图板软件",
              "views": 3266
            },
            "highlight": {
              "title": [
                "<em>Python</em><em>编程</em>例子:使用<em>Python</em>编写简单的画图板软件"
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1012",
            "_score": 5.7351246,
            "_source": {
              "id": 1012,
              "title": "python编程中双斜杠是什么意思?",
              "views": 5075
            },
            "highlight": {
              "title": [
                "<em>python</em><em>编程</em>中双斜杠是什么意思?"
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1174",
            "_score": 5.7270813,
            "_source": {
              "id": 1174,
              "title": "python编程中双斜杠是什么意思?",
              "views": 9410
            },
            "highlight": {
              "title": [
                "<em>python</em><em>编程</em>中双斜杠是什么意思?"
              ]
            }
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1185",
            "_score": 5.639056,
            "_source": {
              "id": 1185,
              "title": "Python编程例子: python中计算三次方怎么表示?",
              "views": 4959
            },
            "highlight": {
              "title": [
                "<em>Python</em><em>编程</em>例子: <em>python</em>中计算三次方怎么表示?"
              ]
            }
          }
        ]
      }
    }

  • Search for titles containing python 爬虫, sorted by creation date descending, then by relevance score descending

    Request: GET/POST http://127.0.0.1:9903/articles/_search

    Request body:

    {
      "_source": [
        "id",
        "title",
        "views",
        "createtime"
      ],
      "query": {
        "match": {
          "title": "python爬虫"
        }
      },
      "sort": [
        {
          "createtime": {
            "order": "desc"
          }
        },
        {
          "_score": {
            "order": "desc"
          }
        }
      ]
    }

    Response (each hit's sort array holds its createtime as epoch milliseconds, followed by its _score):

    {
      "took": 14,
      "timed_out": false,
      "_shards": {
        "total": 3,
        "successful": 3,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 307,
          "relation": "eq"
        },
        "max_score": null,
        "hits": [
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1331",
            "_score": 3.1511018,
            "_source": {
              "createtime": "2021-05-20T16:00:00.000Z",
              "id": 1331,
              "title": " [Python3网络爬虫开发实战] 1.7-App爬取相关库的安装 ",
              "views": 9039
            },
            "sort": [
              1621526400000,
              3.1511018
            ]
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1617",
            "_score": 3.7443776,
            "_source": {
              "createtime": "2021-04-04T16:00:00.000Z",
              "id": 1617,
              "title": " [Python3网络爬虫开发实战] 15.4–Scrapyd 批量部署 ",
              "views": 3554
            },
            "sort": [
              1617552000000,
              3.7443776
            ]
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1596",
            "_score": 3.7840004,
            "_source": {
              "createtime": "2021-02-22T16:00:00.000Z",
              "id": 1596,
              "title": " [Python3网络爬虫开发实战] 9.4-ADSL 拨号代理 ",
              "views": 5475
            },
            "sort": [
              1614009600000,
              3.7840004
            ]
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1328",
            "_score": 3.7831829,
            "_source": {
              "createtime": "2021-01-27T16:00:00.000Z",
              "id": 1328,
              "title": " [Python3网络爬虫开发实战] 1.2.5-PhantomJS的安装 ",
              "views": 7585
            },
            "sort": [
              1611763200000,
              3.7831829
            ]
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1330",
            "_score": 3.6084993,
            "_source": {
              "createtime": "2020-12-20T16:00:00.000Z",
              "id": 1330,
              "title": " [Python3网络爬虫开发实战] 1.6.1-Flask的安装 ",
              "views": 4703
            },
            "sort": [
              1608480000000,
              3.6084993
            ]
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1283",
            "_score": 1.9249799,
            "_source": {
              "createtime": "2020-12-02T16:00:00.000Z",
              "id": 1283,
              "title": " Python3 中使用 Pathlib 模块进行文件操作 ",
              "views": 7660
            },
            "sort": [
              1606924800000,
              1.9249799
            ]
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1238",
            "_score": 3.6084993,
            "_source": {
              "createtime": "2020-10-16T16:00:00.000Z",
              "id": 1238,
              "title": " [Python3网络爬虫开发实战] 13.5–Downloader Middleware 的用法 ",
              "views": 1392
            },
            "sort": [
              1602864000000,
              3.6084993
            ]
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1199",
            "_score": 2.3096707,
            "_source": {
              "createtime": "2020-09-10T16:00:00.000Z",
              "id": 1199,
              "title": "python怎么下载安装re库?",
              "views": 2390
            },
            "sort": [
              1599753600000,
              2.3096707
            ]
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1329",
            "_score": 3.4821367,
            "_source": {
              "createtime": "2020-06-14T16:00:00.000Z",
              "id": 1329,
              "title": " [Python3网络爬虫开发实战] 1.6-Web库的安装 ",
              "views": 1492
            },
            "sort": [
              1592150400000,
              3.4821367
            ]
          },
          {
            "_index": "articles",
            "_type": "_doc",
            "_id": "1237",
            "_score": 2.1997697,
            "_source": {
              "createtime": "2020-03-28T16:00:00.000Z",
              "id": 1237,
              "title": " Python 中异常处理库 merry 的用法 ",
              "views": 6264
            },
            "sort": [
              1585411200000,
              2.1997697
            ]
          }
        ]
      }
    }
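  • boost: raises the weight (score) of a query clause. Here, titles matching python 爬虫 ("python crawler") are given twice the weight

    Request path: GET/POST http://127.0.0.1:9903/articles/_search

    Request body: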

    {
    "size": 5,
    "_source": [
    "id",
    "title",
    "views",
    "createtime"
    ],
    "query": {
    "match": {
    "title": {
    "query": "python 爬虫",
    "boost": 2
    }
    }
    }
    }

    Response:

    {
    "took": 6,
    "timed_out": false,
    "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
    },
    "hits": {
    "total": {
    "value": 307,
    "relation": "eq"
    },
    "max_score": 12.013023,
    "hits": [
    {
    "_index": "articles",
    "_type": "_doc",
    "_id": "281",
    "_score": 12.013023,
    "_source": {
    "createtime": "2003-11-18T16:00:00.000Z",
    "id": 281,
    "title": "什么是Python爬虫?",
    "views": 7830
    }
    },
    {
    "_index": "articles",
    "_type": "_doc",
    "_id": "1569",
    "_score": 11.6157055,
    "_source": {
    "createtime": "2015-04-18T16:00:00.000Z",
    "id": 1569,
    "title": " Python爬虫入门二之爬虫基础了解 ",
    "views": 880
    }
    },
    {
    "_index": "articles",
    "_type": "_doc",
    "_id": "1206",
    "_score": 11.327656,
    "_source": {
    "createtime": "2005-12-16T16:00:00.000Z",
    "id": 1206,
    "title": "什么是Python爬虫?",
    "views": 6231
    }
    },
    {
    "_index": "articles",
    "_type": "_doc",
    "_id": "1392",
    "_score": 11.009354,
    "_source": {
    "createtime": "1999-10-02T16:00:00.000Z",
    "id": 1392,
    "title": " Python爬虫进阶一之爬虫框架概述 ",
    "views": 5728
    }
    },
    {
    "_index": "articles",
    "_type": "_doc",
    "_id": "1352",
    "_score": 10.239487,
    "_source": {
    "createtime": "2007-07-31T16:00:00.000Z",
    "id": 1352,
    "title": " Python3爬虫视频学习教程 ",
    "views": 7287
    }
    }
    ]
    }
    }
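Here is what boost does in this example. With a single match clause, boost: 2 multiplies every hit's score by the same factor, so the hits come back in the same order as without it. Boost only changes the ranking when it weights one clause against another. The sketch below shows that case. It assumes the articles index also has a content field, which the Logstash import later in this article writes. boosted_search is a hypothetical helper, not part of any library.

```python
import json


def boosted_search(keywords: str, title_boost: float = 2.0, size: int = 5) -> dict:
    """Build a search body in which a title match weighs `title_boost` times
    as much as a content match. Hypothetical helper for illustration."""
    return {
        "size": size,
        "_source": ["id", "title", "views", "createtime"],
        "query": {
            "bool": {
                # A document's score is the sum of the should clauses it
                # matches; the title clause's score is multiplied by title_boost.
                "should": [
                    {"match": {"title": {"query": keywords, "boost": title_boost}}},
                    {"match": {"content": {"query": keywords}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body you would send to /articles/_search.
    print(json.dumps(boosted_search("python 爬虫"), ensure_ascii=False, indent=2))
```

With this body, an article that mentions the keywords only in its content can still match. One with the keywords in its title will usually rank higher.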

Logstash

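Logstash is an open-source data collection engine with real-time pipelining capabilities. It can dynamically unify data from different sources and normalize it into the destinations of your choice.

Besides Logstash, what other ways do we have to add data to Elasticsearch in day-to-day work?

Haystack

An earlier article covered using Elasticsearch with Django. There we connected to Elasticsearch through Haystack, a Django framework for integrating search engines that acts as the bridge between the application and the search engine. Django code can call Elasticsearch through Haystack.

Adding data manually

Logstash is typically used to import the existing data when a system first goes live. After launch, we usually update Elasticsearch at the same time as we insert, update or delete rows in the database. That is exactly what Haystack does, but it only works with Django. How do we do the same in other frameworks, or without a framework?

The answer is what was just described. Whenever data in the database changes, we manually apply the same change to Elasticsearch, which means calling the Elasticsearch service's REST API ourselves.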
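Below is a minimal sketch of that manual sync. It makes several assumptions:

* ArticleRepository is a hypothetical class.
* SQLite stands in for MySQL.
* es_client can be any object exposing index() and delete() with the same keyword arguments as the elasticsearch-py 7.x client.

The database write is committed first, and Elasticsearch is updated only after the commit succeeds, so the database stays the source of truth.

```python
import sqlite3


class ArticleRepository:
    """Hypothetical repository: write to the database first, then mirror
    the change into Elasticsearch (a simple dual write)."""

    def __init__(self, conn: sqlite3.Connection, es_client, index: str = "articles"):
        self.conn = conn
        self.es = es_client
        self.index = index

    def save(self, article_id: int, title: str, views: int = 0) -> None:
        # 1. Persist the row; `with conn` commits on success, rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO articles (id, title, views) VALUES (?, ?, ?)",
                (article_id, title, views),
            )
        # 2. Index the same data, reusing the primary key as the document _id
        #    so that saving an existing article overwrites its document.
        doc = {"id": article_id, "title": title, "views": views}
        self.es.index(index=self.index, id=article_id, body=doc)

    def delete(self, article_id: int) -> None:
        # Remove the row first, then the matching document.
        with self.conn:
            self.conn.execute("DELETE FROM articles WHERE id = ?", (article_id,))
        self.es.delete(index=self.index, id=article_id)
```

If the Elasticsearch call fails after the commit, the two stores diverge. A production setup would retry or queue the indexing call, which this sketch leaves out.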

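Using Logstash

Installation

First download Logstash from the official website. Its version must exactly match your Elasticsearch version. The yum-based installation covered in many online guides is not repeated here.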
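To confirm the versions match, compare two values:

* the version number reported by the Elasticsearch root endpoint (GET /, available as info() in elasticsearch-py);
* the output of bin/logstash --version.

A minimal sketch follows. versions_match is a hypothetical helper, and the test dicts mimic only the version field of a real response.

```python
def versions_match(es_info: dict, logstash_version: str) -> bool:
    """Return True if Elasticsearch and Logstash report the same version.

    es_info: the dict returned by GET / (es_client.info() in elasticsearch-py),
             whose version.number field holds e.g. "7.13.2".
    logstash_version: the text printed by `bin/logstash --version`,
             e.g. "logstash 7.13.2".
    """
    es_version = es_info["version"]["number"]
    # Keep only the last whitespace-separated token ("logstash 7.13.2" -> "7.13.2").
    ls_version = logstash_version.strip().split()[-1]
    return es_version == ls_version
```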

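Importing data from MySQL into Elasticsearch

Once Logstash is installed, Elasticsearch itself needs no extra configuration. You only write a Logstash configuration file when you want to run an import.

Download the MySQL driver

Download the JDBC driver for MySQL and put it in a directory of your choice.
Get the latest Connector/J from https://dev.mysql.com/downloads/connector/j/. Remember where you saved the driver JAR. Keeping it alongside Logstash is simplest.

Create the configuration file

Create the configuration file for the Elasticsearch import, logstash_mysql.conf. Its main options are: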

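  • jdbc_connection_string: the database host, port and database name, plus connection options such as character encoding and time zone; blog_db is the database name

  • jdbc_user: the database user name

  • jdbc_password: the database password

  • jdbc_driver_library: path to the driver JAR; leave it empty if the JAR sits in Logstash's designated jar directory, otherwise give an absolute path

  • jdbc_driver_class: the driver class; for Connector/J 8.x this is com.mysql.cj.jdbc.Driver (the older com.mysql.jdbc.Driver still loads but is deprecated)

  • statement: the SQL query, here selecting all rows of the articles table; columns can be renamed with as so they match the field names in the Elasticsearch index mapping

  • tracking_column: with use_column_value set to true, this is the column whose value in the last row read is saved as :sql_last_value. A statement can reference that value (e.g. WHERE article_id > :sql_last_value) to import only new rows on later runs. It does not set the document id; that comes from document_id => "%{article_id}" in the output section.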

input {
  jdbc {
    # Path to the MySQL Connector/J JAR downloaded above
    jdbc_driver_library => "/home/python/mysql-connector-java-8.0.13/mysql-connector-java-8.0.13.jar"
    # Driver class for Connector/J 8.x (com.mysql.jdbc.Driver is the deprecated 5.x name)
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/blog_db?tinyInt1isBit=false"
    jdbc_user => "root"
    jdbc_password => "mysql"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "1000"
    jdbc_default_timezone => "Asia/Shanghai"
    statement => "select article_id, author, title, content, createtime, cat, views, fav from articles"
    use_column_value => "true"
    tracking_column => "article_id"
    # Reset :sql_last_value so every run imports the whole table
    clean_run => true
  }
}
output {
  elasticsearch {
    hosts => "127.0.0.1:9200"
    index => "articles"
    # Use the MySQL primary key as the Elasticsearch document _id
    document_id => "%{article_id}"
  }
  stdout {
    codec => json_lines
  }
}
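To show what tracking_column is for, here is a rough Python emulation of the incremental pattern. It is not Logstash's own code. Each run does two things:

* it selects only rows whose article_id is greater than the last value seen, which Logstash exposes as :sql_last_value in the statement;
* it remembers the new maximum.

SQLite stands in for MySQL, and fetch_new_rows is a hypothetical function. Tracking an increasing id only picks up new rows; catching edits would need a column such as an update timestamp. The config above sets clean_run => true, which resets the stored value, so every run there re-imports everything.

```python
import sqlite3


def fetch_new_rows(conn: sqlite3.Connection, last_value: int):
    """Return rows with article_id > last_value and the updated tracking value.

    Mirrors a statement such as
    "SELECT ... FROM articles WHERE article_id > :sql_last_value ORDER BY article_id"
    combined with tracking_column => "article_id".
    """
    rows = conn.execute(
        "SELECT article_id, title FROM articles WHERE article_id > ? ORDER BY article_id",
        (last_value,),
    ).fetchall()
    # The new tracking value is the article_id of the last row read,
    # which ORDER BY makes the largest one.
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last
```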

Run the import into Elasticsearch

A single command is enough. Check back after a while and all the data will have been imported.

/usr/share/logstash/bin/logstash -f ./logstash_mysql.conf
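Once the pipeline finishes, one way to check the import is to compare the row count in MySQL with the document count in the index. The counts should match because document_id is set from article_id, so re-running the import overwrites documents instead of duplicating them. Elasticsearch refreshes roughly once per second by default, so the count can lag briefly.

A minimal sketch follows. It assumes an elasticsearch-py client, whose count() returns a dict with a count key, and any DB-API cursor. import_complete is a hypothetical name.

```python
def import_complete(es_client, db_cursor, index: str = "articles",
                    table: str = "articles") -> bool:
    """Return True if the index holds as many documents as the table has rows."""
    # SELECT COUNT(*) works with any DB-API cursor (pymysql, sqlite3, ...).
    # `table` is interpolated directly, so only pass trusted names here.
    db_cursor.execute(f"SELECT COUNT(*) FROM {table}")
    (db_rows,) = db_cursor.fetchone()
    es_docs = es_client.count(index=index)["count"]
    return db_rows == es_docs
```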

Spelling correction

For an existing index, Elasticsearch also provides a suggest query mode (suggesters). The request below uses the phrase suggester:

curl -H 'Content-Type: application/json' '127.0.0.1:9200/articles/_search?pretty' -d '
{
"from": 0,
"size": 10,
"_source": false,
"suggest": {
"text": "pthony web",
"word-phrase": {
"phrase": {
"field": "content",
"size": 5
}
}
}
}'

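When we enter the misspelled keywords pthony web, Elasticsearch returns correctly spelled suggestions drawn from the data in the content field of the index. size sets how many suggestions to return. The request above returns: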

{
"took": 16,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"suggest": {
"word-phrase": [
{
"text": "pthony web",
"offset": 0,
"length": 10,
"options": [
{
"text": "python web",
"score": 0.0011069108
},
{
"text": "python3 web",
"score": 4.059151E-4
},
{
"text": "phone web",
"score": 1.8401434E-4
},
{
"text": "python- web",
"score": 1.711725E-4
},
{
"text": "python2 web",
"score": 1.5117846E-4
}
]
}
]
}
}
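In application code you usually only need the corrected strings. The sketch below does two things: it builds a suggest-only body equivalent to the curl request, and it pulls the option texts out of the response. The function names are hypothetical. The sample response in the test is trimmed from the output above.

```python
def phrase_suggest_body(text: str, field: str = "content", size: int = 5) -> dict:
    """Build a suggest-only search body like the curl request above."""
    return {
        "size": 0,  # no hits needed, only suggestions
        "suggest": {
            "text": text,
            "word-phrase": {"phrase": {"field": field, "size": size}},
        },
    }


def suggestion_texts(response: dict, name: str = "word-phrase") -> list:
    """Collect option texts, best first, from every entry of a named suggester."""
    return [
        option["text"]
        for entry in response.get("suggest", {}).get(name, [])
        for option in entry["options"]
    ]
```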

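Autocomplete

Elasticsearch's autocomplete (the completion suggester) requires a special field type in the mapping. The existing articles index therefore cannot be used for it, and we need a separate index for autocomplete.

Create the autocomplete index

Request: PUT 127.0.0.1:9200/completions?pretty

Request body: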

 {
"settings":{
"index.number_of_shards":1,
"index.number_of_replicas":1
},
"mappings":{
"properties":{
"suggest": {
"type": "completion",
"analyzer": "ik_max_word"
}
}
}
}
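A completion field can hold a plain string, which is what the Logstash import below writes. It also accepts an object with an input list and an optional integer weight, and entries with a higher weight are ranked first among the suggestions. The sketch below builds such a document. Using the article's view count as the weight is purely a hypothetical choice, and completion_doc is a hypothetical name.

```python
def completion_doc(title: str, views: int) -> dict:
    """Build a document for the `completions` index with a weighted suggestion.

    Assumption: views is used as the weight; any non-negative integer works.
    """
    return {
        "suggest": {
            # strip() removes the stray spaces seen around titles in this data.
            "input": [title.strip()],
            "weight": int(views),
        }
    }
```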

Import the initial data with Logstash

Edit logstash_mysql_completion.conf:

input{
jdbc {
jdbc_driver_library => "/Users/tony/Desktop/elasticsearch-7.13.2-cluster/mysql-connector-java-8.0.25/mysql-connector-java-8.0.25.jar"
jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/elasticsearch_study_db?tinyInt1isBit=false"
jdbc_user => "root"
jdbc_password => ""
jdbc_paging_enabled => "true"
jdbc_page_size => "1000"
jdbc_default_timezone =>"Asia/Shanghai"
statement => "select title as suggest from articles"
clean_run => true
}
}
output{
elasticsearch {
hosts => "127.0.0.1:9901"
index => "completions"
}
stdout {
codec => json_lines
}
}

Run the import:

sudo /usr/share/logstash/bin/logstash -f ./logstash_mysql_completion.conf

Test autocomplete

Request: GET localhost:9902/completions/_search

Request body:

{
"suggest": {
"title-suggest": {
"prefix": "pyth",
"completion": {
"field": "suggest"
}
}
}
}

Response:

{
"took": 1790,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"suggest": {
"title-suggest": [
{
"text": "pyth",
"offset": 0,
"length": 4,
"options": [
{
"text": " Python 中 typing 模块和类型注解的使用 ",
"_index": "completions",
"_type": "_doc",
"_id": "m1qDinoB3Gxwgb_JdrR_",
"_score": 1.0,
"_source": {
"suggest": " Python 中 typing 模块和类型注解的使用 ",
"@timestamp": "2021-07-09T09:05:06.950Z",
"@version": "1"
}
},
{
"text": " Python 中拼音库 PyPinyin 的用法 ",
"_index": "completions",
"_type": "_doc",
"_id": "pFqDinoB3Gxwgb_JdrSA",
"_score": 1.0,
"_source": {
"suggest": " Python 中拼音库 PyPinyin 的用法 ",
"@timestamp": "2021-07-09T09:05:06.951Z",
"@version": "1"
}
},
{
"text": " Python 中更优雅的日志记录方案 loguru ",
"_index": "completions",
"_type": "_doc",
"_id": "XFqDinoB3Gxwgb_Jdrfo",
"_score": 1.0,
"_source": {
"suggest": " Python 中更优雅的日志记录方案 loguru ",
"@timestamp": "2021-07-09T09:05:06.991Z",
"@version": "1"
}
},
{
"text": " Python 使用 environs 库来更好地定义环境变量 ",
"_index": "completions",
"_type": "_doc",
"_id": "FVqDinoB3Gxwgb_Jdra9",
"_score": 1.0,
"_source": {
"suggest": " Python 使用 environs 库来更好地定义环境变量 ",
"@timestamp": "2021-07-09T09:05:06.995Z",
"@version": "1"
}
},
{
"text": " Python 序列化和反序列化库 MarshMallow 的用法 ",
"_index": "completions",
"_type": "_doc",
"_id": "GFqDinoB3Gxwgb_Jdra9",
"_score": 1.0,
"_source": {
"suggest": " Python 序列化和反序列化库 MarshMallow 的用法 ",
"@timestamp": "2021-07-09T09:05:06.995Z",
"@version": "1"
}
}
]
}
]
}
}
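A title that occurs more than once in the table shows up as repeated options. For example, 什么是Python爬虫? appears under two ids in the earlier search results. The completion suggester's skip_duplicates option filters such repeats out. The sketch below builds the request with it and trims the surrounding spaces from the returned texts. The helper names are hypothetical.

```python
def completion_body(prefix: str, size: int = 5) -> dict:
    """Build an autocomplete request like the one above, with duplicates skipped."""
    return {
        "suggest": {
            "title-suggest": {
                "prefix": prefix,
                "completion": {"field": "suggest", "size": size, "skip_duplicates": True},
            }
        },
    }


def completion_texts(response: dict, name: str = "title-suggest") -> list:
    """Return the suggested titles with surrounding whitespace removed."""
    return [
        option["text"].strip()
        for entry in response.get("suggest", {}).get(name, [])
        for option in entry["options"]
    ]
```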

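Connecting Python to the Elasticsearch server

Querying data

Install the elasticsearch package

The code below uses the 7.x Python client API (body= arguments, host strings without a scheme). Install a client whose major version matches the 7.x server:

pip install "elasticsearch>=7,<8"

Connect to the Elasticsearch server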

from elasticsearch import Elasticsearch

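# Addresses of the Elasticsearch cluster nodes (a single node without a cluster listens on 9200 by default)
ES_CLUSTER = [
    "127.0.0.1:9901",
    "127.0.0.1:9902",
    "127.0.0.1:9903"
]

# Create the Elasticsearch client
# sniff_on_start: discover the cluster's nodes before the first request
# sniff_on_connection_fail: refresh the node list when a connection to a node fails
# sniffer_timeout: refresh the node list every this many seconds
es_client = Elasticsearch(ES_CLUSTER, sniff_on_start=True, sniff_on_connection_fail=True, sniffer_timeout=60)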

Define the search query

# Query: 5 documents whose title matches "python web" and whose author id is 2 (return only the fields id, title, views, createtime)
query = {
"size": 5,
"_source": [
"id",
"title",
"views",
"createtime"
],
"query": {
"bool": {
"must": [
{"match": {"title": "python web"}}
],
"filter": [
{"term": {"author": 2}}
]
}
}
}
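The same query can be wrapped in a function so that the keywords, author and page come from the incoming request. In a bool query, must clauses contribute to the relevance score. filter clauses only include or exclude documents, do not affect the score and can be cached. That is why the exact author match goes in filter. The sketch below uses a hypothetical function name and hypothetical page/size parameters.

```python
def build_article_query(keywords: str, author_id: int, page: int = 1, size: int = 5) -> dict:
    """Full-text match on title, exact filter on author, with simple paging."""
    return {
        "from": (page - 1) * size,  # offset of the first hit to return
        "size": size,
        "_source": ["id", "title", "views", "createtime"],
        "query": {
            "bool": {
                "must": [{"match": {"title": keywords}}],     # contributes to the score
                "filter": [{"term": {"author": author_id}}],  # yes/no match, not scored
            }
        },
    }
```

Passing body=build_article_query("python web", 2) to es_client.search(index="articles", ...) sends the same query as above, plus "from": 0.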

Run the search

# Search the articles index with the query defined above and get the result
ret = es_client.search(index="articles", body=query)

Process the search results

# Loop over the hits
for item in ret['hits']['hits']:
    print("Article ID:", item["_source"]['id'])
    # With the article id you can fetch other details from the local database or cache
    print("Title:", item["_source"]['title'])
    print("Views:", item["_source"]['views'])
    print("Created:", item["_source"]['createtime'])
    print("-------------------------------------")

Sample output

Article ID: 1009
Title: 移动端 web 开发小技巧
Views: 4531
Created: 2015-03-05T16:00:00.000Z
-------------------------------------
Article ID: 99
Title: JSON Web Token 入门教程
Views: 5623
Created: 2017-10-24T16:00:00.000Z
-------------------------------------
Article ID: 833
Title: 大势所趋 HTML5成Web开发者最关心的技术
Views: 4193
Created: 1993-10-06T16:00:00.000Z
-------------------------------------
Article ID: 1187
Title: Python编程应该下载什么软件?
Views: 2521
Created: 2011-02-26T16:00:00.000Z
-------------------------------------
Article ID: 1361
Title: 自建免费PYTHON爬虫代理IP池
Views: 7618
Created: 2010-01-06T16:00:00.000Z
-------------------------------------

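Adding data

In the article-publishing endpoint, besides saving the article to the database, we also need to index the new article in Elasticsearch: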

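# article is the model object that was just saved to the database
doc = {
    'id': article.id,
    'author': article.author,
    'title': article.title,
    'content': article.content,
    'cat': article.cat,
    'createtime': article.ctime
}
# Use the article's primary key as the document _id so later updates overwrite this document
es_client.index(index='articles', body=doc, id=article.id)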
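Sometimes many articles need indexing at once, for example in a backfill. Sending one request per document is then slow, and elasticsearch-py provides helpers.bulk to batch the work.

The sketch below assumes article objects with the same attributes as above. article_actions and bulk_index are hypothetical names. By default helpers.bulk raises BulkIndexError if any action fails.

```python
def article_actions(articles, index: str = "articles"):
    """Yield one bulk 'index' action per article, keyed by the article id."""
    for article in articles:
        yield {
            "_index": index,
            "_id": article.id,
            "_source": {
                "id": article.id,
                "author": article.author,
                "title": article.title,
                "content": article.content,
                "cat": article.cat,
                "createtime": article.ctime,
            },
        }


def bulk_index(es_client, articles):
    """Index all articles through batched bulk requests.

    Returns helpers.bulk's (number of successful actions, list of errors).
    """
    # Imported here so article_actions can be used without the package installed.
    from elasticsearch import helpers

    return helpers.bulk(es_client, article_actions(articles))
```

Calling bulk_index(es_client, articles) then replaces a loop of es_client.index() calls.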