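How to Use the Elastic Stack
Even if you have not heard of the Elastic Stack, you have almost certainly heard of ELK. ELK is an acronym for three pieces of software: Elasticsearch, Logstash, and Kibana. Beats joined the family later, and the combination became the Elastic Stack. In other words, ELK is the old name and Elastic Stack is the new one.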
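Indexes
Creating an index
Creating an index with default settings
Create-index API
Endpoint: 127.0.0.1:9200/articles?pretty
(creates the articles index)
Method: PUT
Check the status of the articles index just created
GET 127.0.0.1:9200/articles/?pretty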
1 | { |
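number_of_shards is the number of primary shards the index is split into. It can only be set when the index is created and cannot be changed afterwards. (Default when not specified: 1.)
number_of_replicas is the number of replica copies kept for each primary shard. It can be changed dynamically later. (Default when not specified: 1.)
primary shard: every document is stored in exactly one primary shard. When you index a document, it is written to the primary shard first and then copied to that shard's replicas. Since Elasticsearch 7.0 an index gets 1 primary shard by default (earlier versions defaulted to 5). You can choose the number of shards up front, but once the index exists the number cannot be changed.
replica shard: each primary shard has zero or more replicas. A replica is a copy of a primary shard; it improves availability and can improve search performance.
By default each primary shard has one replica, and the number of replicas can be adjusted dynamically later.
A replica is never allocated to the same node as its primary shard; it must live on a different node.
Creating an index with shard settings
Create-index API endpoint: 127.0.0.1:9200/articles?pretty
(creates the articles index)
Method: PUT
Request body: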
1 | { |
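As a rough illustration of the shard and replica settings described above, the sketch below builds a create-index request body in Python. The helper name build_index_settings and the values 3 and 1 are assumptions chosen for the example, not the settings used by the original request.

```python
import json


def build_index_settings(shards: int, replicas: int) -> dict:
    """Return a create-index request body that sets shard and replica counts."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,      # fixed once the index exists
            "number_of_replicas": replicas,  # can be changed later
        }
    }


# Hypothetical values; the body would be sent with PUT 127.0.0.1:9200/articles
body = build_index_settings(shards=3, replicas=1)
print(json.dumps(body, indent=2))
```

Validating the counts before sending the request is a design choice of this sketch; Elasticsearch would also reject invalid values on its own.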
Check the index status
1 | { |
Creating an index with a mapping
Create-users-index API endpoint: 127.0.0.1:9200/users?pretty
Method: PUT
Request body:
1 | { |
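type: the field type
analyzer: the analyzer (this example uses the ik Chinese analyzer, a third-party plugin that must be installed). If no analyzer is set, the standard analyzer is used, which splits Chinese text into individual characters.
index: whether the field is indexed. Setting it to false disables indexing, so the field cannot be searched, but it can still be used in aggregations.
doc_values: the structure used for sorting on a field, aggregating on a field, certain filters (such as geo filters), script computations that read the field, and returning field values in search results through docvalue_fields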
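To make these four options concrete, here is a hedged sketch of a mapping body that uses each of them. Every field name (name, age, avatar_url, bio) is hypothetical, and ik_max_word assumes the IK analysis plugin is installed; this is not the mapping from the original request.

```python
# Hypothetical users mapping; all field names below are assumptions.
users_mapping = {
    "mappings": {
        "properties": {
            # Full-text field analyzed with the IK plugin's analyzer
            "name": {"type": "text", "analyzer": "ik_max_word"},
            "age": {"type": "integer"},
            # Not searchable, but doc_values stay on, so it can be aggregated
            "avatar_url": {"type": "keyword", "index": False},
            # Searchable, but cannot be sorted or aggregated on
            "bio": {"type": "keyword", "doc_values": False},
        }
    }
}


def searchable_fields(mapping: dict) -> list:
    """List the fields that remain searchable (index is not set to False)."""
    props = mapping["mappings"]["properties"]
    return [name for name, spec in props.items() if spec.get("index", True)]


fields = searchable_fields(users_mapping)
print(fields)
```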
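Querying
Query all documents
Syntax: <Elasticsearch address>/<index>/_search
Optional parameters
_source: return only the listed fields of _source, like naming columns in a SQL query instead of using select * (separate multiple fields with commas)
size: number of results to return, default 10
from: number of results to skip, default 0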
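The three parameters can also be assembled in code. The sketch below uses only the Python standard library; build_search_url is a hypothetical helper, and the host and port are the ones that appear in this article's example requests.

```python
from urllib.parse import urlencode


def build_search_url(base, index, source_fields=None, size=10, skip=0):
    """Build a _search URL with the _source, size and from parameters."""
    params = {}
    if source_fields:
        params["_source"] = ",".join(source_fields)
    params["size"] = size
    params["from"] = skip
    # safe="," keeps the comma between field names readable in the URL
    return f"{base}/{index}/_search?{urlencode(params, safe=',')}"


url = build_search_url("http://127.0.0.1:9903", "articles",
                       ["title", "id"], size=5, skip=10)
print(url)
```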
Fetch 5 articles, skipping the first 10 results, and return only id and title
Query with URL parameters (GET)
Method: GET
URL:
http://127.0.0.1:9903/articles/_search?_source=title,id&size=5&from=10
Response:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2222,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "articles",
"_type": "_doc",
"_id": "442",
"_score": 1.0,
"_source": {
"id": 442,
"title": "深入学习HTML5的history API"
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "450",
"_score": 1.0,
"_source": {
"id": 450,
"title": "想让百度删除不想收录的域名或快照的最快解决方法"
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "466",
"_score": 1.0,
"_source": {
"id": 466,
"title": "PHP采集远程图片保存本地"
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "490",
"_score": 1.0,
"_source": {
"id": 490,
"title": "8个最佳Web开发资源推荐"
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "530",
"_score": 1.0,
"_source": {
"id": 530,
"title": "前方高能反应!设计师最常见的五个设计误区"
}
}
]
}
}
Query with a request body (GET/POST)
Method: POST or GET
URL:
http://127.0.0.1:9903/articles/_search
Request body:
Match all documents, return the requested fields through fields instead of _source, fetch 5 results, and skip the first 10.
{
"query": {
"match_all": {}
},
"fields": [
"id",
"title"
],
"_source": false,
"size": 5,
"from": 10
}
Response:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2222,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "articles",
"_type": "_doc",
"_id": "442",
"_score": 1.0,
"fields": {
"title": [
"深入学习HTML5的history API"
],
"id": [
442
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "450",
"_score": 1.0,
"fields": {
"title": [
"想让百度删除不想收录的域名或快照的最快解决方法"
],
"id": [
450
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "466",
"_score": 1.0,
"fields": {
"title": [
"PHP采集远程图片保存本地"
],
"id": [
466
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "490",
"_score": 1.0,
"fields": {
"title": [
"8个最佳Web开发资源推荐"
],
"id": [
490
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "530",
"_score": 1.0,
"fields": {
"title": [
"前方高能反应!设计师最常见的五个设计误区"
],
"id": [
530
]
}
}
]
}
}
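Values requested through fields always come back as arrays, as the response above shows, so client code usually unwraps them. A minimal sketch of that step, assuming a response shaped like the one above (flatten_fields is a hypothetical helper and the sample is trimmed to one hit):

```python
def flatten_fields(response: dict) -> list:
    """Turn each hit's `fields` arrays into plain values where possible."""
    rows = []
    for hit in response["hits"]["hits"]:
        row = {}
        for name, values in hit.get("fields", {}).items():
            # Single-element arrays are unwrapped; multi-valued fields stay lists
            row[name] = values[0] if len(values) == 1 else values
        rows.append(row)
    return rows


# One hit copied from the response above, trimmed for brevity
sample = {
    "hits": {
        "hits": [
            {
                "_id": "442",
                "fields": {"title": ["深入学习HTML5的history API"], "id": [442]},
            }
        ]
    }
}
rows = flatten_fields(sample)
print(rows)
```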
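Query by document ID
GET <index>/_doc/<_id> returns the document with the given ID
Syntax: GET <Elasticsearch address>/<index>/_doc/<document ID>
Optional parameters:
_source: return only the listed fields of _source, like naming columns in a SQL query instead of using select * (separate multiple fields with commas). All fields are returned by default; set it to false to return none.
Fetch the document with ID 530, showing only id and title
Using _doc, which returns the document together with its metadata
Method: GET
URL:
http://127.0.0.1:9903/articles/_doc/530?_source=title,id
Response:
{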
"_index": "articles",
"_type": "_doc",
"_id": "530",
"_version": 1,
"_seq_no": 205,
"_primary_term": 4,
"found": true,
"_source": {
"id": 530,
"title": "前方高能反应!设计师最常见的五个设计误区"
}
}
Using _source, which returns only the document source
Method: GET
URL:
http://127.0.0.1:9903/articles/_source/530?_source=title,id
or http://127.0.0.1:9903/articles/_source/530?_source_includes=title,id
Response:
{
"id": 530,
"title": "前方高能反应!设计师最常见的五个设计误区"
}
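Batch queries
Multi get: Elasticsearch also supports fetching several documents in one request through the _mget API
Fetch the documents with IDs 466 and 490
The documents are long, so only id and title are requested here
Method: GET or POST
URL: http://127.0.0.1:9903/articles/_mget?_source=title,id
Request body:
1 | { |
Response: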
1 | { |
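One plausible request body for the _mget example above is the ids form, which works when the index is already named in the URL. The sketch below builds it; build_mget_body is a hypothetical helper, and the original request may have used the longer docs form instead.

```python
import json


def build_mget_body(doc_ids) -> dict:
    """Build an _mget body for an index given in the URL (the `ids` form)."""
    # Document IDs are strings in Elasticsearch, so numbers are converted
    return {"ids": [str(doc_id) for doc_id in doc_ids]}


mget_body = build_mget_body([466, 490])
print(json.dumps(mget_body))
```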
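Query DSL
Index queries include full-text queries, compound queries, structured queries, and more.
Difference between Query and Filter
Query context
Answers how well a document matches: every returned document carries a _score relevance score that can be used for ranking.
Filter context
Answers only whether a document matches. It does not report how well, so no score is computed. Filters are especially useful alongside aggregations. Because there is no scoring, Elasticsearch processes them faster, which improves overall response time. Filter results can also be cached, whereas scored queries are not.
When to use which
Use a query for full-text search and anything that depends on relevance scoring; use a filter everywhere else.
Compound queries
Boolean query
A Boolean query contains must, filter, should, and must_not clauses.
must: the document must match these clauses to be included, and they contribute to the score.
filter: the document must match, but in non-scoring filter mode. These clauses contribute nothing to the score; they only include or exclude documents.
should: roughly the OR of a database query. Matching any of these clauses raises _score and not matching has no other effect, so they mainly adjust relevance. One special case: when the query consists of should clauses only, a document must match at least one of them to be returned.
must_not: the document must not match these clauses to be included, like "not equal".
Find documents whose author is 2 and category is 3, with views outside the range 2000-8000
Method: GET or POST
URL:
http://127.0.0.1:9903/articles/_search?_source=title,id,author,views,cat
Request body:
1 | { |
Response: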
1 | { |
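A hedged sketch of what the request body for the example above could look like, assuming author, cat, and views are numeric fields as the response listings in this article suggest. The exact-value conditions are placed in filter because they need no scoring; the original request may have used must instead. The matches function is a plain-Python restatement of the same condition, included only to make the logic checkable.

```python
def build_bool_query(author: int, cat: int, views_low: int, views_high: int) -> dict:
    """Bool query: author and cat must match, views must fall outside a range."""
    return {
        "query": {
            "bool": {
                # Exact-value conditions need no relevance score
                "filter": [
                    {"term": {"author": author}},
                    {"term": {"cat": cat}},
                ],
                # Exclude documents whose views fall inside the range
                "must_not": [
                    {"range": {"views": {"gte": views_low, "lte": views_high}}}
                ],
            }
        }
    }


def matches(doc: dict) -> bool:
    """The same condition evaluated locally on a document dict."""
    return (doc["author"] == 2 and doc["cat"] == 3
            and not (2000 <= doc["views"] <= 8000))


bool_query = build_bool_query(author=2, cat=3, views_low=2000, views_high=8000)
```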
Deleting
Delete all documents
Request: POST /<index>/_delete_by_query
Request body:
1 | { |
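Deleting every document usually means sending a match_all query to _delete_by_query. The sketch below builds that request as a (method, path, body) tuple; build_delete_all is a hypothetical helper, and match_all is an assumption about what the original body contained.

```python
import json


def build_delete_all(index: str):
    """Return the method, path and JSON body that delete every document in an index."""
    body = {"query": {"match_all": {}}}  # match_all selects every document
    return "POST", f"/{index}/_delete_by_query", json.dumps(body)


method, path, payload = build_delete_all("articles")
print(method, path, payload)
```

The index itself and its mapping remain in place afterwards; only the documents are removed.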
Examples
Find documents whose title contains python web
Request:
GET http://127.0.0.1:9903/articles/_search?_source=title,id,author,views,cat&size=5&q=title:python web
Response:
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 816,
"relation": "eq"
},
"max_score": 4.800988,
"hits": [
{
"_index": "articles",
"_type": "_doc",
"_id": "1385",
"_score": 4.800988,
"_source": {
"author": 13,
"cat": 2,
"id": 1385,
"title": " Python爬虫利器四之PhantomJS的用法 ",
"views": 7790
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1392",
"_score": 4.795393,
"_source": {
"author": 20,
"cat": 2,
"id": 1392,
"title": " Python爬虫进阶一之爬虫框架概述 ",
"views": 5728
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "459",
"_score": 4.630121,
"_source": {
"author": 18,
"cat": 7,
"id": 459,
"title": "web前端规范",
"views": 1731
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1329",
"_score": 4.488963,
"_source": {
"author": 19,
"cat": 3,
"id": 1329,
"title": " [Python3网络爬虫开发实战] 1.6-Web库的安装 ",
"views": 1492
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "342",
"_score": 4.456541,
"_source": {
"author": 5,
"cat": 7,
"id": 342,
"title": "浅谈大型web系统架构",
"views": 4317
}
}
]
}
}
Find documents that contain python web in any field
Request:
GET http://127.0.0.1:9903/articles/_search?_source=title,id,author,views,cat&size=5&q=_all:python web
Response:
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 576,
"relation": "eq"
},
"max_score": 4.630121,
"hits": [
{
"_index": "articles",
"_type": "_doc",
"_id": "459",
"_score": 4.630121,
"_source": {
"author": 18,
"cat": 7,
"id": 459,
"title": "web前端规范",
"views": 1731
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "342",
"_score": 4.456541,
"_source": {
"author": 5,
"cat": 7,
"id": 342,
"title": "浅谈大型web系统架构",
"views": 4317
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "812",
"_score": 4.456541,
"_source": {
"author": 1,
"cat": 3,
"id": 812,
"title": "想做web开发 就学JavaScript",
"views": 8885
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "410",
"_score": 4.3550134,
"_source": {
"author": 16,
"cat": 4,
"id": 410,
"title": "Web开发初学指南",
"views": 4300
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "578",
"_score": 4.3550134,
"_source": {
"author": 10,
"cat": 1,
"id": 578,
"title": "Web Worker 使用教程",
"views": 2814
}
}
]
}
}
Full-text search for titles containing python or web, using a request body
Request:
GET/POST http://127.0.0.1:9903/articles/_search
Request body:
{
"from": 0,
"size": 5,
"_source": [
"id",
"title"
],
"query": {
"match": {
"title": "python web"
}
}
}
Find documents whose title contains the phrase python爬虫 ("python crawler")
Request:
GET/POST http://127.0.0.1:9903/articles/_search
Request body:
{
"from": 0,
"size": 10,
"_source": [
"id",
"title"
],
"query": {
"match_phrase": {
"title": "python爬虫"
}
}
}
Response:
{
"took": 9,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 28,
"relation": "eq"
},
"max_score": 6.0065117,
"hits": [
{
"_index": "articles",
"_type": "_doc",
"_id": "281",
"_score": 6.0065117,
"_source": {
"id": 281,
"title": "什么是Python爬虫?"
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1206",
"_score": 5.663828,
"_source": {
"id": 1206,
"title": "什么是Python爬虫?"
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1361",
"_score": 5.119743,
"_source": {
"id": 1361,
"title": " 自建免费PYTHON爬虫代理IP池 "
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1567",
"_score": 5.084609,
"_source": {
"id": 1567,
"title": " Python爬虫入门一之综述 "
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1387",
"_score": 4.8739586,
"_source": {
"id": 1387,
"title": " Python爬虫利器五之Selenium的用法 "
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1569",
"_score": 4.863277,
"_source": {
"id": 1569,
"title": " Python爬虫入门二之爬虫基础了解 "
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1385",
"_score": 4.863277,
"_source": {
"id": 1385,
"title": " Python爬虫利器四之PhantomJS的用法 "
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1386",
"_score": 4.863277,
"_source": {
"id": 1386,
"title": " Python爬虫利器六之PyQuery的用法 "
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1537",
"_score": 4.837265,
"_source": {
"id": 1537,
"title": " Python 爬虫利器之 Pyppeteer 的用法 "
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1381",
"_score": 4.642378,
"_source": {
"id": 1381,
"title": " Python爬虫利器一之Requests库的用法 "
}
}
]
}
}
Exact match with term: find the document whose id field is 299
Request:
GET/POST http://127.0.0.1:9903/articles/_search
Request body:
{
"_source": [
"id",
"title"
],
"query": {
"term": {
"id": 299
}
}
}
Response:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "articles",
"_type": "_doc",
"_id": "299",
"_score": 1.0,
"_source": {
"id": 299,
"title": "为什么我要说 JavaScript 对象字面量很酷?"
}
}
]
}
}
Range query with range: find documents with between 9500 and 10000 views
Request:
GET/POST http://127.0.0.1:9903/articles/_search
Request body:
{
"_source": [
"id",
"title",
"views"
],
"query": {
"range": {
"views": {
"gte": 9500,
"lte": 10000
}
}
}
}
Response:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 107,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "articles",
"_type": "_doc",
"_id": "230",
"_score": 1.0,
"_source": {
"id": 230,
"title": "理解矩阵乘法",
"views": 9746
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "326",
"_score": 1.0,
"_source": {
"id": 326,
"title": "jQuery+JSONP通过调用虾米接口实现类似点点网发布音乐的功能",
"views": 9713
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "390",
"_score": 1.0,
"_source": {
"id": 390,
"title": "大型网站的 HTTPS 实践(二):HTTPS 对性能的影响",
"views": 9660
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "711",
"_score": 1.0,
"_source": {
"id": 711,
"title": "HTML特殊符号对照表大全",
"views": 9736
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "890",
"_score": 1.0,
"_source": {
"id": 890,
"title": "JavaScript易错知识点整理",
"views": 9808
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "930",
"_score": 1.0,
"_source": {
"id": 930,
"title": "调试 CSS 的方法",
"views": 9824
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "92",
"_score": 1.0,
"_source": {
"id": 92,
"title": "如何使用 Issue 管理软件项目?",
"views": 9834
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1894",
"_score": 1.0,
"_source": {
"id": 1894,
"title": "[静下心来看python]-[11]-[__seq__]",
"views": 9597
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1916",
"_score": 1.0,
"_source": {
"id": 1916,
"title": "python 冒泡排序",
"views": 9805
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1931",
"_score": 1.0,
"_source": {
"id": 1931,
"title": "yun update 如何更新 安全补丁",
"views": 9825
}
}
]
}
}
Highlighted search with highlight: find titles matching python web 编程 ("python web programming") and highlight the matched keywords
Request:
GET/POST http://127.0.0.1:9903/articles/_search
Request body:
{
"_source": [
"id",
"title",
"views"
],
"query": {
"match": {
"title": "python web 编程"
}
},
"highlight":{
"fields":{
"title":{}
}
}
}
Response:
{
"took": 91,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 409,
"relation": "eq"
},
"max_score": 6.9713564,
"hits": [
{
"_index": "articles",
"_type": "_doc",
"_id": "1183",
"_score": 6.9713564,
"_source": {
"id": 1183,
"title": "如何自学python编程入门?",
"views": 9077
},
"highlight": {
"title": [
"如何自学<em>python</em><em>编程</em>入门?"
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1203",
"_score": 6.8202724,
"_source": {
"id": 1203,
"title": "Python编程游戏有哪些?",
"views": 3612
},
"highlight": {
"title": [
"<em>Python</em><em>编程</em>游戏有哪些?"
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "977",
"_score": 6.6650963,
"_source": {
"id": 977,
"title": "搞懂了这几点,你就学会了Web编程",
"views": 1732
},
"highlight": {
"title": [
"搞懂了这几点,你就学会了<em>Web</em><em>编程</em>"
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1190",
"_score": 6.639513,
"_source": {
"id": 1190,
"title": "Python函数式编程实例详解",
"views": 3830
},
"highlight": {
"title": [
"<em>Python</em>函数式<em>编程</em>实例详解"
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1186",
"_score": 6.614889,
"_source": {
"id": 1186,
"title": "Python编程100例—海绵宝宝",
"views": 9480
},
"highlight": {
"title": [
"<em>Python</em><em>编程</em>100例—海绵宝宝"
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1187",
"_score": 6.3046937,
"_source": {
"id": 1187,
"title": "Python编程应该下载什么软件?",
"views": 2521
},
"highlight": {
"title": [
"<em>Python</em><em>编程</em>应该下载什么软件?"
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1189",
"_score": 5.788736,
"_source": {
"id": 1189,
"title": "Python编程例子:使用Python编写简单的画图板软件",
"views": 3266
},
"highlight": {
"title": [
"<em>Python</em><em>编程</em>例子:使用<em>Python</em>编写简单的画图板软件"
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1012",
"_score": 5.7351246,
"_source": {
"id": 1012,
"title": "python编程中双斜杠是什么意思?",
"views": 5075
},
"highlight": {
"title": [
"<em>python</em><em>编程</em>中双斜杠是什么意思?"
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1174",
"_score": 5.7270813,
"_source": {
"id": 1174,
"title": "python编程中双斜杠是什么意思?",
"views": 9410
},
"highlight": {
"title": [
"<em>python</em><em>编程</em>中双斜杠是什么意思?"
]
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1185",
"_score": 5.639056,
"_source": {
"id": 1185,
"title": "Python编程例子: python中计算三次方怎么表示?",
"views": 4959
},
"highlight": {
"title": [
"<em>Python</em><em>编程</em>例子: <em>python</em>中计算三次方怎么表示?"
]
}
}
]
}
}
Find documents whose title matches python 爬虫 ("python crawler"), sorted by date descending and then by score descending
Request:
GET/POST http://127.0.0.1:9903/articles/_search
Request body:
{
"_source": [
"id",
"title",
"views",
"createtime"
],
"query": {
"match": {
"title": "python爬虫"
}
},
"sort":[
{
"createtime":{
"order":"desc"
}
},
{
"_score":{
"order":"desc"
}
}
]
}
Response:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 307,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "articles",
"_type": "_doc",
"_id": "1331",
"_score": 3.1511018,
"_source": {
"createtime": "2021-05-20T16:00:00.000Z",
"id": 1331,
"title": " [Python3网络爬虫开发实战] 1.7-App爬取相关库的安装 ",
"views": 9039
},
"sort": [
1621526400000,
3.1511018
]
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1617",
"_score": 3.7443776,
"_source": {
"createtime": "2021-04-04T16:00:00.000Z",
"id": 1617,
"title": " [Python3网络爬虫开发实战] 15.4–Scrapyd 批量部署 ",
"views": 3554
},
"sort": [
1617552000000,
3.7443776
]
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1596",
"_score": 3.7840004,
"_source": {
"createtime": "2021-02-22T16:00:00.000Z",
"id": 1596,
"title": " [Python3网络爬虫开发实战] 9.4-ADSL 拨号代理 ",
"views": 5475
},
"sort": [
1614009600000,
3.7840004
]
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1328",
"_score": 3.7831829,
"_source": {
"createtime": "2021-01-27T16:00:00.000Z",
"id": 1328,
"title": " [Python3网络爬虫开发实战] 1.2.5-PhantomJS的安装 ",
"views": 7585
},
"sort": [
1611763200000,
3.7831829
]
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1330",
"_score": 3.6084993,
"_source": {
"createtime": "2020-12-20T16:00:00.000Z",
"id": 1330,
"title": " [Python3网络爬虫开发实战] 1.6.1-Flask的安装 ",
"views": 4703
},
"sort": [
1608480000000,
3.6084993
]
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1283",
"_score": 1.9249799,
"_source": {
"createtime": "2020-12-02T16:00:00.000Z",
"id": 1283,
"title": " Python3 中使用 Pathlib 模块进行文件操作 ",
"views": 7660
},
"sort": [
1606924800000,
1.9249799
]
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1238",
"_score": 3.6084993,
"_source": {
"createtime": "2020-10-16T16:00:00.000Z",
"id": 1238,
"title": " [Python3网络爬虫开发实战] 13.5–Downloader Middleware 的用法 ",
"views": 1392
},
"sort": [
1602864000000,
3.6084993
]
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1199",
"_score": 2.3096707,
"_source": {
"createtime": "2020-09-10T16:00:00.000Z",
"id": 1199,
"title": "python怎么下载安装re库?",
"views": 2390
},
"sort": [
1599753600000,
2.3096707
]
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1329",
"_score": 3.4821367,
"_source": {
"createtime": "2020-06-14T16:00:00.000Z",
"id": 1329,
"title": " [Python3网络爬虫开发实战] 1.6-Web库的安装 ",
"views": 1492
},
"sort": [
1592150400000,
3.4821367
]
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1237",
"_score": 2.1997697,
"_source": {
"createtime": "2020-03-28T16:00:00.000Z",
"id": 1237,
"title": " Python 中异常处理库 merry 的用法 ",
"views": 6264
},
"sort": [
1585411200000,
2.1997697
]
}
]
}
}
Raising weight with boost to tune ranking: titles matching python 爬虫 ("python crawler") get twice the weight
Request:
GET/POST http://127.0.0.1:9903/articles/_search
Request body:
{
"size": 5,
"_source": [
"id",
"title",
"views",
"createtime"
],
"query": {
"match": {
"title": {
"query": "python 爬虫",
"boost": 2
}
}
}
}
Response:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 307,
"relation": "eq"
},
"max_score": 12.013023,
"hits": [
{
"_index": "articles",
"_type": "_doc",
"_id": "281",
"_score": 12.013023,
"_source": {
"createtime": "2003-11-18T16:00:00.000Z",
"id": 281,
"title": "什么是Python爬虫?",
"views": 7830
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1569",
"_score": 11.6157055,
"_source": {
"createtime": "2015-04-18T16:00:00.000Z",
"id": 1569,
"title": " Python爬虫入门二之爬虫基础了解 ",
"views": 880
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1206",
"_score": 11.327656,
"_source": {
"createtime": "2005-12-16T16:00:00.000Z",
"id": 1206,
"title": "什么是Python爬虫?",
"views": 6231
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1392",
"_score": 11.009354,
"_source": {
"createtime": "1999-10-02T16:00:00.000Z",
"id": 1392,
"title": " Python爬虫进阶一之爬虫框架概述 ",
"views": 5728
}
},
{
"_index": "articles",
"_type": "_doc",
"_id": "1352",
"_score": 10.239487,
"_source": {
"createtime": "2007-07-31T16:00:00.000Z",
"id": 1352,
"title": " Python3爬虫视频学习教程 ",
"views": 7287
}
}
]
}
}
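Logstash
Logstash is an open-source data collection engine with real-time pipelining. It can dynamically unify data from different sources and normalize it into the destinations of your choice.
Besides Logstash, what other ways are there to add data to Elasticsearch day to day?
Haystack
An earlier article covered using Elasticsearch from Django. There we connected to Elasticsearch through Haystack, a framework for integrating search engines into Django that acts as the bridge between the application and the search engine. In Django, we can call Elasticsearch through Haystack.
Adding data manually
Logstash is typically used to import the existing data when a system first goes live. After launch, we usually update Elasticsearch in step with every insert, update, or delete on the database. That is exactly what Haystack does, but only for Django. What about other frameworks or plain code?
The approach is the one just described: whenever the database is updated, we apply the same change to Elasticsearch ourselves, which amounts to calling the Elasticsearch REST API directly.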
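A minimal sketch of that manual synchronisation, assuming the articles index and the document endpoints used earlier in this article. The function only describes the HTTP call that would mirror each database change; it does not send anything. ES_BASE and sync_request are hypothetical names.

```python
import json

ES_BASE = "http://127.0.0.1:9200"  # assumed Elasticsearch address


def sync_request(action: str, index: str, doc_id, doc: dict = None):
    """Translate a database change into the (method, url, body) of an Elasticsearch call."""
    url = f"{ES_BASE}/{index}/_doc/{doc_id}"
    if action in ("insert", "update"):
        # PUT on _doc/<id> creates the document or replaces an existing one
        return "PUT", url, json.dumps(doc, ensure_ascii=False)
    if action == "delete":
        return "DELETE", url, None
    raise ValueError(f"unknown action: {action}")


# Hypothetical article row that has just been written to the database
request = sync_request("insert", "articles", 530, {"id": 530, "views": 0})
print(request)
```

Replacing the whole document on update keeps the sketch simple; a partial update through the _update endpoint is another option when only a few fields change.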
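Using Logstash
Installation
First download Logstash from the official site, making sure its version matches your Elasticsearch version. Many guides online install it with yum; that method is not covered here.
Importing data from MySQL into Elasticsearch
Once Logstash is installed, Elasticsearch itself needs no extra configuration; you only write a Logstash configuration when you need to use it.
Download the MySQL driver
Download the JDBC driver for MySQL and place it in a known directory.
Download the latest Connector from https://dev.mysql.com/downloads/connector/j/. Remember where you put the driver package; it is easiest to keep it alongside Logstash.
Create the configuration file
Create the import configuration file, logstash_mysql.conf
jdbc_connection_string: the database address, port, database name, character encoding, time zone, and so on; blog_db is the database name
jdbc_user: the database user name
jdbc_password: the database password
jdbc_driver_library: path to the driver package; leave it empty if the driver sits in the directory Logstash loads libraries from, otherwise give an absolute path
jdbc_driver_class: the driver class (usually com.mysql.jdbc.Driver; Connector/J 8.x names it com.mysql.cj.jdbc.Driver)
statement: the SQL statement; here it selects all rows of the articles table. Columns can be aliased with as so that they match the field names in the Elasticsearch index mapping
tracking_column: the column whose value Logstash remembers between runs when use_column_value is true; the saved value is available in the statement as :sql_last_value, which makes incremental imports possible. The document ID itself is set by document_id in the elasticsearch output, typically from the same column.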
1 | input{ |
Run the import
A single command is enough. Check back after a while and the data will all have been imported.
/usr/share/logstash/bin/logstash -f ./logstash_mysql.conf
Spelling correction
For an index that already exists, Elasticsearch also offers a suggest query mode.
1 | curl 127.0.0.1:9200/articles/_search?pretty -d ' |
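When we type a misspelled keyword such as pthony web, Elasticsearch can propose the correct spelling based on the data in the mapped content field of the index. size is the number of suggestions to return. The request above produces a response like: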
1 | { |
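A hedged sketch of a suggest request body for this case. It assumes the phrase suggester, since the input has more than one word; the original request may have used a different suggester type. The suggestion name word-phrase and the helper build_spelling_suggest are hypothetical, while the field content and the text pthony web come from the description above.

```python
import json


def build_spelling_suggest(text: str, field: str = "content", size: int = 1) -> dict:
    """Build a search body that only asks for spelling suggestions."""
    return {
        "_source": False,  # the matching documents themselves are not needed
        "suggest": {
            "text": text,  # the possibly misspelled input
            "word-phrase": {  # hypothetical suggestion name
                "phrase": {"field": field, "size": size}
            },
        },
    }


suggest_body = build_spelling_suggest("pthony web")
print(json.dumps(suggest_body, indent=2))
```

In the response, the suggestions are returned under suggest, keyed by the suggestion name chosen in the request.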
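Autocomplete
Elasticsearch's autocomplete feature requires a special field mapping, so the existing articles index cannot be used for it; a separate index has to be created.
Create the autocomplete index
Request: PUT 127.0.0.1:9200/completions?pretty
Request body: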
1 | { |
Import the initial data with Logstash
Edit logstash_mysql_completion.conf
1 | input{ |
Run the import command
sudo /usr/share/logstash/bin/logstash -f ./logstash_mysql_completion.conf
Test autocomplete
Request: GET localhost:9902/completions/_search
Request body:
1 | { |
Response:
1 | { |
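To illustrate the special mapping and the query that autocomplete needs, the sketch below builds both bodies. The field name suggest, the suggestion name title-suggest, the prefix, and the ik_max_word analyzer are all assumptions; the original mapping and request may differ.

```python
# Hypothetical mapping for the completions index: one field of type completion
completion_index_body = {
    "mappings": {
        "properties": {
            "suggest": {"type": "completion", "analyzer": "ik_max_word"}
        }
    }
}


def build_completion_query(prefix: str, field: str = "suggest", size: int = 5) -> dict:
    """Build a completion-suggester request for the given input prefix."""
    return {
        "_source": False,
        "suggest": {
            "title-suggest": {  # hypothetical suggestion name
                "prefix": prefix,
                "completion": {"field": field, "size": size},
            }
        },
    }


completion_query = build_completion_query("pyth")
print(completion_query)
```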
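Connecting to Elasticsearch from Python
Querying data
Install the elasticsearch package
pip install elasticsearch
Connect to the Elasticsearch server
from elasticsearch import Elasticsearch
Define the search criteria
# Search criteria: 5 documents whose title contains python web and whose author id is 2 (returned fields: id, title, views, createtime)
Run the search
# Run the search against the articles index with the criteria above and fetch the results
Process the results
# Loop over the search results
Preview of the printed output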
1 | 文章ID: 1009 |
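The search steps above can be sketched without a running server. The code below builds the criteria described in the comment (title contains python web, author id 2, 5 documents, four returned fields) and formats a response of the usual shape. With the 7.x Python client, the body would typically be passed as es.search(index="articles", body=body); that call is left out so the sketch stays runnable offline. The helper names and the sample response are assumptions made for illustration, not real output.

```python
def build_search_body(keywords: str, author_id: int, size: int = 5) -> dict:
    """Search criteria: title matches the keywords, author equals author_id."""
    return {
        "size": size,
        "_source": ["id", "title", "views", "createtime"],
        "query": {
            "bool": {
                "must": [{"match": {"title": keywords}}],       # scored full-text match
                "filter": [{"term": {"author": author_id}}],    # exact, unscored condition
            }
        },
    }


def format_hits(response: dict) -> list:
    """Turn the hits of a search response into printable lines."""
    lines = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        lines.append(f"文章ID: {source['id']}")
        lines.append(f"title: {source['title']}")
    return lines


body = build_search_body("python web", author_id=2)

# Made-up response with a placeholder title, used only to exercise format_hits
sample_response = {
    "hits": {"hits": [{"_source": {"id": 1009, "title": "placeholder title"}}]}
}
lines = format_hits(sample_response)
print("\n".join(lines))
```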
Adding data
In the publish-article endpoint, besides saving the article to the database, we also index the new article in Elasticsearch.
1 | doc = { |
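A hedged sketch of that indexing step. It assumes the 7.x Python client, whose index() method accepts index, id, and body keyword arguments (newer clients prefer document in place of body). To keep the example runnable without a server, a small stand-in class records the call; index_article, RecordingClient, and the article values are hypothetical, while the field names are the ones seen in this article's search results.

```python
ARTICLE_FIELDS = ("id", "title", "author", "cat", "views", "createtime")


def index_article(es, article: dict):
    """Index a newly published article, using its database id as the document id."""
    doc = {key: article[key] for key in ARTICLE_FIELDS if key in article}
    # Assumes a 7.x-style client: es.index(index=..., id=..., body=...)
    return es.index(index="articles", id=article["id"], body=doc)


class RecordingClient:
    """Stand-in for the real client so the sketch runs without a server."""

    def __init__(self):
        self.calls = []

    def index(self, **kwargs):
        self.calls.append(kwargs)
        return {"result": "created"}


client = RecordingClient()
# Hypothetical article; "draft" is dropped because it is not an indexed field
result = index_article(client, {"id": 3001, "title": "demo", "author": 2, "draft": True})
print(result, client.calls)
```

In the real endpoint, an Elasticsearch client instance would be passed in place of RecordingClient, after the database write succeeds.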