How to Use ElasticSearch in Django

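The previous article covered how to use FastDFS in Django, which was in fact groundwork for this one. Here we bring ElasticSearch into the project as well and build a small product search system.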

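Full-text search is one of the most common requirements, and the open-source Elasticsearch (hereafter Elastic) is currently the first choice among full-text search engines. It can store, search, and analyze massive amounts of data quickly; Wikipedia, Stack Overflow, and GitHub all use it. Out of the box, Elasticsearch does not segment Chinese text into words when building an index, so it needs the elasticsearch-analysis-ik plugin for Chinese word segmentation. As with the FastDFS installation, we therefore use a preconfigured ElasticSearch + elasticsearch-analysis-ik image.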

Installing the ElasticSearch Image

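Note that I again installed this on an Ubuntu virtual machine acting as the Docker host, the same server that runs FastDFS, to stand in for a separate, dedicated server.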

# Pull the image from the registry
$ sudo docker image pull delron/elasticsearch-ik:2.4.6-1.0

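ElasticSearch Configuration Files

The configuration directory on the host is /home/tony/Desktop/elasticsearch-2.4.6/config. It holds two files: the ElasticSearch configuration file (elasticsearch.yml) and the logging configuration file (logging.yml).

The contents of elasticsearch.yml are shown below. Almost everything is commented out; essentially only one line takes effect, network.host: 172.16.178.129. Just change the IP to that of your host.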

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please see the documentation for further information on configuration options:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
# cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
# node.name: node-1
#
# Add custom attributes to the node:
#
# node.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
# path.data: /path/to/data
#
# Path to log files:
#
# path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
# bootstrap.memory_lock: true
#
# Make sure that the `ES_HEAP_SIZE` environment variable is set to about half the memory
# available on the system and that the owner of the process is allowed to use this limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 172.16.178.129
#
# Set a custom port for HTTP:
#
# http.port: 9200
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
#
# discovery.zen.minimum_master_nodes: 3
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html>
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
# gateway.recover_after_nodes: 3
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>
#
# ---------------------------------- Various -----------------------------------
#
# Disable starting multiple nodes on a single system:
#
# node.max_local_storage_nodes: 1
#
# Require explicit names when deleting indices:
#
# action.destructive_requires_name: true
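
To double-check which lines of a mostly commented-out file like this actually take effect, a small script can list the uncommented settings. This is only a sketch: the helper name active_settings is made up for illustration, and it handles flat `key: value` lines only, not general YAML.

```python
def active_settings(config_text: str) -> dict:
    """Return the uncommented `key: value` lines of a flat config file."""
    settings = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# A three-line excerpt of the elasticsearch.yml shown above.
sample = """\
# cluster.name: my-application
network.host: 172.16.178.129
# http.port: 9200
"""
print(active_settings(sample))
```

For the file above, this should report network.host as the only active setting.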

Contents of logging.yml:

# you can override this using by setting a system property, for example -Des.logger.level=DEBUG
es.logger.level: INFO
rootLogger: ${es.logger.level}, console, file
logger:
  # log action execution errors for easier debugging
  action: DEBUG

  # deprecation logging, turn to DEBUG to see them
  deprecation: INFO, deprecation_log_file

  # reduce the logging for aws, too much is logged under the default INFO
  com.amazonaws: WARN
  # aws will try to do some sketchy JMX stuff, but its not needed.
  com.amazonaws.jmx.SdkMBeanRegistrySupport: ERROR
  com.amazonaws.metrics.AwsSdkMetrics: ERROR

  org.apache.http: INFO

  # gateway
  #gateway: DEBUG
  #index.gateway: DEBUG

  # peer shard recovery
  #indices.recovery: DEBUG

  # discovery
  #discovery: TRACE

  index.search.slowlog: TRACE, index_search_slow_log_file
  index.indexing.slowlog: TRACE, index_indexing_slow_log_file

additivity:
  index.search.slowlog: false
  index.indexing.slowlog: false
  deprecation: false

appender:
  console:
    type: console
    layout:
      type: consolePattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %.10000m%n"

  # Use the following log4j-extras RollingFileAppender to enable gzip compression of log files.
  # For more information see https://logging.apache.org/log4j/extras/apidocs/org/apache/log4j/rolling/RollingFileAppender.html
  #file:
    #type: extrasRollingFile
    #file: ${path.logs}/${cluster.name}.log
    #rollingPolicy: timeBased
    #rollingPolicy.FileNamePattern: ${path.logs}/${cluster.name}.log.%d{yyyy-MM-dd}.gz
    #layout:
      #type: pattern
      #conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  deprecation_log_file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}_deprecation.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  index_search_slow_log_file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}_index_search_slowlog.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  index_indexing_slow_log_file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}_index_indexing_slowlog.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

Running Elasticsearch-ik with Docker

$ sudo docker run -dti --name=elasticsearch --network=host -v /home/tony/Desktop/elasticsearch-2.4.6/config:/usr/share/elasticsearch/config delron/elasticsearch-ik:2.4.6-1.0

Check the running state. You can see that ElasticSearch and the FastDFS containers from the previous article are all running normally.

tony@ubuntu:~/Desktop/elasticsearch-2.4.6/config$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
519432a1232b delron/elasticsearch-ik:2.4.6-1.0 "/docker-entrypoint.…" 2 days ago Up About an hour elasticsearch
d7d3a976fa77 delron/fastdfs "/usr/bin/start1.sh …" 5 days ago Up About an hour storage
470dbf71de20 delron/fastdfs "/usr/bin/start1.sh …" 5 days ago Up About an hour tracker
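
Besides docker ps, it can be worth confirming that the node actually answers HTTP requests before wiring up Django. The following is a minimal sketch using only the standard library. It assumes the host IP configured in elasticsearch.yml above and the default HTTP port 9200; the function names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: host IP from elasticsearch.yml above, default HTTP port 9200.
ES_URL = "http://172.16.178.129:9200/"


def parse_node_info(body: bytes) -> tuple:
    """Extract (cluster_name, version_number) from the root endpoint's JSON."""
    info = json.loads(body.decode("utf-8"))
    return info["cluster_name"], info["version"]["number"]


def fetch_node_info(url: str = ES_URL, timeout: float = 5.0) -> tuple:
    """GET the node's root endpoint and return (cluster_name, version_number)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_node_info(response.read())


# Usage, once the container is up:
#     print(fetch_node_info())
```

If the call raises a connection error, the container is probably not listening on the address set in network.host.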

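Introducing, Installing, and Configuring Haystack

Haystack is a framework for connecting Django to search engines, serving as a bridge between the user and the search engine. In Django we can call the Elasticsearch search engine through Haystack.

Installing Haystack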

$ pip install django-haystack
$ pip install elasticsearch==2.4.1

Registering the Haystack App and Routes

INSTALLED_APPS = [
    # ... your existing apps ...
    'haystack',  # full-text search
]

Routing

# In the project's urls.py (needs: from django.urls import include, path)
path('search/', include('haystack.urls')),

Haystack Configuration

# Haystack
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://172.16.178.129:9200/',  # Elasticsearch server address; 9200 is the default HTTP port
        'INDEX_NAME': 'goods',  # name of the index created in Elasticsearch
    },
}

# Update the index automatically whenever data is added, modified, or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

Building the Data Index

  • An index class tells the search engine which fields to index, that is, which fields' keywords can be used to retrieve data.
  • This project runs full-text search over product information, so create a search_indexes.py file in the app application to hold the index class.

from haystack import indexes

from app.models import Goods


class GoodsIndex(indexes.SearchIndex, indexes.Indexable):
    """Search index for the Goods model."""
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        """Return the model class to be indexed."""
        return Goods

    def index_queryset(self, using=None):
        """Return the queryset of objects to be indexed."""
        return self.get_model().objects.all()

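Notes on the GoodsIndex index class:

  • Every field defined on GoodsIndex can be queried through Haystack and the Elasticsearch search engine.
  • The text field is declared with document=True, which marks it as the primary field for keyword searches.
  • The index value of the text field can be composed of several model fields; use_template=True means that a template will specify which model fields go into it.

Creating the Template File for the text Field's Index Value

  • Create the template file used by the text field in the templates directory.
  • Specifically, define it in templates/search/indexes/app/goods_text.txt.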
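
The contents of goods_text.txt are not shown here, so the sketch below is only a guess at what such a template might look like. It assumes the id, name, and title fields that appear on the results page later in this article. The helper names are made up, and render_preview is a toy stand-in for Django's template engine, used only to show what text would end up in the index.

```python
def index_template_path(app_label: str, model_name: str, field_name: str) -> str:
    """Mirror the path convention Haystack uses for use_template=True fields."""
    return f"search/indexes/{app_label}/{model_name.lower()}_{field_name}.txt"


# Hypothetical contents of goods_text.txt (Django template syntax).
# The choice of fields is an assumption, not taken from the project.
GOODS_TEXT_TEMPLATE = "{{ object.id }}\n{{ object.name }}\n{{ object.title }}\n"


def render_preview(template: str, obj: dict) -> str:
    """Toy substitute for Django's template rendering, for illustration only."""
    rendered = template
    for key, value in obj.items():
        rendered = rendered.replace("{{ object.%s }}" % key, str(value))
    return rendered


print(index_template_path("app", "Goods", "text"))
print(render_preview(GOODS_TEXT_TEMPLATE, {"id": 1, "name": "Gold necklace", "title": "24K"}))
```

Each line of the rendered text becomes searchable through the text field. With the real-time signal processor only newly saved objects are indexed automatically, so rows that already exist would normally be indexed once with Haystack's rebuild_index management command (python manage.py rebuild_index).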

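Adding the Search Entry Point

We added a search box to the home page. The form submits to the search route configured earlier, with the keyword passed in the q parameter.

  • Request method: GET
  • Request URL: /search/
  • Request parameter: q

<form method="get" action="/search/" class="search_con">
    <input type="text" class="input_text fl" name="q" placeholder="Search products">
    <input type="submit" class="input_btn fr" name="" value="Search">
</form>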
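
Submitting this form produces a plain GET request, so the same URL can be built by hand, for example when testing the endpoint from a script. The helper below is a hypothetical sketch. It assumes the /search/ route and q parameter described above, plus the page parameter that the pagination links later in this article rely on.

```python
from urllib.parse import urlencode


def build_search_url(keyword: str, page: int = 1, base: str = "/search/") -> str:
    """Build the GET URL that the search form would submit."""
    params = {"q": keyword}
    # Only add the page parameter when it is not the first page.
    if page > 1:
        params["page"] = page
    # urlencode percent-encodes non-ASCII keywords as UTF-8.
    return f"{base}?{urlencode(params)}"


print(build_search_url("金"))
print(build_search_url("金", page=2))
```

Note that a Chinese keyword has to be percent-encoded in the URL; the browser does this automatically when the form is submitted.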

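Adding the Search Results Template

The template should be placed at templates/search/search.html. Haystack's source code expects it there, so unless we override the view we simply follow that convention. The view passes the following variables to the template:

  • query: the search keyword
  • paginator: the Paginator object
  • page: the Page object for the current page (iterating over it yields result objects)
  • result.object: the Goods object for the result currently being iterated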

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<div style="width: 1000px; margin: auto">
    <form method="get" action="/search/" class="search_con">
        <input type="text" class="input_text fl" name="q" placeholder="Search products">
        <input type="submit" class="input_btn fr" name="" value="Search">
    </form>
    <h2>You searched for: <span style="color: #f00;">{{ query }}</span></h2>
    <table border="1">
        <tr>
            <td>ID</td>
            <td>Title</td>
            <td>Subtitle</td>
            <td>Image</td>
        </tr>
        {% for result in page %}
            <tr>
                <td>{{ result.object.id }}</td>
                <td>{{ result.object.name }}</td>
                <td>{{ result.object.title }}</td>
                <td><img src="{{ result.object.img.url }}" height="30" alt="{{ result.object.name }}"></td>
            </tr>
        {% empty %}
            <tr>
                <td colspan="4">No products matched your search.</td>
            </tr>
        {% endfor %}

    </table>
</div>
</body>
</html>

Notice that the results page also contains the search form. Searching for "金" (gold) from the home page renders the results correctly.

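Paginating Search Results

The template variables show that, besides the search results, we also get pagination objects, so let's check that pagination works.

  • HAYSTACK_SEARCH_RESULTS_PER_PAGE controls how many results are shown per page.
  • To show five results per page: HAYSTACK_SEARCH_RESULTS_PER_PAGE = 5

The paginator passed in by the backend handles pagination and works like Django's own paginator component. Here I simply render it on the page without elaborating further.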

<div style="display: flex; margin-top:20px; line-height: 50px; text-align: center">
    {% for page_code in paginator.page_range %}
        {% if page.number == page_code %}
            <div style="width: 50px; height: 50px; margin: 0 3px; background-color: lightblue">
                {{ page_code }}
            </div>
        {% else %}
            <div style="width: 50px; height: 50px; margin: 0 3px; background-color: darkcyan">
                <a href="?q={{ query|urlencode }}&amp;page={{ page_code }}">{{ page_code }}</a>
            </div>
        {% endif %}
    {% endfor %}
</div>
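
The arithmetic behind paginator.page_range and the per-page setting can be sketched in a few lines. This is a simplified stand-in written for illustration, not the actual code of Django's Paginator, and the function names are made up.

```python
import math

RESULTS_PER_PAGE = 5  # mirrors HAYSTACK_SEARCH_RESULTS_PER_PAGE = 5


def page_count(total_results: int, per_page: int = RESULTS_PER_PAGE) -> int:
    """Number of pages; an empty result set still yields one (empty) page."""
    return max(1, math.ceil(total_results / per_page))


def page_slice(results: list, page_number: int, per_page: int = RESULTS_PER_PAGE) -> list:
    """Results shown on the given 1-based page."""
    start = (page_number - 1) * per_page
    return results[start:start + per_page]


results = list(range(12))  # pretend the search returned 12 hits
print(page_count(len(results)))
print(page_slice(results, 3))
```

With 12 hits and five per page, the page links run from 1 to 3 and the last page holds the remaining two results.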

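Final result:

Summary

That completes the walkthrough of using ElasticSearch in Django. This article only covers rough, basic usage of ElasticSearch without much polish or refinement, but the basic approach is essentially as shown.

As with FastDFS in the previous article, we used a Docker image to get the job done. Compared with tedious step-by-step manual configuration, I personally recommend Docker.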