How to Use Elasticsearch in Django

Published 2021-01-04, updated 2022-06-16. Categories: rd, python

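The previous article covered how to use FastDFS in Django, partly as groundwork for this one. Here we bring Elasticsearch into the same project and build a small product search system.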

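Full-text search is one of the most common requirements, and the open-source Elasticsearch (hereafter "Elastic") is a leading choice of full-text search engine. It can store, search, and analyze large volumes of data quickly; Wikipedia, Stack Overflow, and GitHub all use it. Out of the box, Elasticsearch does not segment Chinese text into words (its standard analyzer breaks Chinese into single characters), so indexing Chinese properly requires the elasticsearch-analysis-ik plugin. As with FastDFS, we therefore use a prebuilt image that bundles Elasticsearch with elasticsearch-analysis-ik.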
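To make the difference between per-character and word-level segmentation concrete, here is a minimal pure-Python sketch. It is not the algorithm used by elasticsearch-analysis-ik; it uses simple forward maximum matching against a tiny, hypothetical dictionary, purely to illustrate why a word-aware analyzer produces more useful index terms. The function names and the sample dictionary are assumptions made for this illustration.

```python
def split_by_character(text):
    """Mimic a per-character analyzer: every non-space character is a token."""
    return [ch for ch in text if not ch.isspace()]


def forward_max_match(text, dictionary, max_len=4):
    """Greedy word segmentation: at each position take the longest dictionary word.

    Falls back to a single character when no dictionary word matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary, for illustration only.
SAMPLE_DICTIONARY = {"黄金", "手机", "手机壳"}
```

With the per-character split, a search for a single character matches any product containing it; with word-level tokens, queries match on meaningful words, which is the behavior the IK plugin is meant to provide.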

Installing the Elasticsearch image

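Note that I again install it on the Ubuntu virtual machine that acts as the Docker host, the same server that runs FastDFS, to stand in for a separate, dedicated server.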

# Pull the image from the registry
$ sudo docker image pull delron/elasticsearch-ik:2.4.6-1.0

Elasticsearch configuration files

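The configuration directory on the host is /home/tony/Desktop/elasticsearch-2.4.6/config. It holds two files: the Elasticsearch configuration file (elasticsearch.yml) and the logging configuration file (logging.yml).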

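The contents of elasticsearch.yml are shown below. Almost everything is commented out; the only effective line is network.host: 172.16.178.129, so you just need to change that IP to your own host's IP.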

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please see the documentation for further information on configuration options:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
# cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
# node.name: node-1
#
# Add custom attributes to the node:
#
# node.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
# path.data: /path/to/data
#
# Path to log files:
#
# path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
# bootstrap.memory_lock: true
#
# Make sure that the `ES_HEAP_SIZE` environment variable is set to about half the memory
# available on the system and that the owner of the process is allowed to use this limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 172.16.178.129
#
# Set a custom port for HTTP:
#
# http.port: 9200
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
#
# discovery.zen.minimum_master_nodes: 3
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html>
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
# gateway.recover_after_nodes: 3
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>
#
# ---------------------------------- Various -----------------------------------
#
# Disable starting multiple nodes on a single system:
#
# node.max_local_storage_nodes: 1
#
# Require explicit names when deleting indices:
#
# action.destructive_requires_name: true

Contents of logging.yml:

# you can override this using by setting a system property, for example -Des.logger.level=DEBUG
es.logger.level: INFO
rootLogger: ${es.logger.level}, console, file
logger:
  # log action execution errors for easier debugging
  action: DEBUG

  # deprecation logging, turn to DEBUG to see them
  deprecation: INFO, deprecation_log_file

  # reduce the logging for aws, too much is logged under the default INFO
  com.amazonaws: WARN
  # aws will try to do some sketchy JMX stuff, but its not needed.
  com.amazonaws.jmx.SdkMBeanRegistrySupport: ERROR
  com.amazonaws.metrics.AwsSdkMetrics: ERROR

  org.apache.http: INFO

  # gateway
  #gateway: DEBUG
  #index.gateway: DEBUG

  # peer shard recovery
  #indices.recovery: DEBUG

  # discovery
  #discovery: TRACE

  index.search.slowlog: TRACE, index_search_slow_log_file
  index.indexing.slowlog: TRACE, index_indexing_slow_log_file

additivity:
  index.search.slowlog: false
  index.indexing.slowlog: false
  deprecation: false

appender:
  console:
    type: console
    layout:
      type: consolePattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %.10000m%n"

  # Use the following log4j-extras RollingFileAppender to enable gzip compression of log files. 
  # For more information see https://logging.apache.org/log4j/extras/apidocs/org/apache/log4j/rolling/RollingFileAppender.html
  #file:
    #type: extrasRollingFile
    #file: ${path.logs}/${cluster.name}.log
    #rollingPolicy: timeBased
    #rollingPolicy.FileNamePattern: ${path.logs}/${cluster.name}.log.%d{yyyy-MM-dd}.gz
    #layout:
      #type: pattern
      #conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  deprecation_log_file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}_deprecation.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  index_search_slow_log_file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}_index_search_slowlog.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

  index_indexing_slow_log_file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}_index_indexing_slowlog.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

Running Elasticsearch-ik with Docker

$ sudo docker run -dti --name=elasticsearch --network=host -v /home/tony/Desktop/elasticsearch-2.4.6/config:/usr/share/elasticsearch/config delron/elasticsearch-ik:2.4.6-1.0

Check the running state. Both Elasticsearch and the FastDFS containers from the previous article are up and running.

tony@ubuntu:~/Desktop/elasticsearch-2.4.6/config$ sudo docker ps
CONTAINER ID   IMAGE                               COMMAND                  CREATED      STATUS             PORTS     NAMES
519432a1232b   delron/elasticsearch-ik:2.4.6-1.0   "/docker-entrypoint.…"   2 days ago   Up About an hour             elasticsearch
d7d3a976fa77   delron/fastdfs                      "/usr/bin/start1.sh …"   5 days ago   Up About an hour             storage
470dbf71de20   delron/fastdfs                      "/usr/bin/start1.sh …"   5 days ago   Up About an hour             tracker
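Besides docker ps, it can help to confirm that Elasticsearch actually answers over HTTP. The sketch below uses only the Python standard library and assumes the host IP from this article and the default HTTP port 9200; the helper names es_root_url and fetch_node_info are made up for this example. Elasticsearch's root endpoint returns a JSON document describing the node.

```python
import json
from urllib.request import urlopen


def es_root_url(host, port=9200):
    """Build the base HTTP URL of an Elasticsearch node (9200 is the default port)."""
    return "http://{}:{}/".format(host, port)


def fetch_node_info(url, timeout=3):
    """GET the root endpoint and decode the JSON document describing the node.

    Raises an exception from urllib if the node cannot be reached.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

On a machine that can reach the server, calling fetch_node_info(es_root_url("172.16.178.129")) should return a dictionary that includes the node's version information; a connection error suggests that network.host or the container's networking needs another look.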

Haystack: introduction, installation, and configuration

Haystack is a framework for connecting Django to search engines; it serves as the bridge between the user and the search engine. In Django, we can call Elasticsearch through Haystack.

Installing Haystack

$ pip install django-haystack
$ pip install elasticsearch==2.4.1

Registering the Haystack app and route

INSTALLED_APPS = [
    'haystack',  # full-text search
]

Route

path('search/', include('haystack.urls'))

Haystack configuration

# Haystack
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://172.16.178.129:9200/',  # IP address of the Elasticsearch server; 9200 is the default HTTP port
        'INDEX_NAME': 'goods',  # name of the index created in Elasticsearch
    },
}

# Update the index automatically whenever data is added, modified, or deleted
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
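Because the server IP differs between machines, one option is to derive the connection settings from environment variables rather than hard-coding them. The following is only a sketch: the variable names ES_HOST, ES_PORT, and ES_INDEX and the helper build_haystack_connections are assumptions introduced for this example, not part of Haystack. The defaults mirror the values used in this article.

```python
import os


def build_haystack_connections(environ=None):
    """Build a HAYSTACK_CONNECTIONS dict, reading overrides from the environment.

    ES_HOST, ES_PORT and ES_INDEX are hypothetical variable names chosen for
    this sketch; the fallbacks are the values used in this article.
    """
    env = os.environ if environ is None else environ
    host = env.get("ES_HOST", "172.16.178.129")
    port = env.get("ES_PORT", "9200")
    index_name = env.get("ES_INDEX", "goods")
    return {
        "default": {
            "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
            "URL": "http://{}:{}/".format(host, port),
            "INDEX_NAME": index_name,
        },
    }
```

In settings.py this would be used as HAYSTACK_CONNECTIONS = build_haystack_connections(), which keeps the rest of the configuration unchanged.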

Building the data index

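An index class tells the search engine which fields to index, that is, which fields' keywords can be used to retrieve data.
This project runs full-text search over product information, so we create a search_indexes.py file in the app application to hold the index class.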

from haystack import indexes

from app.models import Goods


class GoodsIndex(indexes.SearchIndex, indexes.Indexable):
    """Search index for the Goods (product) model."""
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        """Return the model class to be indexed."""
        return Goods

    def index_queryset(self, using=None):
        """Return the queryset of objects to be indexed."""
        return self.get_model().objects.all()

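Notes on the GoodsIndex index class:

Every field defined on GoodsIndex can be queried in Elasticsearch through Haystack.
The text field is declared with document=True, which indicates that it is the primary field for keyword searches.
The indexed value of the text field can be composed of several model fields; use_template=True means that a template, created next, specifies which model fields are included.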

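Creating the template file for the `text` field's index value

Create the template file used by the text field in the templates directory.
Specifically, define it in templates/search/indexes/app/goods_text.txt.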
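The contents of goods_text.txt are not shown here. Conceptually, the template renders the chosen model fields into a single text document, and that document is what gets indexed under text. The Python sketch below only mimics that idea: it assumes the id, name, and title fields (the ones the results template displays later) are the ones included, and it uses a namedtuple as a stand-in for the real Goods model. The names FakeGoods, INDEXED_FIELDS, and render_text_document are hypothetical.

```python
from collections import namedtuple

# Stand-in for the real Goods model; only the fields used in this article.
FakeGoods = namedtuple("FakeGoods", ["id", "name", "title"])

# Assumption: these are the fields listed in goods_text.txt.
INDEXED_FIELDS = ("id", "name", "title")


def render_text_document(obj, fields=INDEXED_FIELDS):
    """Join the selected field values, one per line, into one searchable document."""
    return "\n".join(str(getattr(obj, field)) for field in fields)
```

A keyword that appears in any of the included fields can then match the object, which is why the choice of fields in the template determines what users are able to search by.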

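Adding a search entry point

We added a search box to the home page. The form submits to the search route configured earlier, with the search keyword passed as the q parameter.

Request method: GET
Request URL: /search/
Request parameter: q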

<form method="get" action="/search/" class="search_con">
    <input type="text" class="input_text fl" name="q" placeholder="Search products">
    <input type="submit" class="input_btn fr" name="" value="Search">
</form>
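Submitting the form above produces a GET request to /search/ with the keyword in the q parameter. When building such a URL by hand (for example in a test or a script), the keyword should be percent-encoded. A small standard-library sketch, where build_search_url is a helper name invented for this example:

```python
from urllib.parse import urlencode


def build_search_url(keyword, page=None, base="/search/"):
    """Build the search URL: q carries the keyword, page is optional."""
    params = {"q": keyword}
    if page is not None:
        params["page"] = page
    # urlencode percent-encodes non-ASCII text as UTF-8 and spaces as '+'.
    return base + "?" + urlencode(params)


print(build_search_url("金"))  # /search/?q=%E9%87%91
```

The optional page parameter matches the query string used by the pagination links on the results page.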

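Adding the search results template

The template should live at templates/search/search.html. Haystack's source code expects this path, so as long as we don't override the view, we simply follow that convention.

query: the search keyword
paginator: the Paginator object
page: the Page object for the current page (iterating over page yields result objects)
result.object: the Goods object for the result currently being iterated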

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<div style="width: 1000px; margin: auto">
    <form method="get" action="/search/" class="search_con">
        <input type="text" class="input_text fl" name="q" placeholder="Search products">
        <input type="submit" class="input_btn fr" name="" value="Search">
    </form>
<h2>You searched for: <span style="color: #f00;">{{ query }}</span></h2>
<table border="1">
    <tr>
        <td>Primary key</td>
        <td>Title</td>
        <td>Subtitle</td>
        <td>Image</td>
    </tr>
    {% for result in page %}
        <tr>
        <td>{{ result.object.id }}</td>
        <td>{{ result.object.name }}</td>
        <td>{{ result.object.title }}</td>
        <td><img src="{{ result.object.img.url }}" height="30" alt="{{ result.object.name }}"></td>
        </tr>
    {% empty %}
        <tr>
        <td colspan="4">No products matched your search.</td>
        </tr>
    {% endfor %}

</table>
</div>
</body>
</html>

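Note that the results page also contains the search form. Starting from the home page, let's search for "金" ("gold") and see what happens: the results render correctly.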

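Paginating search results

As the template variables show, besides the rendered search results we also get pagination objects, so let's check that pagination works.

HAYSTACK_SEARCH_RESULTS_PER_PAGE controls how many results are shown per page.
To show five results per page: HAYSTACK_SEARCH_RESULTS_PER_PAGE = 5

The paginator passed in by the backend is dedicated to pagination and is used much like Django's own paginator component. Here I simply render it directly on the page without further elaboration.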

<div style="display: flex; margin-top:20px; line-height: 50px; text-align: center">
        {% for page_code in paginator.page_range %}
            {% if page.number == page_code %}
                <div style="width: 50px; height: 50px; margin: 0 3px; background-color: lightblue">
                    {{ page_code }}
                </div>
            {% else %}
                <div style="width: 50px; height: 50px; margin: 0 3px; background-color: darkcyan">
                    <a href="?q={{ query|urlencode }}&page={{ page_code }}">{{ page_code }}</a>
                </div>
            {% endif %}
        {% endfor %}
    </div>
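The arithmetic behind the page links is simple, and the following pure-Python sketch spells it out. It does not use Django's Paginator; page_count and page_slice are hypothetical helpers that only illustrate how a per-page setting of 5 turns a result list into numbered pages.

```python
import math


def page_count(total_results, per_page=5):
    """Number of pages needed; an empty result set still yields one (empty) page."""
    return max(1, math.ceil(total_results / per_page))


def page_slice(results, page_number, per_page=5):
    """Return the results shown on a 1-based page number."""
    start = (page_number - 1) * per_page
    return results[start:start + per_page]
```

For example, 12 results at five per page give three pages, with the last page holding the remaining two results.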

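Final result:

Summary

That wraps up our walkthrough of using Elasticsearch in Django. This article covers only rough, basic usage, without much refinement or polish, but the essentials are as shown.

In the previous article we also used a Docker image to set up FastDFS. Compared with tedious step-by-step manual configuration, I personally recommend Docker.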