How to Create a Fuzzy Search-as-You-Type Feature with Elasticsearch and Django

April 25, 2018

Travis Luong

Recently, I had to figure out how to implement a fuzzy search-as-you-type feature for one of our Django web APIs. I couldn’t find any comprehensive tutorial on how to build this specific feature, so I decided to combine multiple sources and document the path I ended up taking.

In this tutorial, we will use the elasticsearch-dsl library to add fuzzy search-as-you-type functionality to a Django web app. Elasticsearch-dsl is a high-level library built on top of elasticsearch-py, the low-level client for interacting with Elasticsearch.

Randall Tateishi, Django wizard at Fresh, helped me with the high-level approach to implementing this feature.

Prerequisites

Before starting this tutorial, you should already be familiar with Docker, Django, and Django Rest Framework. There are many ways you can set this all up, but this was the path I ended up taking.

I’d recommend digging through the official Elasticsearch documentation and working through the tutorials there before attempting to use elasticsearch-dsl.

Step 1: Install Elasticsearch and elasticsearch-dsl

Add the following to requirements.txt.

requirements.txt

elasticsearch
elasticsearch-dsl

You may need to run docker-compose build to install the packages.

Step 2: Add Elasticsearch container to your docker setup

Your docker-compose.yml file should look something like this. When you run docker-compose up, it should automatically pull the official Elasticsearch image and spin up an Elasticsearch server.

docker-compose.yml

services:
  db:
    image: postgres
    environment:
      - POSTGRES_USER=fresh_artichoke
      - POSTGRES_PASSWORD=fresh_artichoke
      - POSTGRES_DB=fresh_artichoke
  web:
    build: .
    environment:
      - ENVIRONMENT=local
    env_file:
      - .env
    volumes:
      - .:/app
    ports:
      - 8000:8000
    depends_on:
      - db
      - elastic
    links:
      - db
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.1.1
    ports:
      - 9200:9200
      - 9300:9300
    expose:
      - "9200"
      - "9300"

Step 3: Verify the Elasticsearch server is working

To do this, you can use curl, Postman, or any other HTTP client of your choice. Send a GET request to http://127.0.0.1:9200/ and make sure the response looks something like this:

{
    "name": "mO2x_2W",
    "cluster_name": "docker-cluster",
    "cluster_uuid": "KPapsLdrQSiwRJvQjJaFcg",
    "version": {
        "number": "6.1.1",
        "build_hash": "bd92e7f",
        "build_date": "2017-12-17T20:23:25.338Z",
        "build_snapshot": false,
        "lucene_version": "7.1.0",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}

If you see this, it means your Elasticsearch instance is up and running.
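
The same check can be scripted. The sketch below is one way to do it from Python with only the standard library. The function names summarize_cluster_info and fetch_cluster_info are hypothetical, and the default URL assumes the port mapping from the docker-compose.yml above.

```python
import json
from urllib.request import urlopen


def summarize_cluster_info(body):
    """Pull the cluster name and version number out of the root-endpoint JSON."""
    info = json.loads(body)
    return "{} running Elasticsearch {}".format(
        info["cluster_name"], info["version"]["number"]
    )


def fetch_cluster_info(url="http://127.0.0.1:9200/"):
    """GET the root endpoint; raises URLError if the server is not reachable."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

With the containers running, print(summarize_cluster_info(fetch_cluster_info())) should print a single line naming the cluster and its version.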

Step 4: Define a DocType for your model

For the purposes of this tutorial, assume you already have a model named Skill. Here, we will define a DocType for your Skill model. DocType is an elasticsearch-dsl abstraction for defining your Elasticsearch mappings. (A mapping is a way to define how your data should be indexed and how the search should behave.)

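First we create an analyzer that defines how the name field is analyzed when it is indexed and searched. In this case, the edge_ngram tokenizer breaks each name into its leading prefixes (a, an, ang, and so on). Because the same analyzer is applied to the search text, a query only needs to share a prefix with an indexed name to match. That gives us search-as-you-type behavior and some tolerance for typos later in a word. For more details on how that all works, check out the Elasticsearch docs.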

The using='art' meta option specifies the alias of the Elasticsearch connection to use. We haven't defined that connection yet; we will do so in Step 5.

skills/doc_type.py

from elasticsearch_dsl import DocType, Integer, Text, analyzer, tokenizer

# Break each name into its leading prefixes (1 to 20 characters), then lowercase them.
my_analyzer = analyzer(
    'my_analyzer',
    tokenizer=tokenizer('edge_ngram_tokenizer', 'edge_ngram', min_gram=1, max_gram=20),
    filter=['lowercase']
)


class SkillDoc(DocType):
    name = Text(analyzer=my_analyzer)
    id = Integer()

    class Meta:
        index = 'skill'
        using = 'art'  # alias of the Elasticsearch connection
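
To build some intuition for what this analyzer produces, here is a plain-Python sketch that imitates it. It is only an approximation, not Elasticsearch's implementation: edge_ngrams is a hypothetical helper, and it assumes the tokenizer's default settings, where the whole input is treated as a single token.

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Return lowercase prefixes of `text`, from min_gram to max_gram characters."""
    text = text.lower()
    longest = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, longest + 1)]


indexed = edge_ngrams("Angular")   # tokens stored for the document
typed = edge_ngrams("anglar")      # tokens produced from a misspelled query

# A match query only needs one token in common, so the shared prefixes
# are what let the misspelled query find the document.
shared = [token for token in typed if token in indexed]
print(shared)  # ['a', 'an', 'ang']
```

The more prefixes a query shares with an indexed name, the better that name tends to score, which is why results sharpen as more letters are typed.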

In our model, we add an indexing instance method that adds the object instance to the Elasticsearch index via the DocType we just created. I borrowed the idea from this article.

skills/models.py

from django.db import models

from .doc_type import SkillDoc

class Skill(models.Model):
    name = models.CharField(max_length=30)

    def __str__(self):
        return self.name

    class Meta:
        ordering = ('name',)

    def indexing(self):
        doc = SkillDoc(
            meta={'id': self.id},
            name=self.name,
            id=self.id
        )
        doc.save()
        return doc.to_dict(include_meta=True)

Step 5: Set up signal to update index whenever object is saved

We create a signals.py file where we define a post_save hook that updates the index whenever an instance is saved.

skills/signals.py

from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import Skill


@receiver(post_save, sender=Skill)
def index_skill(sender, instance, **kwargs):
    # Re-index the skill every time it is saved.
    instance.indexing()

In the app's ready method, we import the signals and then create the connection to Elasticsearch. We give our connection an alias of art, which we can reference from other parts of our app. We also wrap our connection code in a try block in case the connection fails.

skills/apps.py

from django.apps import AppConfig
from django.conf import settings
from elasticsearch_dsl import connections


class SkillsConfig(AppConfig):
    name = 'skills'

    def ready(self):
        # Importing the module registers the signal handlers.
        import skills.signals  # noqa: F401
        try:
            connections.create_connection(
                'art',
                hosts=[{'host': settings.ES_HOST, 'port': settings.ES_PORT}])
        except Exception as e:
            print(e)

Don’t forget this line in __init__.py, or else the signals won’t be properly loaded.

skills/__init__.py

default_app_config = 'skills.apps.SkillsConfig'

Step 6: Write a management command to index data

The next step is to write a management command that will create an Elasticsearch index and then do a bulk indexing of your data into that index.

skills/management/commands/index_skills.py

from django.conf import settings
from django.core.management.base import BaseCommand
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from elasticsearch_dsl import Index

from skills.doc_type import SkillDoc
from skills.models import Skill


class Command(BaseCommand):
    help = 'Indexes Skills in Elasticsearch'

    def handle(self, *args, **options):
        # Low-level client used by the bulk helper.
        es = Elasticsearch(
            [{'host': settings.ES_HOST, 'port': settings.ES_PORT}]
        )
        skill_index = Index('skill', using='art')
        skill_index.doc_type(SkillDoc)
        # Drop any existing index so we start from a clean slate.
        if skill_index.exists():
            skill_index.delete()
            print('Deleted skill index.')
        # Recreate the index with the mapping defined on SkillDoc.
        SkillDoc.init()
        result = bulk(
            client=es,
            actions=(skill.indexing() for skill in Skill.objects.all().iterator())
        )
        print('Indexed skills.')
        print(result)
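
One thing to be aware of: indexing() calls doc.save() itself, so in the command above each skill is written once by save() and then sent again by bulk(). A possible variation, sketched below, is to build the action dictionaries directly and let bulk() do all the writing. The helper names are hypothetical, and the "doc" type name is an assumption about your elasticsearch-dsl version; check what your SkillDoc actually uses before relying on it.

```python
def build_bulk_action(skill_id, name, index="skill", doc_type="doc"):
    """Build one action dict in the shape elasticsearch.helpers.bulk() consumes."""
    return {
        "_index": index,
        "_type": doc_type,  # assumed type name; Elasticsearch 6.x expects a type
        "_id": skill_id,
        "_source": {"id": skill_id, "name": name},
    }


def build_bulk_actions(rows):
    """Lazily turn (id, name) pairs into bulk actions, one at a time."""
    for skill_id, name in rows:
        yield build_bulk_action(skill_id, name)


actions = list(build_bulk_actions([(1, "Angular"), (2, "Django")]))
print(actions[0]["_source"])  # {'id': 1, 'name': 'Angular'}
```

In the management command, the rows could come from Skill.objects.values_list('id', 'name').iterator(), with the generator passed as the actions argument to bulk().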

Make sure you set the correct environment variables for Elasticsearch.

.env

ELASTIC_SEARCH_HOST=elastic
ELASTIC_SEARCH_PORT=9200

settings.py

import os

# These names must match the variables defined in .env.
ES_HOST = os.environ.get('ELASTIC_SEARCH_HOST')
ES_PORT = os.environ.get('ELASTIC_SEARCH_PORT')
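
Environment variables always arrive as strings, and they may be missing entirely when you run outside Docker. As a small sketch of how you might guard against both, the hypothetical helper below reads the variable names from the .env file above, converts the port to an integer, and falls back to local defaults (the defaults are my assumption, not part of the original setup).

```python
import os


def read_es_settings(environ=os.environ):
    """Read the Elasticsearch host and port, falling back to local defaults."""
    host = environ.get("ELASTIC_SEARCH_HOST", "localhost")
    port = int(environ.get("ELASTIC_SEARCH_PORT", "9200"))
    return host, port


docker_env = {"ELASTIC_SEARCH_HOST": "elastic", "ELASTIC_SEARCH_PORT": "9200"}
print(read_es_settings(docker_env))  # ('elastic', 9200)
```

In settings.py, this could be used as ES_HOST, ES_PORT = read_es_settings().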

Next, open a shell inside your web container. Run docker ps to list your running containers, find the name of the web container, and then run docker exec -it name_of_your_container bash. From there, run python manage.py index_skills to execute the management command.

Step 7: Verify the search endpoint is working

Now you can make a POST request to http://127.0.0.1:9200/skill/_search with a body of:

{
  "query": {
    "match": {
      "name": {
        "query": "anglar",
        "max_expansions": 3
      }
    }
  }
}

As you can see, we purposely included a typo in the query, and the search still returns the best results it can find. You can also test this by building up the query one letter at a time (a, an, ang, and so on) to see the results become more precise as you "type."

According to the docs, max_expansions is the maximum number of terms that the query will expand to.
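
If you would rather run this check from Python than from Postman, the following standard-library sketch sends the same request body. The function names are hypothetical, and the URL assumes the same local port mapping used above.

```python
import json
from urllib.request import Request, urlopen


def build_match_query(text, max_expansions=3):
    """Build the same request body shown above for an arbitrary search text."""
    return {
        "query": {
            "match": {
                "name": {"query": text, "max_expansions": max_expansions}
            }
        }
    }


def search_skills(text, url="http://127.0.0.1:9200/skill/_search"):
    """POST the query to Elasticsearch and return the decoded JSON response."""
    body = json.dumps(build_match_query(text)).encode("utf-8")
    request = Request(url, data=body, headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def hit_names(response):
    """Extract the skill names from a search response, best match first."""
    return [hit["_source"]["name"] for hit in response["hits"]["hits"]]
```

Calling hit_names(search_skills(prefix)) for each of "a", "an", and "ang" in turn is a quick way to simulate typing and watch the results narrow.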

Step 8: Create a Django endpoint to return Elasticsearch results

Create a view to make a request to Elasticsearch based on the query param that was passed through. (This code assumes you already have a serializer set up for your model. If not, first follow the documentation for Django Rest Framework.)

skills/views.py

from rest_framework.response import Response
from rest_framework.views import APIView

from .doc_type import SkillDoc
from .models import Skill
from .serializers import SkillSerializer


class SkillSearchView(APIView):
    def get(self, request):
        query = request.query_params.get('q')
        if not query:
            # Nothing to search for, so return an empty list.
            return Response([])
        try:
            s = SkillDoc.search()
            s = s.query('match', name=query)
            response = s.execute()
            hits = response.to_dict()['hits']['hits']
            # Elasticsearch returns hits sorted by relevance.
            ids = [hit['_source']['id'] for hit in hits]
            skill_list = list(Skill.objects.filter(id__in=ids))
            # Restore the relevance order that the ORM query lost.
            skill_list.sort(key=lambda skill: ids.index(skill.id))
            serializer = SkillSerializer(skill_list, many=True)
        except Exception:
            # Fall back to a plain database search if Elasticsearch fails.
            skills = Skill.objects.filter(name__icontains=query)
            serializer = SkillSerializer(skills, many=True)
        return Response(serializer.data)

The code makes a search request to the Elasticsearch index, which returns a list of documents sorted by best match. First it parses for the ids of the objects, then it makes an "in" query for all the skills that match the ids in the list.

However, the Django ORM returns the results in a different order, so we have to re-sort them to match the original ordering of the ids.
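
The reordering step can be tried in isolation. The sketch below uses plain dictionaries in place of model instances, and order_by_ids is a hypothetical helper. It looks up each position in a dictionary rather than calling ids.index() for every item, which makes little difference for a handful of results but avoids repeated list scans on larger ones.

```python
def order_by_ids(records, ids, key=lambda record: record["id"]):
    """Return `records` sorted to follow the order of `ids`."""
    position = {value: index for index, value in enumerate(ids)}
    return sorted(records, key=lambda record: position[key(record)])


ranked_ids = [7, 2, 5]  # relevance order from Elasticsearch
rows = [                # order as returned by the database
    {"id": 2, "name": "Angular"},
    {"id": 5, "name": "AngularJS"},
    {"id": 7, "name": "Ang"},
]
print([row["id"] for row in order_by_ids(rows, ranked_ids)])  # [7, 2, 5]
```

With model instances, you would pass key=lambda skill: skill.id instead of the default.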

We also wrap the Elasticsearch query in a try block, and if it fails, we fall back to a standard Django ORM query.

Next, register the route for that view.

urls.py

...
from skills import views as skills_api
...
urlpatterns = [
  ...
  url(r'^api/skills_search/', skills_api.SkillSearchView.as_view()),
  ...
]

Step 9: Test your Django API endpoint

Make a GET request to http://127.0.0.1:8000/api/skills_search/?q=angular. This should return a list of objects sorted by most relevant match.

I hope this tutorial helped you get fuzzy search-as-you-type functionality going for your Django web API! Are there any tips you'd add?

Unless otherwise specified, source code in this post is licensed under a
Creative Commons Attribution 4.0 International license (CC BY 4.0).
