
Elasticsearch: Creating a fuzzy search-as-you-type feature


Tutorial: How to Create a Fuzzy Search-as-you-type Feature with Elasticsearch and Django

Recently, I had to figure out how to implement a fuzzy search-as-you-type feature for one of our Django web APIs. I couldn’t find any comprehensive tutorial on how to build this specific feature, so I decided to combine multiple sources and document the path I ended up taking.

In this tutorial, we will use the elasticsearch-dsl library to add fuzzy search-as-you-type functionality to a Django web app. elasticsearch-dsl is a high-level library built on top of elasticsearch-py, the low-level Python client for Elasticsearch.

Randall Tateishi, Django wizard at Fresh, helped me with the high-level approach to implementing this feature.

Prerequisites

Before starting this tutorial, you should already be familiar with Docker, Django, and Django Rest Framework. There are many ways you can set this all up, but this was the path I ended up taking.

I’d recommend digging through the official Elasticsearch documentation and working through the tutorials there before attempting to use elasticsearch-dsl.

Step 1: Install Elasticsearch and elasticsearch-dsl

Add the following to requirements.txt.

requirements.txt

elasticsearch
elasticsearch-dsl

You may need to run docker-compose build to install the packages.

Step 2: Add Elasticsearch container to your docker setup

Your docker-compose.yml file should look something like this. When you run docker-compose up, it should automatically pull the official Elasticsearch image and spin up an Elasticsearch server.

docker-compose.yml

services:
  db:
    image: postgres
    environment:
      - POSTGRES_USER=fresh_artichoke
      - POSTGRES_PASSWORD=fresh_artichoke
      - POSTGRES_DB=fresh_artichoke
  web:
    build: .
    environment:
      - ENVIRONMENT=local
    env_file:
      - .env
    volumes:
      - .:/app
    ports:
      - 8000:8000
    depends_on:
      - db
      - elastic
    links:
      - db
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.1.1
    ports:
      - 9200:9200
      - 9300:9300
    expose:
      - "9200"
      - "9300"

Step 3: Verify the Elasticsearch server is working

To do this, you can use curl, Postman, or any other HTTP client of your choice. Send a GET request to http://127.0.0.1:9200/ (plain HTTP, since this setup does not configure TLS) and make sure the response looks something like this:

{
    "name": "mO2x_2W",
    "cluster_name": "docker-cluster",
    "cluster_uuid": "KPapsLdrQSiwRJvQjJaFcg",
    "version": {
        "number": "6.1.1",
        "build_hash": "bd92e7f",
        "build_date": "2017-12-17T20:23:25.338Z",
        "build_snapshot": false,
        "lucene_version": "7.1.0",
        "minimum_wire_compatibility_version": "5.6.0",
        "minimum_index_compatibility_version": "5.0.0"
    },
    "tagline": "You Know, for Search"
}

If you see this, it means your Elasticsearch instance is up and running.
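
If you would rather run this check from Python, a minimal sketch using only the standard library might look like the following. The function names describe_cluster and fetch_root are hypothetical, and the default URL assumes the port mapping from the docker-compose.yml above and plain HTTP.

```python
import json
from urllib.request import urlopen


def describe_cluster(payload):
    """Summarize the JSON document returned by the Elasticsearch root endpoint."""
    version = payload.get("version", {}).get("number", "unknown")
    cluster = payload.get("cluster_name", "unknown")
    return "cluster {} is running Elasticsearch {}".format(cluster, version)


def fetch_root(url="http://127.0.0.1:9200/", timeout=5):
    """GET the root endpoint and decode the JSON body (needs a running server)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# With the container up, this would print a one-line summary:
# print(describe_cluster(fetch_root()))
```

Keeping the parsing separate from the network call means the summary logic can be exercised without a running cluster.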

Step 4: Define a DocType for your model

For the purposes of this tutorial, assume you already have a model named Skill. Here, we will define a DocType for your Skill model. DocType is an elasticsearch-dsl abstraction for defining your Elasticsearch mappings. (A mapping is a way to define how your data should be indexed and how the search should behave.)

First we create an analyzer that controls how the name field is analyzed when it is indexed and searched. In this case, the edge_ngram tokenizer provides the fuzziness: it indexes every prefix of a name (from 1 to 20 characters), so a query still returns relevant results when it is incomplete or contains a typo after the first few characters. For more details on how that all works, check out the Elasticsearch docs.

The using='art' meta specifies the Elasticsearch connection we are using, which we haven’t defined yet.

skills/doc_type.py

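from elasticsearch_dsl import DocType, Integer, Text, analyzer, tokenizer

# Split each name into prefixes of 1 to 20 characters, then lowercase them.
# 'trigram' is only the tokenizer's label; the tokenizer type is edge_ngram.
my_analyzer = analyzer(
    'my_analyzer',
    tokenizer=tokenizer('trigram', 'edge_ngram', min_gram=1, max_gram=20),
    filter=['lowercase'],
)


class SkillDoc(DocType):
    name = Text(analyzer=my_analyzer)
    id = Integer()

    class Meta:
        index = 'skill'  # name of the Elasticsearch index
        using = 'art'    # alias of the Elasticsearch connection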
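
To get a feel for what the edge_ngram tokenizer in this analyzer produces, here is a rough pure-Python approximation. It is not Elasticsearch's implementation; edge_ngrams is a hypothetical helper, and it assumes the tokenizer's default behavior of treating the whole field value as a single token before the lowercase filter runs.

```python
def edge_ngrams(text, min_gram=1, max_gram=20):
    """Return lowercase prefixes of text, from min_gram to max_gram characters long."""
    text = text.lower()
    longest = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, longest + 1)]


indexed = edge_ngrams("Angular")
typed = edge_ngrams("anglar")
shared = [gram for gram in typed if gram in indexed]

print(indexed)  # ['a', 'an', 'ang', 'angu', 'angul', 'angula', 'angular']
print(shared)   # ['a', 'an', 'ang']
```

Because the same analyzer is applied to the query, any shared prefix can count as a match, which is why a mistyped term such as "anglar" can still find "Angular". A typo in the very first character would share no prefixes at all.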

In our model, we add an indexing instance method that adds the object instance to the Elasticsearch index via the DocType we just created. I borrowed the idea from this article.

skills/models.py

from django.db import models

from .doc_type import SkillDoc

class Skill(models.Model):
    name = models.CharField(max_length=30)

    def __str__(self):
        return self.name

    class Meta:
        ordering = ('name',)

    def indexing(self):
        doc = SkillDoc(
            meta={'id': self.id},
            name=self.name,
            id=self.id
        )
        doc.save()
        return doc.to_dict(include_meta=True)

Step 5: Set up signal to update index whenever object is saved

We create a signals.py file where we define a post_save receiver that updates the index whenever an instance is saved.

skills/signals.py

from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import Skill
from .doc_type import SkillDoc

@receiver(post_save, sender=Skill)
def my_handler(sender, instance, **kwargs):
    instance.indexing()

In the app’s ready method, we import the signals and then create the connection to Elasticsearch. We give the connection the alias art, which we can reference from other parts of the app. We also wrap the connection code in a try block in case the connection fails.

skills/apps.py

from django.apps import AppConfig
from django.conf import settings
from elasticsearch_dsl import connections


class SkillsConfig(AppConfig):
    name = 'skills'

    def ready(self):
        # Importing the module registers the post_save receiver.
        import skills.signals  # noqa: F401
        try:
            # Register the connection under the alias 'art'.
            connections.create_connection(
                'art',
                hosts=[{'host': settings.ES_HOST, 'port': settings.ES_PORT}])
        except Exception as e:
            print(e)

Don’t forget this line in __init__.py, or else the signals won’t be properly loaded.

skills/__init__.py

default_app_config = 'skills.apps.SkillsConfig'

Step 6: Write a management command to index data

The next step is to write a management command that will create an Elasticsearch index and then do a bulk indexing of your data into that index.

skills/management/commands/index_skills.py

from django.conf import settings
from django.core.management.base import BaseCommand
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from elasticsearch_dsl import Index

from skills.doc_type import SkillDoc
from skills.models import Skill


class Command(BaseCommand):
    help = 'Indexes Skills in Elasticsearch'

    def handle(self, *args, **options):
        # Low-level client used by the bulk helper.
        es = Elasticsearch(
            [{'host': settings.ES_HOST, 'port': settings.ES_PORT}]
        )
        skill_index = Index('skill', using='art')
        skill_index.doc_type(SkillDoc)
        # Start from a clean index so stale documents are not left behind.
        if skill_index.exists():
            skill_index.delete()
            print('Deleted skill index.')
        # Create the index with the mapping defined on SkillDoc.
        SkillDoc.init()
        # Stream every Skill through the bulk helper.
        result = bulk(
            client=es,
            actions=(skill.indexing() for skill in Skill.objects.all().iterator())
        )
        print('Indexed skills.')
        print(result)
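
The bulk helper consumes an iterable of action dictionaries. As a rough illustration of the shape involved, here is a plain-Python sketch. skill_to_action and actions_for are hypothetical helpers, and the '_type' value of 'doc' is an assumption about the default in your version of elasticsearch-dsl, so compare it with the output of to_dict(include_meta=True) in your own project.

```python
def skill_to_action(skill_id, name, index="skill", doc_type="doc"):
    """Build one bulk action dict for a skill (doc_type is an assumed default)."""
    return {
        "_index": index,
        "_type": doc_type,
        "_id": skill_id,
        "_source": {"id": skill_id, "name": name},
    }


def actions_for(skills):
    """Lazily yield one action per (id, name) pair so rows are not all held in memory."""
    for skill_id, name in skills:
        yield skill_to_action(skill_id, name)


rows = [(1, "Angular"), (2, "Django")]
for action in actions_for(rows):
    print(action["_id"], action["_source"]["name"])
```

One thing to be aware of: the indexing() method as written also calls doc.save(), so during the management command each document appears to be sent once on its own and once more through bulk. A helper that only builds the dictionary, along the lines of this sketch, would avoid the extra request.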

Make sure you set the correct environment variables for Elasticsearch.

.env

ELASTIC_SEARCH_HOST=elastic
ELASTIC_SEARCH_PORT=9200

settings.py

import os

# These names must match the variables defined in .env.
ES_HOST = os.environ.get('ELASTIC_SEARCH_HOST')
ES_PORT = os.environ.get('ELASTIC_SEARCH_PORT')
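
Environment variables arrive as strings, and a missing one comes back as None, which only surfaces later when the connection is created. A slightly more defensive sketch is shown below. read_es_settings is a hypothetical helper, it accepts either variable naming scheme, and the defaults ('elastic' and 9200) are assumptions taken from the docker-compose.yml above.

```python
import os


def read_es_settings(environ=None):
    """Return (host, port) for Elasticsearch, with defaults for local Docker use."""
    environ = os.environ if environ is None else environ
    # Accept either naming scheme; ES_* wins when both are present.
    host = environ.get("ES_HOST") or environ.get("ELASTIC_SEARCH_HOST") or "elastic"
    port = environ.get("ES_PORT") or environ.get("ELASTIC_SEARCH_PORT") or "9200"
    # Convert the port to an int so a malformed value fails early and loudly.
    return host, int(port)


print(read_es_settings({"ELASTIC_SEARCH_HOST": "elastic", "ELASTIC_SEARCH_PORT": "9200"}))
```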

Next, open a shell inside your web container. Run docker ps to list your running containers, find the name of the web container, and then run docker exec -it name_of_your_container bash. From that shell, run python manage.py index_skills to execute the management command.

Step 7: Verify the search endpoint is working

Now you can make a POST request to http://127.0.0.1:9200/skill/_search with a body of:

{
  "query": {
    "match": {
      "name": {
        "query": "anglar",
        "max_expansions": 3
      }
    }
  }
}

As you can see, we purposely included a typo in the query, and the search still returns the best results it can find. You can also test this by adding one letter at a time to the query (for example, a, an, ang, and so on) to see the results become more precise as you “type.”

According to the docs, max_expansions is the maximum number of terms that the query will expand to.
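
If you prefer to script this check, the request can be sketched with the standard library alone. build_match_query and search are hypothetical helpers, the URL assumes the local port mapping and plain HTTP, and the sketch leaves out max_expansions to keep the body minimal.

```python
import json
from urllib.request import Request, urlopen


def build_match_query(text, field="name"):
    """Build the body of a match query against a single field."""
    return {"query": {"match": {field: {"query": text}}}}


def search(text, url="http://127.0.0.1:9200/skill/_search", timeout=5):
    """POST the query to Elasticsearch and return the hits (needs a running server)."""
    body = json.dumps(build_match_query(text)).encode("utf-8")
    request = Request(url, data=body, headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["hits"]["hits"]


# Simulate typing one character at a time and show the body sent for each step.
for typed in ("a", "an", "ang"):
    print(json.dumps(build_match_query(typed)))
```

Calling search("anglar") against a populated index should return the same hits as the manual request above.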

Step 8: Create a Django endpoint to return Elasticsearch results

Create a view to make a request to Elasticsearch based on the query param that was passed through. (This code assumes you already have a serializer set up for your model. If not, first follow the documentation for Django Rest Framework.)

skills/views.py

from rest_framework.response import Response
from rest_framework.views import APIView

from .doc_type import SkillDoc
from .models import Skill
from .serializers import SkillSerializer


class SkillSearchView(APIView):
    def get(self, request):
        query = request.query_params.get('q')
        if not query:
            # No search term: return an empty list rather than None.
            return Response([])
        try:
            # Ask Elasticsearch for the best matches, ordered by relevance.
            s = SkillDoc.search()
            s = s.query('match', name=query)
            response = s.execute()
            hits = response.to_dict()['hits']['hits']
            ids = [hit['_source']['id'] for hit in hits]
            # The ORM does not preserve the order of ids, so restore it.
            skill_list = list(Skill.objects.filter(id__in=ids))
            skill_list.sort(key=lambda skill: ids.index(skill.id))
            serializer = SkillSerializer(skill_list, many=True)
        except Exception:
            # Elasticsearch is unavailable or the query failed: use the ORM.
            skills = Skill.objects.filter(name__icontains=query)
            serializer = SkillSerializer(skills, many=True)
        return Response(serializer.data)

The code makes a search request to the Elasticsearch index, which returns a list of documents sorted by best match. First it parses for the ids of the objects, then it makes an “in” query for all the skills that match the ids in the list.

However, Django ORM returns the results in a different order, so we’ll have to reorder them with a sort based on the original ordering of the ids.

We also wrap the Elasticsearch query in a try block and if it fails we fall back to a standard Django ORM query.
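
The view restores the relevance order with ids.index, which rescans the list for every object. For larger result sets, one alternative is to build a position lookup first. The sketch below uses a namedtuple as a stand-in for model instances; FakeSkill and order_by_ids are hypothetical names.

```python
from collections import namedtuple

# Stand-in for a Django model instance; only the id attribute matters here.
FakeSkill = namedtuple("FakeSkill", ["id", "name"])


def order_by_ids(objects, ids):
    """Sort objects to follow the order of ids, using a dict for O(1) position lookups."""
    position = {pk: rank for rank, pk in enumerate(ids)}
    return sorted(objects, key=lambda obj: position[obj.id])


ids_from_es = [7, 2, 5]  # relevance order returned by Elasticsearch
from_orm = [FakeSkill(2, "Angular"), FakeSkill(5, "AngularJS"), FakeSkill(7, "Ang")]
print([s.id for s in order_by_ids(from_orm, ids_from_es)])  # [7, 2, 5]
```

For the handful of hits a typical search returns, the difference is negligible, so the original sort is fine for small result sets.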

Next, register the route for that view.

urls.py

...
from skills import views as skills_api
...
urlpatterns = [
  ...
  url(r'^api/skills_search/', skills_api.SkillSearchView.as_view()),
  ...
]

Step 9: Test your Django API endpoint

Make a GET request to http://127.0.0.1:8000/api/skills_search/?q=angular. This should return a list of objects sorted by most relevant match.
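
The same request can be scripted, which also takes care of URL-encoding search terms that contain spaces or symbols. This is a minimal sketch; build_search_url and fetch_skills are hypothetical helpers, and the base URL assumes the Django development server on port 8000 over plain HTTP.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(term, base="http://127.0.0.1:8000/api/skills_search/"):
    """Return the endpoint URL with the search term safely URL-encoded."""
    return "{}?{}".format(base, urlencode({"q": term}))


def fetch_skills(term, timeout=5):
    """GET the endpoint and decode the JSON list (needs the Django server running)."""
    with urlopen(build_search_url(term), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(build_search_url("angular js"))
# http://127.0.0.1:8000/api/skills_search/?q=angular+js
```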

I hope this tutorial helped you to get a fuzzy search-as-you-type functionality going for your Django web API! Are there any tips you’d add?


Travis Luong

Full-Stack Developer

Travis is a Full-Stack Developer with 5 years of professional programming experience. Having worked as a freelance consultant, web contractor, and full-time engineer prior to joining Fresh, he excels at both server-side and client-side web development. While his expertise is in Ruby on Rails and Javascript, Travis is familiar with a variety of tools, languages, and frameworks. Most recently, he has worked on web and mobile apps for CBRE and Fornetix.