From Keywords to Natural Language Query: How Typesense Brings the Power of LLMs to Your Search

Typesense: An ideal choice to enhance search functionality

Typesense is an open-source search engine that delivers fast, efficient search capabilities for applications and websites. With features like full-text search, real-time indexing, and typo tolerance, Typesense is an ideal choice for developers and project managers looking to enhance their search functionality.

In this guide, we’ll introduce you to Typesense, walk you through the setup process, and share insights on how to use it in your projects effectively.

What is Typesense?

Typesense is an advanced search engine built for performance that features:

  • Full-text search capabilities: Text indexing, tokenization, relevance ranking, synonym and stemming functionality, and phrase and proximity searches enhance the user experience by enabling seamless information retrieval from large data sets.
  • Real-time indexing: Instant updates, background processing, and an event-driven approach make new or updated documents searchable almost immediately, without noticeable delay for the user, and indexing keeps pace as data and traffic grow.
  • Typo tolerance: Error detection, “fuzzy matching,” configurable tolerance levels, and handling for multiple error types create more robust search functionality, ensuring that users can still find the information they need regardless of small typographical mistakes in their queries.
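
Typo tolerance builds on the idea of edit distance: the number of single-character changes needed to turn one word into another. The sketch below is a plain Levenshtein distance function, included only to illustrate the concept. It is not Typesense's internal implementation, and the function names are hypothetical. In Typesense itself, the num_typos search parameter controls how many typos are tolerated.

```typescript
// Illustrative only: classic Levenshtein edit distance between two strings.
// This is NOT Typesense's internal code; the names are hypothetical.
function editDistance(a: string, b: string): number {
  // previous[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let previous: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost // substitution (or match when cost is 0)
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// A query token within a small distance of an indexed token can still match.
function withinTolerance(query: string, indexed: string, maxTypos: number): boolean {
  return editDistance(query, indexed) <= maxTypos;
}
```

For example, withinTolerance("typesence", "typesense", 1) is true because one substitution separates the two words.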

Typesense Use Cases

Typesense can be beneficial in various scenarios, including:

E-commerce sites

  • Product Search: Typesense allows users to search for products based on various attributes, such as name, description, category, or tags. Its full-text search capabilities ensure that relevant products are retrieved quickly.
  • Faceted Search: E-commerce platforms often provide faceted search options that allow users to filter products by specific attributes like size, color, price range, or brand. Typesense supports these filters, making it easier for users to narrow their choices.
  • Relevance Ranking: Typesense lets you combine text relevance with custom ranking signals such as popularity or rating, so e-commerce sites can tune the order of results.
  • Improved User Experience: Better search leads to a more satisfying and efficient shopping experience and less frustration from poorly performing search functions. Typesense is also designed for speed, returning results quickly even as the product catalog grows.
  • Configurability: Typesense’s features can be tailored to meet specific business needs, allowing e-commerce platforms to adapt their search mechanisms based on customer feedback and market trends.
  • Cost-Effective: Typesense can be more cost-effective than some other search solutions, especially for mid-sized to large e-commerce platforms looking for robust search capabilities.
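
As a rough sketch of how the product search, faceting, and ranking points above might translate into a query, the helper below builds a set of Typesense search parameters. The field names (name, description, brand, color, price, popularity) and the function name are assumptions made for illustration; they would need to match your own collection schema.

```typescript
// Hypothetical product fields: name, description, brand, color, price, popularity.
interface ProductSearchOptions {
  text: string;
  minPrice: number;
  maxPrice: number;
}

function buildProductSearch(options: ProductSearchOptions): Record<string, string | number> {
  return {
    q: options.text,
    query_by: "name,description",
    // Ask for facet counts so the UI can render brand and color filters.
    facet_by: "brand,color",
    // Numeric range filter with inclusive lower and upper bounds.
    filter_by: `price:[${options.minPrice}..${options.maxPrice}]`,
    // Rank by text relevance first, then by a business signal.
    sort_by: "_text_match:desc,popularity:desc",
    per_page: 20,
  };
}

const productSearch = buildProductSearch({ text: "running shoes", minPrice: 10, maxPrice: 50 });
```

The resulting object can be passed to the search() call of a Typesense client, as in the search example later in this guide.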

Blogs and content management systems

  • Content Search: Typesense enables powerful full-text search capabilities, allowing users to search for articles based on titles, authors, keywords, and content within the articles. This helps users quickly find the information they are looking for.
  • Faceted Search: For blogs or CMS with diverse content categories (e.g., different topics, authors, or tags), Typesense supports faceted search, allowing users to filter and refine results based on these classifications.
  • Content Recommendation: Typesense can also be used to facilitate content recommendations based on user searches and interactions, encouraging readers to explore related articles.
  • Increased Page Views and Engagement: By implementing effective search features, blogs and CMS platforms can encourage users to explore more articles and content, resulting in increased page views and overall engagement.
  • Improved Content Discoverability: By supporting advanced search features, Typesense enhances the discoverability of older or less-popular articles, allowing users to benefit from the entirety of the content library.
  • Flexibility and Customization: Typesense offers customizable configurations, allowing bloggers and content managers to tailor search functionalities to align with their specific audience needs and search behaviors.

Documentation and knowledge bases

  • Enhanced Search Functionality: Typesense provides robust full-text search capabilities that allow users to search across documents by keywords, topics, sections, or phrases, helping them find relevant information quickly.
  • Indexing Documentation: Typesense indexes JSON documents, so content written in formats such as Markdown, HTML, or PDF becomes searchable once its text has been extracted into JSON records, letting users search across multiple sources in one place.
  • Faceted Search and Filtering: Users can filter search results based on categories, tags, document types, or relevance, enabling them to narrow down their search to the most pertinent documentation.
  • Real-Time Updates: When documentation gets updated or new articles are added, Typesense indexes these changes in real time, ensuring users have access to the latest information without delays.
  • Support for Complex Queries: Typesense supports phrase searches with quoted terms and excluding terms with a leading minus sign, and filter conditions can be combined with && and ||, so users can refine their queries and obtain more accurate results.
  • Content Recommendations: Typesense can facilitate recommendations for related articles or documentation based on user search behavior, encouraging users to explore related topics or expand their knowledge.
  • Faster Content Updates: The real-time indexing capability means that users will always see the most current information, which is critical for maintaining accurate and up-to-date documentation.
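
Documentation usually has to be broken into smaller records before it is indexed, so that a search hit points to a specific section rather than a whole page. The sketch below shows one possible way to split a Markdown page by headings. The record shape and function name are hypothetical, and the approach is deliberately simple: it does not, for example, skip lines starting with # inside code samples.

```typescript
// Hypothetical record shape for one searchable documentation section.
interface DocSection {
  id: string;
  title: string;
  content: string;
}

// Split a Markdown page into one record per heading.
function splitMarkdown(pageId: string, markdown: string): DocSection[] {
  const sections: DocSection[] = [];
  let title = "Introduction"; // used for text that appears before the first heading
  let lines: string[] = [];

  // Store the lines collected so far as a section, skipping empty ones.
  const flush = (): void => {
    const content = lines.join("\n").trim();
    if (content.length > 0) {
      sections.push({ id: `${pageId}-${sections.length}`, title, content });
    }
    lines = [];
  };

  for (const line of markdown.split("\n")) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) {
      flush(); // close the previous section before starting a new one
      title = heading[1].trim();
    } else {
      lines.push(line);
    }
  }
  flush();
  return sections;
}

const docSections = splitMarkdown("guide", "intro text\n# Setup\nrun it\n## Usage\ncall it");
```

Each resulting record is a plain JSON object that can be added to a collection with fields id, title, and content.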

How to Use Typesense

Installation

To get started with Typesense, follow these steps for installation:

Step 1: Set Up the Typesense Server

You can set up the Typesense server using Docker or Docker Compose. Choose the method that best fits your environment:

Using Docker: This method is quick and ideal for testing or small-scale deployments.

docker run -p 8108:8108 -v "$(pwd)/typesense-data:/data" typesense/typesense:27.1 --data-dir /data --api-key=YOUR_API_KEY --enable-cors

Note: Replace YOUR_API_KEY with a secure API key.

Using Docker Compose: Suitable for more complex setups, especially when integrating with other services.

version: '3'
services:
  typesense:
    image: typesense/typesense:27.1
    ports:
      - "8108:8108"
    volumes:
      - ./typesense-data:/data
    command: "--data-dir /data --api-key=YOUR_API_KEY --enable-cors"

Note: Ensure your docker-compose.yml file is correctly configured and replace YOUR_API_KEY with a secure API key.
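
Once the container is running, it can be useful to confirm the server is reachable before wiring up a client. Typesense exposes a /health endpoint for this. The helpers below are a minimal sketch with hypothetical function names: one builds the URL and one interprets the response body, assuming the server listens on the host and port used above.

```typescript
// Build the health-check URL for a Typesense node.
function healthUrl(host: string, port: number, protocol: "http" | "https" = "http"): string {
  return `${protocol}://${host}:${port}/health`;
}

// The health endpoint answers with a small JSON body such as {"ok": true}.
function isHealthy(body: string): boolean {
  try {
    const parsed: unknown = JSON.parse(body);
    return typeof parsed === "object" && parsed !== null && (parsed as { ok?: unknown }).ok === true;
  } catch {
    return false; // the body was not JSON, so treat the node as unhealthy
  }
}

// Usage against a running server, in a runtime that provides fetch:
//   const response = await fetch(healthUrl("localhost", 8108));
//   console.log(isHealthy(await response.text()));
```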

Step 2: Install the Typesense Client

Add the Typesense client to your project to enable communication with the server:

npm install typesense

Step 3: Configure Typesense in Your Project

Connect your project to the Typesense server by configuring the client. This involves specifying the server details and API key in your code.

Example configuration code will be provided in the next section.

Tips:

  • Ensure Docker is installed and running on your machine before starting.
  • Use a strong, unique API key for security purposes.
  • Check the Typesense documentation for more detailed setup instructions and troubleshooting tips.
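
One way to keep the API key out of source code is to read the connection details from environment variables. The sketch below assumes variable names of our own choosing (TYPESENSE_HOST, TYPESENSE_PORT, TYPESENSE_PROTOCOL, TYPESENSE_API_KEY) and a five-second timeout; none of these are required by Typesense.

```typescript
interface NodeConfig {
  host: string;
  port: number;
  protocol: string;
}

interface ClientConfig {
  nodes: NodeConfig[];
  apiKey: string;
  connectionTimeoutSeconds: number;
}

// The environment variable names below are hypothetical; pick whatever fits your project.
function buildClientConfig(env: Record<string, string | undefined>): ClientConfig {
  const apiKey = env["TYPESENSE_API_KEY"];
  if (!apiKey) {
    throw new Error("TYPESENSE_API_KEY is not set");
  }
  const port = Number(env["TYPESENSE_PORT"] ?? "8108");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error("TYPESENSE_PORT must be a positive integer");
  }
  return {
    nodes: [
      {
        host: env["TYPESENSE_HOST"] ?? "localhost",
        port,
        protocol: env["TYPESENSE_PROTOCOL"] ?? "http",
      },
    ],
    apiKey,
    connectionTimeoutSeconds: 5, // assumed value; tune for your network
  };
}
```

In a Node.js project, the result can be passed straight to the client constructor, for example new Typesense.Client(buildClientConfig(process.env)).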

Example

TypeScript:

import Typesense from "typesense";
import type { CollectionCreateSchema } from "typesense/lib/Typesense/Collections";

// Initialize the Typesense client
const client = new Typesense.Client({
    nodes: [
        {
            // "typesense" is the Docker Compose service name. Use "localhost" or
            // "host.docker.internal" if you didn't use Docker Compose or if your
            // app is not in the same Docker network as the Typesense container.
            host: "typesense",
            port: 8108,
            protocol: "http",
        },
    ],
    apiKey: "YOUR_API_KEY",
});

// Define the schema
const schema: CollectionCreateSchema = {
    name: "posts",
    fields: [
        { name: "id", type: "string" },
        { name: "title", type: "string" },
        { name: "content", type: "string" },
        { name: "category", type: "string", facet: true },
    ],
};

// Create the collection
await client.collections().create(schema);

// Add a document to the "posts" collection
await client
    .collections("posts")
    .documents()
    .create({
        id: "1",
        title: "Document Title",
        content: "Document content",
        category: "searchEngine",
    });

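JavaScript:

// Search the "posts" collection by title and content
const searchResults = await client.collections('posts').documents().search({
    q: 'search term',
    query_by: 'title,content'
});

// Each hit contains the matching document
console.log(searchResults.hits);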
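
The search call resolves to a response object that includes the total number of matches (found) and a list of hits, each wrapping the matching document. The sketch below shows one way to turn such a response into a list of titles. The interfaces are trimmed-down assumptions covering only the fields used here, and the sample response is hand-written rather than real server output.

```typescript
// Trimmed-down shapes covering only the fields this sketch reads.
interface PostDocument {
  id: string;
  title: string;
  content: string;
  category: string;
}

interface SearchHit {
  document: PostDocument;
}

interface SearchResponse {
  found: number;
  hits?: SearchHit[];
}

// Produce a numbered list of titles from a search response.
function summarizeHits(response: SearchResponse): string[] {
  return (response.hits ?? []).map((hit, index) => `${index + 1}. ${hit.document.title}`);
}

// Hand-written sample standing in for a real response.
const sampleResponse: SearchResponse = {
  found: 2,
  hits: [
    { document: { id: "1", title: "Document Title", content: "Document content", category: "searchEngine" } },
    { document: { id: "2", title: "Another Title", content: "More content", category: "searchEngine" } },
  ],
};

const hitSummary = summarizeHits(sampleResponse);
```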

Additional Typesense Features

Search Relevance and Speed

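Typesense keeps its index in memory, which keeps query latency low, and by default ranks results by how well they match the query text. The sort_by parameter lets you bring in other fields, such as a popularity score, to break ties or reorder results.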

Filtering

Implementing filtering in search queries with Typesense is straightforward.

JavaScript:

// Return every post whose category is exactly "searchEngine"
const filteredResults = await client.collections('posts').documents().search({
  q: '*',
  query_by: 'title',
  filter_by: 'category:=searchEngine'
});
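
When several filters apply at once, the filter_by string can be assembled from individual conditions joined with &&. The helper below is a minimal sketch with a hypothetical name. It assumes exact-match (:=) conditions only and values without commas or other special characters, which would need escaping.

```typescript
type FilterValue = string | number | boolean;

// Turn { category: ["a", "b"], status: "active" } into a filter_by string.
// Assumes exact-match conditions and values that need no escaping.
function buildFilter(conditions: Record<string, FilterValue | FilterValue[]>): string {
  return Object.entries(conditions)
    .map(([field, value]) =>
      Array.isArray(value)
        ? `${field}:=[${value.join(",")}]` // matches any of the listed values
        : `${field}:=${value}`
    )
    .join(" && ");
}

const combinedFilter = buildFilter({ category: ["searchEngine", "database"], status: "active" });
```

Here combinedFilter is "category:=[searchEngine,database] && status:=active", which can be passed as filter_by in a search call.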

How We Use Typesense at Fresh

Our primary use case is site search. We use Typesense to power search on our website, and we implemented hybrid search, which combines keyword matching with semantic (vector) search, so users can efficiently search across resources such as files, slides, and web content.

Implementation Process

To integrate Typesense into our system, we followed these steps:

Step 1: Design a Schema

  • Begin by designing a schema that reflects the data you want to search. This can include multiple schemas if you have diverse data types.

TypeScript:

// TypesenseCollection and environmentConfig are defined elsewhere in our project
export const collectionSchema: CollectionCreateSchema = {
   // We keep collection names in an enum to avoid typos and make references easier to track
   name: TypesenseCollection.RESOURCES,
   fields: [
     { name: "title", type: "string" },
     { name: "category", type: "string" },
     { name: "status", type: "string" },
     { name: "publishedDate", type: "string", facet: true },
     { name: "contentHTML", type: "string", facet: true },
     { name: "contentText", type: "string", facet: true },
     // Same text as contentText, capped at a maximum length so that embedding
     // generation stays within the model's token limit
     { name: "contentEmbedding", type: "string" },
     { name: "excerpt", type: "string", facet: true },
     {
       name: "embedding",
       type: "float[]",
       embed: {
         // The three fields we use for vector search
         from: ["title", "excerpt", "contentEmbedding"],
         model_config: {
           model_name: "openai/text-embedding-ada-002",
           api_key: environmentConfig.OPENAI_API_KEY,
         },
       },
     },
   ],
 };
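
The contentEmbedding field holds a length-capped copy of contentText. A minimal sketch of how such a copy could be produced is shown below. The character limit and function name are hypothetical, and character count is only a rough proxy for tokens, so the limit should be chosen conservatively for the embedding model in use.

```typescript
// Hypothetical limit; pick a value that keeps the text under your model's token limit.
const MAX_EMBEDDING_CHARS = 8000;

// Collapse whitespace and cap the text length, cutting at a word boundary when possible.
function truncateForEmbedding(text: string, maxChars: number = MAX_EMBEDDING_CHARS): string {
  const normalized = text.replace(/\s+/g, " ").trim();
  if (normalized.length <= maxChars) {
    return normalized;
  }
  const cut = normalized.slice(0, maxChars);
  // If the cut lands exactly at the end of a word, keep it as is.
  if (normalized.charAt(maxChars) === " ") {
    return cut;
  }
  // Otherwise drop the partial last word.
  const lastSpace = cut.lastIndexOf(" ");
  return lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
}
```

When preparing a document for indexing, contentEmbedding would then be set to truncateForEmbedding(contentText).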

Step 2: Data Synchronization

  • Establish a method to sync data from your main database to Typesense. This involves handling Create, Update, and Delete operations to ensure data consistency.

For bulk data import, use the emplace action, which creates a document if it does not exist yet and otherwise updates the existing document with the fields provided.

JavaScript:

// jsonl is a string with one JSON document per line
await client
  .collections(TypesenseCollection.RESOURCES)
  .documents()
  .import(jsonl, { action: "emplace" });

  • Our main data source is WordPress, so we have two batch scripts:
    • One syncs data from the WordPress endpoint to Postgres, converting blog post HTML to text with Cheerio.
    • One syncs data from Postgres to Typesense using bulk data import.
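
Bulk import works with JSONL, meaning one JSON document per line, and the import response reports one result per line in the same order. The sketch below shows one way to build the payload and to pick out failed rows from a response in JSONL string form. The document shape and function names are assumptions for illustration.

```typescript
// Hypothetical, trimmed-down document shape.
interface ResourceDocument {
  id: string;
  title: string;
  contentText: string;
}

// Serialize documents as JSONL: one JSON object per line.
function toJsonl(documents: ResourceDocument[]): string {
  return documents.map((doc) => JSON.stringify(doc)).join("\n");
}

interface ImportResult {
  success: boolean;
  error?: string;
}

// Parse a JSONL import response and keep only the rows that failed.
function failedImports(responseJsonl: string): ImportResult[] {
  return responseJsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as ImportResult)
    .filter((result) => !result.success);
}

const payload = toJsonl([
  { id: "1", title: "A", contentText: "x" },
  { id: "2", title: "B", contentText: "y" },
]);
```

Checking the failed rows after every batch helps catch documents that were silently skipped, for example because a field did not match the schema.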

Step 3: Search Implementation

  • Implement search functionality using Typesense’s API to allow users to query the indexed data.

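In our case, we decided to use hybrid search for flexibility and added a stopwords set so that common words are not matched unnecessarily.

JavaScript:

const results = await client
  .collections(TypesenseCollection.RESOURCES)
  .documents()
  .search({
    q: "Healthcare",
    // Listing the embedding field alongside text fields makes this a hybrid search
    query_by: "embedding,title,category,excerpt,contentText",
    filter_by: "status: [active]",
    per_page: 100,
    page: 1,
    // Name of a stopwords set that was created beforehand
    stopwords: "en-set",
    // Consider the 100 nearest neighbors and drop those beyond a vector distance of 0.3
    vector_query: "embedding:([], distance_threshold: 0.3, k: 100)",
    highlight_affix_num_tokens: 10,
    prefix: false,
  });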
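
Hybrid search has to merge two rankings: one from keyword matching and one from vector similarity. The sketch below illustrates the general idea with a weighted reciprocal-rank fusion. It is an illustration under our own assumptions, not Typesense's exact scoring code, and the function name and the alpha weighting are hypothetical.

```typescript
// Illustrative weighted reciprocal-rank fusion; NOT Typesense's exact scoring code.
// keywordIds and vectorIds are document IDs ordered from best to worst match.
// alpha is the weight given to the vector ranking (0 = keyword only, 1 = vector only).
function fuseRanks(keywordIds: string[], vectorIds: string[], alpha: number = 0.5): string[] {
  const scores = new Map<string, number>();

  // Each list contributes weight / rank, so higher-ranked documents earn more.
  const addScores = (ids: string[], weight: number): void => {
    ids.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (index + 1));
    });
  };

  addScores(keywordIds, 1 - alpha);
  addScores(vectorIds, alpha);

  // Sort by fused score, highest first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const fusedOrder = fuseRanks(["a", "b", "c"], ["c", "a", "d"]);
```

A document that ranks well in both lists ("a" here) rises to the top, while documents found by only one method still appear further down.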

A Retrospective: Lessons Learned

Setting up Typesense was straightforward and quick, taking less than a day to integrate into our existing project. However, managing API keys and permissions required careful attention. As advised by Typesense, avoid using the root API key in production environments for security reasons. Instead, generate scoped keys tailored to your application’s needs. For more details, refer to the Typesense API Key Documentation.
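
As a sketch of what a more restricted key might look like, the helper below builds the request body for a search-only API key limited to specific collections. The function name and description text are our own. The body uses the description, actions, collections, and expires_at fields of the Typesense keys API, and would be sent with a client configured with an admin key, for example via client.keys().create(...).

```typescript
interface ApiKeyRequest {
  description: string;
  actions: string[];
  collections: string[];
  expires_at?: number;
}

// Build the body for a key that can only search the given collections.
// ttlSeconds is optional; nowMs is injectable so the result is predictable in tests.
function searchOnlyKeyRequest(
  collections: string[],
  ttlSeconds?: number,
  nowMs: number = Date.now()
): ApiKeyRequest {
  const request: ApiKeyRequest = {
    description: `Search-only key for ${collections.join(", ")}`,
    actions: ["documents:search"],
    collections,
  };
  if (ttlSeconds !== undefined) {
    // expires_at is a Unix timestamp in seconds
    request.expires_at = Math.floor(nowMs / 1000) + ttlSeconds;
  }
  return request;
}

const postsKeyRequest = searchOnlyKeyRequest(["posts"]);
```

A key created this way can be shipped to a browser with less risk than the root key, since it cannot create, update, or delete documents.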

Key Benefits

  • Speed and Efficiency: Typesense provides fast search results, enhancing user experience.
  • Flexibility: Easily adaptable to various data types and search requirements.
  • Ease of Setup: Quick integration with existing projects.

By following these steps, we successfully leveraged Typesense to enhance our search functionality, providing users with a robust and efficient search experience.

Thanan Meepalang (Keng)

Sr. Full Stack Developer

Keng is a Sr. Full-stack Developer with over 6 years of experience building end-to-end applications using technologies like JavaScript, ReactJS, and Python. He has a background as a technical consultant and team lead, working with organizations to analyze, frame, and solve business challenges.

Keng received his Bachelor’s in Information Technology from Siam University. In his free time, he enjoys reading novels, playing video games, and learning new technologies.