header_pic
Back to blog home

Full-Text Search Thanks To Elasticsearch

by Adam on 2020-01-20

This post will go into setting up full-text search and search suggestions for a blog application hosted on Heroku.

There is a free tier of Elasticsearch on Heroku provided by SearchBox. It supports one index and allows up to 20MB of storage. Once you've provisioned it you'll get an API key. From there you can create the index:

curl -X PUT "https://paas:[your_api_key]@oin-us-east-1.searchly.com/blog-index" -H 'Content-Type: application/json' -d'
{
    "settings": {
        "number_of_shards" : 1,
        "number_of_replicas": 0,
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 15
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter" 
                    ]
                },
                "autocomplete_search": {
                  "tokenizer": "lowercase"
                }
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 15,
                    "token_chars": [
                        "letter","digit"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties" : {
            "suggest" : {
                "type" : "completion"
            },
            "title" : {
                "type": "keyword"
            }
        }
    }
}
'

From there you can populate the index with code like this:

suggest = [word for word in title_slug.split('-')]
suggest += [tag_slug for tag_slug in tag_slugs]
doc = {
        'post_author': g.user['id'],
        'post_body': body,
        'post_title': [word for word in title_slug.split('-')],
        'post_tags': [tag_slug for tag_slug in tag_slugs],
        'post_timestamp': dt.datetime.now(),
        'suggest': suggest,
    }
result = current_app.es.index(index="blog-index", 
                              id=post_id, 
                              body=doc)
current_app.es.indices.refresh(index='blog-index')

You can search on that index like this:

query = {
    "query": {
        "multi_match": {
            "query": search,
            "fields": ["post_body", "post_title", "post_tags"]
        }
    }
}
results = current_app.es.search(index="blog-index", 
                                #doc_type="post", 
                                body=query)
scan = helpers.scan(current_app.es,query=query,scroll='1m',
                    index='blog-index')
ids = tuple(sorted(scan_result['_id'] for scan_result in scan))

And pass those ids to a SQL query:

qry = f"""
    SELECT p.id, title, body, created, author_id, username, role_id,
        pt.tags, pt.tag_slugs 
     FROM post p 
     JOIN usr u ON p.author_id = u.id
     LEFT JOIN (
        SELECT pt.post_id, string_agg(t.title, ' ') tags, 
            string_agg(t.slug, ' ') tag_slugs
        FROM tag t
            JOIN post_tag pt
            ON pt.tag_id = t.id
        GROUP BY pt.post_id
     ) pt
     ON pt.post_id = p.id
     WHERE p.id IN ({placeholders})
     ORDER BY created DESC
     LIMIT %s
     OFFSET %s;"""
db.execute(qry,ids+(PAGINATION_SIZE,offset,))
posts = db.fetchall()

I'll leave as an exercise to the reader how to get the data displayed in a template.

tags: full-text search elasticsearch python flask

Post thanked 0 time(s).

Related posts: