Elasticsearch Search Features & Capabilities

Kaushik Jeyaraman
5 min read · May 17, 2021

What is Elasticsearch?

Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real-time. It is based on the Lucene library.

It provides the search features expected of a search engine and is often used to add search functionality to applications.

How does a traditional search work?

A traditional search scans every word of every document, one by one, and compares it with the search key. The work grows with the total amount of text, so it is not an efficient approach.

Traditional search
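
To make the cost concrete, here is a minimal Python sketch of such a word-by-word scan. The function name and the sample documents are made up for illustration.

```python
def linear_search(documents, key):
    """Return the ids of documents containing `key`, checking every word in turn."""
    key = key.lower()
    matches = []
    for doc_id, text in documents.items():
        # Every search walks through the full text of every document.
        for word in text.lower().split():
            if word == key:
                matches.append(doc_id)
                break
    return matches


# Hypothetical sample data.
docs = {1: "The sky is blue", 2: "Green grass", 3: "Blue ocean waves"}
print(linear_search(docs, "blue"))
```

Each query repeats the whole scan, which is the work an index is meant to avoid.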

How Elasticsearch works

Elasticsearch is capable of providing near real-time search results. It achieves this by using a special data structure called an inverted index. An inverted index lists every unique word that appears in any document and maps each word to the documents, and the positions within them, where it occurs. The inverted index allows fast full-text searches, at the cost of extra processing when a document is added to the index.

Elasticsearch Inverted Index working

In the above example, when the user searches for “blue sky”, Elasticsearch looks up the inverted index and finds the documents containing the word “blue”. Then it looks up the documents containing the word “sky” and delivers the result.
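
The lookup can be sketched in a few lines of Python. This is a toy model, not how Lucene stores its index: it keeps only document ids (no positions) and assumes every word of the query must match. The function names and sample documents are invented for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each unique word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word of the query."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical sample data.
docs = {1: "the sky is blue", 2: "blue ocean", 3: "clear sky at night"}
index = build_inverted_index(docs)
print(search(index, "blue sky"))
```

The extra work happens once, when a document is indexed. After that, a search is a handful of dictionary lookups instead of a scan over all the text.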

How Relevance Score is Calculated

The three factors below are those of Lucene's classic TF/IDF similarity. Since version 5.0, Elasticsearch uses BM25 by default, which builds on the same ideas.

Tf (term frequency in the document): A measure of how frequently a term appears in the document. The higher the frequency, the higher the relevance.

tf(t in d) = sqrt(frequency)

idf (inverse document frequency): A measure of how rare the term is across all documents. The rarer the term, the higher the relevance.

idf(t) = 1 + log(numDocs/(docFreq + 1))

Field Length Norm: A measure of how long the field is. The shorter the field, the higher the relevance.

norm(d) = 1 / sqrt(numTerms)
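
The three formulas above translate directly into Python. The sketch below only evaluates the individual factors. It does not reproduce how Lucene combines them into a final score, and the sample numbers are made up.

```python
import math


def tf(frequency):
    """Term frequency factor: grows with the number of occurrences."""
    return math.sqrt(frequency)


def idf(num_docs, doc_freq):
    """Inverse document frequency: larger for terms found in fewer documents."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """Field length norm: larger for shorter fields."""
    return 1 / math.sqrt(num_terms)


print(tf(1), tf(4))                                   # more occurrences, higher tf
print(idf(1000, 9), idf(1000, 499))                   # rarer term, higher idf
print(field_length_norm(4), field_length_norm(100))   # shorter field, higher norm
```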

Search Features

Filter Query: It simply filters the documents that contain the search key. It ignores relevance and only checks whether the search key is found, so no relevance score is calculated.

Filter based search
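
As a rough sketch, a filter-only request body could look like the Python dictionary below. The field name `color` and its value are assumptions for illustration. Clauses placed under `bool.filter` run in filter context, which does not contribute to the score.

```python
import json

# Hypothetical field and value; send this body to the _search endpoint of your index.
filter_query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"color": "blue"}}  # yes/no match, no scoring
            ]
        }
    }
}

print(json.dumps(filter_query, indent=2))
```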

Search with Relevance Score: This is different from the filter query. It tells how relevant each document is with respect to the search term. The higher the relevance score, the more relevant the match.

relevance based Search
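
A scored search is typically written as a `match` query. In the minimal sketch below the field name `body` is an assumption. The same words as in a filter could be used, but here each hit is returned with a relevance score.

```python
import json

# Hypothetical field name; a match query analyzes the text and scores each hit.
relevance_query = {
    "query": {
        "match": {"body": "blue sky"}
    }
}

print(json.dumps(relevance_query, indent=2))
```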

Boosting terms: When we have multiple search keys, we might want to give one of them higher priority than the others. This is achieved by boosting that term in the query.

Boosting
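
One way to express this, sketched here under assumptions, is a `bool` query whose `should` clauses each carry their own `boost`. The helper name, the field `body`, and the boost values are illustrative only.

```python
def boosted_terms_query(field, boosts):
    """Build a bool/should query giving each term its own boost."""
    should = [
        {"match": {field: {"query": term, "boost": boost}}}
        for term, boost in boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


# Hypothetical example: matches on "sky" count twice as much as matches on "blue".
query = boosted_terms_query("body", {"sky": 2.0, "blue": 1.0})
print(query)
```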

Negative Boosting: Sometimes we want to reduce the relevance score of a document because we are sure it talks about something else. For example, in the case below, if we search for “stage 4”, we want to demote documents that mention “stage 3”.

Negative Boosting
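
The `boosting` query is a natural fit for this case. In the sketch below, the field name `body` and the `negative_boost` value of 0.5 are assumptions, not values taken from the example above.

```python
import json

# Hypothetical field name and boost value.
negative_boost_query = {
    "query": {
        "boosting": {
            # Documents must match the positive query to be returned.
            "positive": {"match": {"body": "stage 4"}},
            # Documents that also match the negative query are demoted, not removed.
            "negative": {"match_phrase": {"body": "stage 3"}},
            # Score multiplier between 0 and 1 applied to the demoted documents.
            "negative_boost": 0.5,
        }
    }
}

print(json.dumps(negative_boost_query, indent=2))
```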

Field-Based Boosting: We can tell Elasticsearch to weight relevance by field. For example, a match in the title can count for more than a match in the body.

Field-based boosting
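
A `multi_match` query with the caret syntax is one way to sketch this. The field names `title` and `body` and the factor 2 are assumptions for illustration.

```python
import json

# Hypothetical field names; "title^2" doubles the weight of matches in the title.
field_boost_query = {
    "query": {
        "multi_match": {
            "query": "blue sky",
            "fields": ["title^2", "body"],
        }
    }
}

print(json.dumps(field_boost_query, indent=2))
```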

Word search: If we search for “spicy” & “sauce” we get the following.

Match individual words

Phrase search: Now if we search for the phrase “spicy sauce” we get the following.

Match phrase
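
The difference between the two searches comes down to the query type. In the sketch below the field name `body` is an assumption.

```python
# Word search: the text is split into terms that are matched individually.
word_query = {"query": {"match": {"body": "spicy sauce"}}}

# Phrase search: the terms must appear together, in this order.
phrase_query = {"query": {"match_phrase": {"body": "spicy sauce"}}}

print(word_query)
print(phrase_query)
```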

Proximity search: We can also specify the proximity of terms, i.e. how far apart the given words may be. This is controlled with the slop parameter.

Proximity with slop 1
Proximity with slop 2
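
A phrase query accepts a `slop` value for this. The helper below is a small sketch; its name and the field `body` are hypothetical.

```python
def proximity_query(field, phrase, slop):
    """Build a match_phrase query that tolerates `slop` position moves between terms."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


# A larger slop lets the words sit further apart and still match.
print(proximity_query("body", "spicy sauce", 1))
print(proximity_query("body", "spicy sauce", 2))
```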

That's not all.

Synonyms

What about the different ways in which the same thing can be expressed?

Synonyms

There are two types of synonyms:

Bi-Directional: [A, B] means A and B are synonyms of each other.

Uni-Directional: [A => B] means A is a synonym of B, but B is not a synonym of A.

Bi-Directional Synonyms
Uni-Directional Synonyms
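
Both kinds of rule can be declared in a `synonym` token filter when the index is created. The sketch below shows such index settings as a Python dictionary. The filter name, analyzer name, and word pairs are made up for illustration.

```python
import json

# Hypothetical names and word lists.
synonym_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym",
                    "synonyms": [
                        "quick, fast",         # bi-directional: each matches the other
                        "laptop => computer",  # uni-directional: laptop is mapped to computer only
                    ],
                }
            },
            "analyzer": {
                "synonym_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    }
}

print(json.dumps(synonym_settings, indent=2))
```

Placing `lowercase` before the synonym filter means the rules only need to be written in lower case.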

English Language Variations

Tenses

Plain full-text search fails to match a word when it appears in a different tense.

Words with the same root

variants of a word

Elasticsearch handles these cases with the help of stemming. Stemming is the process of reducing a word to its root form, which ensures that variants of a word match during a search. Several stemmers are available, so make sure you choose the one that best fits your use case.

stemming
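
Stemming is configured on the analyzer. The sketch below uses the `stemmer` token filter with the `english` language. The filter and analyzer names are assumptions, and other stemmers may suit your data better.

```python
import json

# Hypothetical filter and analyzer names.
stemming_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "english_stemmer": {"type": "stemmer", "language": "english"}
            },
            "analyzer": {
                "stemmed_text": {
                    "tokenizer": "standard",
                    # Lowercase first, then reduce each token to its root form.
                    "filter": ["lowercase", "english_stemmer"],
                }
            },
        }
    }
}

print(json.dumps(stemming_settings, indent=2))
```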

Fuzzy Search: Also known as approximate string matching, this is a technique for finding strings that match a pattern approximately rather than exactly.

Fuzzy Matching
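
Fuzzy matching is based on edit distance: the number of single-character changes needed to turn one string into another. The sketch below implements plain Levenshtein distance to illustrate the idea, followed by a `match` query with `fuzziness`. The function name, the field `body`, and the misspelled query are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("sauce", "sause"))

# Hypothetical field name; "AUTO" picks the allowed distance from the term length.
fuzzy_query = {
    "query": {"match": {"body": {"query": "spicy sause", "fuzziness": "AUTO"}}}
}
```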

Providing clarity to the end-user

The above results might surprise end-users and leave them wondering why a result ended up matching their search terms.

As a developer, you might be aware of the synonyms, but how will the end-user know about them?

Highlighting results to the rescue

Elasticsearch provides a ready-made feature for highlighting search results, so you can show users where their query matched.
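
Highlighting is requested by adding a `highlight` section next to the query. In the minimal sketch below the field name `body` is an assumption. By default, the matched terms come back wrapped in `<em>` tags.

```python
import json

# Hypothetical field name; an empty object means "use the default highlight settings".
highlight_query = {
    "query": {"match": {"body": "spicy sauce"}},
    "highlight": {"fields": {"body": {}}},
}

print(json.dumps(highlight_query, indent=2))
```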

Trimming highlighted snippets

Now imagine your search result is huge and you can't afford to show the entire text on your web page. This is where Elasticsearch lets you pick the best snippets to display. You can find various configurations here.
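
Two highlight options that control snippet length are `fragment_size` and `number_of_fragments`. The values and the field name `body` in this sketch are assumptions to adapt to your page layout.

```python
import json

# Hypothetical field name and sizes.
snippet_query = {
    "query": {"match": {"body": "spicy sauce"}},
    "highlight": {
        "fields": {
            "body": {
                "fragment_size": 150,      # approximate snippet length in characters
                "number_of_fragments": 3,  # maximum number of snippets per field
            }
        }
    },
}

print(json.dumps(snippet_query, indent=2))
```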

Conclusion

Elasticsearch is an excellent choice for providing search functionality to your application. There are a ton of features available to choose from. We have covered only a few popular ones.
