Elasticsearch Search Features & Capabilities
What is Elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real-time. It is based on the Lucene library.
It provides the core features expected of a search engine and is often used to add search functionality to applications.
How Traditional Search Works
Traditional search works by scanning the text word by word and comparing each word with the search key. The cost grows with the total amount of text, so this approach does not scale to large collections.
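To make the cost concrete, here is a minimal Python sketch of such a word-by-word scan. The function name naive_search and the sample documents are made up for illustration.

```python
def naive_search(documents, term):
    """Return ids of documents containing `term` by checking every word in turn."""
    term = term.lower()
    hits = []
    for doc_id, text in documents.items():
        # Every query walks through the words of every document.
        for word in text.lower().split():
            if word == term:
                hits.append(doc_id)
                break  # one match is enough for this document
    return hits


docs = {1: "The sky is blue", 2: "The grass is green", 3: "Blue whales are huge"}
print(naive_search(docs, "blue"))  # [1, 3]
```

Every search repeats the full scan, which is the work an index is meant to avoid.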
How Elasticsearch works
Elasticsearch is capable of providing near real-time search results. It achieves this by using a special data structure called an inverted index. An inverted index consists of a list of all the unique words that appear in any document, and each word is mapped to the documents (and the positions within them) where it occurs. The inverted index allows fast full-text searches, at the cost of extra processing when a document is added to the index.
For example, when the user searches for “blue sky”, Elasticsearch looks up the inverted index to find the documents containing the word “blue”, then looks up the documents containing the word “sky”, and delivers the result.
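A minimal sketch of this lookup, assuming whitespace tokenization and lowercasing (real analyzers do considerably more). The names build_inverted_index and search_all_terms and the sample documents are hypothetical. The sketch returns only documents containing every term, which is one possible reading of the example; by default, an Elasticsearch match query also returns documents that contain only some of the terms and ranks them by score.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to {doc_id: [positions]} for every document it appears in."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(position)
    return index


def search_all_terms(index, query):
    """Return sorted ids of documents that contain every term of the query."""
    doc_sets = [set(index.get(term, {})) for term in query.lower().split()]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []


docs = {1: "the sky is blue", 2: "blue whales are huge", 3: "the sky at night"}
index = build_inverted_index(docs)

print(index["blue"])                        # {1: [3], 2: [0]}
print(search_all_terms(index, "blue sky"))  # [1]
```

The extra work happens once, when a document is added; each query is then a handful of dictionary lookups instead of a full scan.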
How Relevance Score is Calculated
TF (term frequency in the document): A measure of how frequently a term appears in the document. The higher the frequency, the higher the relevance.
tf(t in d) = sqrt(frequency)
IDF (inverse document frequency): A measure of how rare the term is across all documents. The rarer the term, the higher the relevance.
idf(t) = 1 + log(numDocs/(docFreq + 1))
Field Length Norm: A measure of how long the field is. The shorter the field, the higher the relevance.
norm(d) = 1 / sqrt(numTerms)
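The three factors above can be tried out with a few lines of Python. This is only a sketch of the formulas as written, assuming a natural logarithm; the sample numbers are invented. It does not reproduce a full Elasticsearch score, and recent Elasticsearch versions score with BM25 by default, so real scores will differ.

```python
import math


def tf(frequency):
    """Term frequency factor: grows with the number of occurrences."""
    return math.sqrt(frequency)


def idf(num_docs, doc_freq):
    """Inverse document frequency factor: grows as the term gets rarer."""
    return 1 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms):
    """Field length factor: grows as the field gets shorter."""
    return 1 / math.sqrt(num_terms)


# Invented example: a term occurs 4 times in a 16-term field
# and appears in 9 out of 1000 documents.
print(tf(4))                  # 2.0
print(field_length_norm(16))  # 0.25
print(round(idf(1000, 9), 2)) # 5.61
```

A term found in 99 of the 1000 documents would get a lower idf, and the same term in a 4-term field would get a higher norm than in a 16-term field.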
Search Features
Filter Query: It simply filters the documents that contain the search key. It ignores relevance and only checks whether the search key is found, so no relevance score is assigned.
Search with Relevance Score: Unlike a filter query, this tells how relevant each document is with respect to the search term. The higher the relevance score, the more relevant the match.
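As a rough sketch, the two kinds of search could be expressed as the following request bodies, built here as plain Python dictionaries. The helper names and the field name "description" are assumptions for illustration; the bodies would be sent to an index's _search endpoint.

```python
import json


def filter_query(field, text):
    """Request body that only checks whether documents match (filter context)."""
    return {"query": {"bool": {"filter": [{"match": {field: text}}]}}}


def scored_query(field, text):
    """Request body that matches documents and ranks them by relevance score."""
    return {"query": {"match": {field: text}}}


print(json.dumps(filter_query("description", "spicy sauce"), indent=2))
print(json.dumps(scored_query("description", "spicy sauce"), indent=2))
```

Clauses placed under "filter" answer only yes or no, while the plain match query contributes to the score of each hit.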
Boosting terms: When we have multiple search keys, we might want to specify which word should be given higher priority. This is achieved by boosting that term in the query.
Negative Boosting: Sometimes we want to reduce the relevance score of a document because we are sure it talks about something else. For example, in the case below, we know that if we search for “stage 4”, we must demote “stage 3”.
Field-Based Boosting: We can tell Elasticsearch to prioritize relevance based on the field. For example, the title is more important than the body.
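The three boosting styles can be sketched as request bodies like the ones below. The helper names, the field names "title" and "body", and the boost values are assumptions chosen for illustration; suitable values depend on your data.

```python
def boosted_terms_query(field, important, other, boost=2.0):
    """Give one term more weight than another via the `boost` parameter."""
    return {"query": {"bool": {"should": [
        {"match": {field: {"query": important, "boost": boost}}},
        {"match": {field: {"query": other}}},
    ]}}}


def negative_boost_query(field, wanted, unwanted, negative_boost=0.2):
    """Keep documents matching `unwanted`, but scale their score down."""
    return {"query": {"boosting": {
        "positive": {"match_phrase": {field: wanted}},
        "negative": {"match_phrase": {field: unwanted}},
        "negative_boost": negative_boost,
    }}}


def field_boost_query(text, boosted_field="title", other_field="body", weight=3):
    """Score matches in one field higher using the `field^weight` syntax."""
    return {"query": {"multi_match": {
        "query": text,
        "fields": [f"{boosted_field}^{weight}", other_field],
    }}}


print(negative_boost_query("body", "stage 4", "stage 3"))
print(field_boost_query("spicy sauce"))
```

With the boosting query, documents that also match the negative clause are not excluded; their score is multiplied by negative_boost.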
Word search: If we search for “spicy” & “sauce” we get the following.
Phrase search: Now if we search for the phrase “spicy sauce” we get the following.
Proximity search: We can also specify the proximity of terms, i.e. how far apart the given words may be from each other.
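Word, phrase, and proximity search map fairly directly onto the match and match_phrase queries. The sketch below builds the request bodies as Python dictionaries; the helper names and the field name "description" are assumptions.

```python
def word_search(field, text):
    """Match documents containing the individual words, in any order."""
    return {"query": {"match": {field: text}}}


def phrase_search(field, phrase, slop=0):
    """Match the words as a phrase; `slop` allows that many position moves."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


print(word_search("description", "spicy sauce"))
print(phrase_search("description", "spicy sauce"))          # exact phrase
print(phrase_search("description", "spicy sauce", slop=2))  # proximity search
```

A slop of 0 requires the words to be adjacent and in order; larger values tolerate words in between.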
That's not all.
Synonyms
What about the different ways in which the same thing can be expressed?
There are two types of synonyms:
Bi-Directional: [A, B] means A and B are synonyms of each other.
Uni-Directional: [A => B] means A is a synonym of B, but B is not a synonym of A.
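Both kinds of rule can be supplied to a synonym token filter when the index is created. The following is a sketch of such index settings as a Python dictionary; the filter name "my_synonyms", the analyzer name "my_synonym_analyzer", and the example rules are invented for illustration.

```python
def synonym_index_settings(synonym_rules):
    """Index settings with a custom analyzer that applies the given synonym rules."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_synonyms": {"type": "synonym", "synonyms": synonym_rules}
                },
                "analyzer": {
                    "my_synonym_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_synonyms"],
                    }
                },
            }
        }
    }


rules = [
    "tv, television",      # bi-directional: either word matches the other
    "laptop => computer",  # uni-directional: laptop is mapped to computer
]
print(synonym_index_settings(rules))
```

The analyzer still has to be assigned to a text field in the index mapping before it takes effect.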
English Language Variations
Tenses
Words with the same root
These are handled by Elasticsearch with the help of stemming. Stemming is the process of reducing a word to its root form. This ensures that variants of a word match during a search. Several stemming algorithms are available, so make sure you choose the one that best fits your use case.
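One way to enable stemming is a custom analyzer that includes the stemmer token filter. The sketch below shows possible index settings as a Python dictionary; the filter and analyzer names are made up, and "english" is just one of the available language values (lighter and more aggressive variants exist).

```python
def stemming_index_settings(language="english"):
    """Index settings with an analyzer that lowercases and then stems tokens."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_stemmer": {"type": "stemmer", "language": language}
                },
                "analyzer": {
                    "my_stemmed_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_stemmer"],
                    }
                },
            }
        }
    }


print(stemming_index_settings())
print(stemming_index_settings("light_english"))
```

Because the same analyzer is normally applied to both the indexed text and the query, different forms of a word are reduced to the same root on both sides and therefore match.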
Fuzzy Search: Also known as approximate string matching, it is a technique for finding strings that match a pattern approximately.
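Elasticsearch measures how "approximate" a match is as an edit distance. The sketch below implements the plain Levenshtein distance in Python to illustrate the idea, then builds a match query with the fuzziness parameter. The helper names and the field name "description" are assumptions, and the distance function is an illustration rather than Elasticsearch's own implementation.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_query(field, text, fuzziness="AUTO"):
    """Match query that tolerates small spelling differences."""
    return {"query": {"match": {field: {"query": text, "fuzziness": fuzziness}}}}


print(edit_distance("spicy", "spicey"))    # 1
print(edit_distance("kitten", "sitting"))  # 3
print(fuzzy_query("description", "spicey sauce"))
```

With "AUTO", the allowed edit distance depends on the length of each term, so short words must match more exactly than long ones.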
Providing Clarity to the End User
The above result might surprise the end user and often leaves them wondering why the results ended up matching the search terms.
As a developer, you might be aware of synonyms, but how will the end user know?
Highlighting Results to the Rescue
Elasticsearch provides ready-made features for highlighting search results, so you can show users where the query matches are.
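A highlight section can be added next to the query in the request body. The sketch below builds such a body as a Python dictionary; the helper name, the field name "body", and the <mark> tags are assumptions for illustration.

```python
def highlighted_search(field, text, pre_tag="<mark>", post_tag="</mark>"):
    """Search `field` for `text` and ask for the matches to be wrapped in tags."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {field: {}},  # empty object: use default highlight settings
        },
    }


print(highlighted_search("body", "spicy sauce"))
```

Each hit in the response then carries a highlight section with the matching fragments, which can be rendered directly in the results page.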
Trimming highlighted snippets
Now imagine your search result is huge and you can't afford to show the entire text on your web page. This is where Elasticsearch lets you smartly choose the best snippet. The available configuration options are described in the Elasticsearch highlighting documentation.
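Two commonly used highlighting options for this are fragment_size (the approximate snippet length in characters) and number_of_fragments (how many snippets to return per field). A sketch with arbitrarily chosen values, a hypothetical helper name, and the assumed field name "body":

```python
def snippet_search(field, text, fragment_size=150, number_of_fragments=2):
    """Search `field` and return a limited number of short highlighted snippets."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {
                field: {
                    "fragment_size": fragment_size,
                    "number_of_fragments": number_of_fragments,
                }
            }
        },
    }


print(snippet_search("body", "spicy sauce"))
```

Smaller fragments suit compact result lists, while larger ones give the reader more context around each match.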
Conclusion
Elasticsearch is an excellent choice for providing search functionality to your application. There are a ton of features available to choose from. We have covered only a few popular ones.