Elasticsearch Cheatsheet
What is it?
- a convenience wrapper around Lucene to allow for fast searching in a distributed system
- stores the data itself, spread across multiple nodes (scales horizontally)
- data is stored as JSON documents
- excels at full-text search
- near-real-time search results, making it suitable for applications that require up-to-the-minute information
- analytics capabilities: aggregations, filtering, and visualizations
- searches across partitioned shards of the data
- Elasticsearch caching
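
The sharding bullets above can be pictured with a small scatter-gather sketch. This is a toy model, not Elasticsearch's implementation: `NUM_SHARDS`, `Shard`, `shard_for`, `index_document`, and `search` are hypothetical names, `zlib.crc32` is a stand-in for the real routing hash, and the score is a plain term count rather than a real relevance score.

```python
import heapq
import zlib

NUM_SHARDS = 3  # assumed number of primary shards


def shard_for(doc_id):
    """Pick a shard from a hash of the document id (crc32 is a stand-in hash)."""
    return zlib.crc32(str(doc_id).encode()) % NUM_SHARDS


class Shard:
    """One partition that stores its own JSON-like documents and searches them locally."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc

    def search(self, term, size):
        # Toy score: how many times the term occurs in the document text.
        hits = []
        for doc_id, doc in self.docs.items():
            score = doc["text"].lower().split().count(term)
            if score:
                hits.append((score, doc_id))
        return heapq.nlargest(size, hits)


shards = [Shard() for _ in range(NUM_SHARDS)]


def index_document(doc_id, doc):
    """Route the document to exactly one shard."""
    shards[shard_for(doc_id)].index(doc_id, doc)


def search(term, size=2):
    # Scatter: every shard returns its own local top hits.
    partial = [hit for shard in shards for hit in shard.search(term, size)]
    # Gather: merge the partial results into one global top list.
    return heapq.nlargest(size, partial)


index_document(1, {"text": "apple apple computer"})
index_document(2, {"text": "apple pie"})
index_document(3, {"text": "cherry pie"})
print(search("apple"))  # list of (score, doc_id), best first
```

Each shard only needs to return its local top `size` hits, because the global top `size` can never contain more than `size` results from a single shard.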
What is Lucene?
- Full-text search (like searching for a product on Amazon)
- Traditional databases don't work well for this
- Wildcard searches like 'where ... like "%item%"' are simple, but they don't handle complex queries, aren't performant, and lack advanced features like stemming, synonym expansion, and fuzzy matching
- Writes are first sent to an in-memory buffer
- New documents can't be read (searched) until they are written to disk
- The buffer is then flushed to an immutable segment on disk (comparable to an SSTable), and segments are later compacted (merged) => an LSM-tree-like design
- When a document is added, it has to be tokenized
- Inverted index: maintain a mapping from each token to the ids of the documents that contain it, where the tokens come from the documents. For example, if document 1 contains "Apple Computer", then the inverted index is: {apple: 1, computer: 1}
- Search index
- Prefix searching (terms ordered by prefix):
{
apple: 10, 15
banana: 31, 6
cantaloupe: 4, 67
cherry: 3, 98
}
- Suffix searching (terms reversed, then ordered):
{
ananab: 31, 6
elppa: 10, 15
epuolatnac: 4, 67
yrrehc: 3, 98
}
- Lucene handles these and a lot of other complex search indexes
- Lucene runs on a single node
- Elasticsearch builds on top of Lucene to achieve distributed search!
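
The tokenizing, inverted index, and prefix/suffix notes above can be tried out with a minimal Python sketch. It is only an illustration under simplifying assumptions, not how Lucene stores its indexes (Lucene uses much more compact on-disk structures). The helper names `tokenize`, `build_inverted_index`, `prefix_search`, and `suffix_search` are hypothetical, and the tokenizer just lowercases and splits on non-alphanumeric characters.

```python
import bisect
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a very simplified analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the sorted list of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def prefix_search(sorted_terms, prefix):
    """Binary-search a sorted term list, then scan forward while terms share the prefix."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def suffix_search(sorted_reversed_terms, suffix):
    """A suffix search is a prefix search over the reversed terms."""
    hits = prefix_search(sorted_reversed_terms, suffix[::-1])
    return [term[::-1] for term in hits]


docs = {1: "Apple Computer", 2: "Apple pie recipe", 3: "Cherry pie"}
index = build_inverted_index(docs)
terms = sorted(index)                            # ordered by prefix
reversed_terms = sorted(t[::-1] for t in index)  # ordered by suffix

print(index["apple"])                      # [1, 2]
print(prefix_search(terms, "c"))           # ['cherry', 'computer']
print(suffix_search(reversed_terms, "e"))  # ['pie', 'apple', 'recipe']
```

Keeping the terms sorted is what makes the prefix lookup cheap: matching terms sit next to each other, so one binary search finds the start of the run.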
Common Use Cases:
- Log management
- E-commerce Search: search relevant products fast
- Site Search: enabling users to quickly find information within a website or application
- Security Intelligence: detect and analyze security threats
- Business Analytics: analyzing customer data, website traffic, and other business metrics to gain insights
- Geospatial Data