Elasticsearch Cheatsheet

a convenience wrapper around Lucene that enables fast searching in a distributed system
stores the data itself, spread across multiple nodes that scale horizontally
data is stored as JSON documents
excels at full-text search
near-real-time search results, making it suitable for applications that require up-to-the-minute information
analytics capabilities: aggregations, filtering, and visualizations
Searching through partitioned database shards
Elasticsearch Caching
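
As a rough illustration of searching through partitioned shards, the sketch below routes each document to a shard by hashing its id, then answers a query by asking every shard and merging the hits (scatter-gather). The names `NUM_SHARDS`, `shard_for`, `index_doc`, and `search_all` are hypothetical, and `zlib.crc32` is only a stand-in for whatever routing hash a real cluster uses.

```python
import zlib

NUM_SHARDS = 3  # assumption: a fixed number of shards

# Each shard is modeled as a plain dict of doc id -> text.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(doc_id):
    """Pick a shard deterministically from the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS

def index_doc(doc_id, text):
    """Store the document on the shard its id hashes to."""
    shards[shard_for(doc_id)][doc_id] = text

def search_all(term):
    """Scatter the query to every shard, then gather and sort the hits."""
    hits = []
    for shard in shards:
        hits.extend(doc_id for doc_id, text in shard.items()
                    if term.lower() in text.lower().split())
    return sorted(hits)

# Made-up documents for illustration.
index_doc("1", "Apple Computer")
index_doc("2", "Banana bread")
index_doc("3", "apple pie")
```

Because a query term can live on any shard, `search_all` has to visit all of them; only lookups by document id can go straight to a single shard.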

Full-text search (like searching for a product on Amazon)
relational databases are a poor fit for this
- wildcard searches like 'where ... like "%item%"' are simple but can't handle complex queries, perform poorly (a leading wildcard forces a scan of every row), and lack advanced features like stemming, synonym expansion, and fuzzy matching
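
To make the gap concrete, here is a minimal sketch comparing a substring scan (roughly what `like "%item%"` does) with a fuzzy match that tolerates a typo. It uses Python's standard `difflib` as a stand-in for fuzzy matching; the product list and the helper names `like_search` and `fuzzy_search` are hypothetical, and this is not how Elasticsearch implements fuzziness.

```python
import difflib

products = ["apple computer", "banana bread", "cherry pie"]  # made-up data

def like_search(query, rows):
    """Substring scan over every row, similar in spirit to LIKE '%query%'."""
    return [row for row in rows if query in row]

def fuzzy_search(query, rows, cutoff=0.8):
    """Return rows that contain a word close enough to the query."""
    matches = []
    for row in rows:
        if difflib.get_close_matches(query, row.split(), n=1, cutoff=cutoff):
            matches.append(row)
    return matches
```

With the misspelled query `"computr"`, the substring scan finds nothing, while the fuzzy version still matches the row containing "computer".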
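Writes are first sent to an in-memory buffer
a document can't be read (searched) until it has been written to disk
the buffer is then flushed to an immutable sorted file on disk (an SSTable), and those files are later compacted => LSM Tree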
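
The write path above can be sketched as a toy LSM-style store: writes land in an in-memory buffer, a flush freezes the buffer into an immutable sorted run, and compaction merges the runs. Everything here (`TinyLSM`, `flush_threshold`, keeping runs as in-memory tuples instead of files) is a simplifying assumption for illustration, not the Lucene or Elasticsearch implementation.

```python
class TinyLSM:
    def __init__(self, flush_threshold=2):
        self.memtable = {}    # in-memory write buffer
        self.segments = []    # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        """Buffer the write; flush once the buffer is full."""
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Freeze the buffer into a sorted, immutable run."""
        if self.memtable:
            self.segments.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self):
        """Merge all runs into one; newer values overwrite older ones."""
        merged = {}
        for segment in self.segments:  # oldest first
            merged.update(segment)
        self.segments = [tuple(sorted(merged.items()))] if merged else []

    def get(self, key):
        """Read only from flushed runs, newest first."""
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None
```

Note that `get` deliberately ignores the buffer, mirroring the point that a write is not readable until it has been flushed.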
When a document is added, it has to be tokenized
Inverted index: a mapping from each token to the ids of the documents that contain it, where the tokens come from the documents. For example, if document 1 contains "Apple Computer", then the inverted index is: {apple: 1, computer: 1}
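
A minimal sketch of tokenizing documents and building an inverted index from them. The tokenizer here (lowercase, split on non-alphanumerics) is an assumption chosen for brevity; real analyzers do considerably more. The function names and the second sample document are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}

docs = {1: "Apple Computer", 2: "Apple pie recipe"}  # made-up documents
index = build_inverted_index(docs)
```

Looking up `index["apple"]` then returns every document id containing that token without scanning the documents themselves.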
Search Index
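Prefix Searching (order tokens alphabetically so that all matches for a prefix are adjacent):

  {
    apple: 10, 15
    banana: 31, 6
    cantaloupe: 4, 67
    cherry: 3, 98
  }

Suffix Searching (reverse each token, then order the same way):

  {
    ananab: 31, 6
    elppa: 10, 15
    epuolatnac: 4, 67
    yrrehc: 3, 98
  }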
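
The two sorted term lists above can be searched with a binary search: a prefix lookup on the forward list, and a suffix lookup by reversing the query and searching the reversed list. The sketch below uses Python's standard `bisect` on three of the terms above; the helper names are hypothetical and this is only one simple way to do it.

```python
import bisect

# Postings for three of the terms above: token -> document ids.
postings = {"apple": [10, 15], "banana": [31, 6], "cherry": [3, 98]}

sorted_terms = sorted(postings)                      # for prefix search
sorted_reversed = sorted(t[::-1] for t in postings)  # for suffix search

def range_with_prefix(sorted_keys, prefix):
    """Binary-search the contiguous run of keys that start with prefix."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = lo
    while hi < len(sorted_keys) and sorted_keys[hi].startswith(prefix):
        hi += 1
    return sorted_keys[lo:hi]

def prefix_search(prefix):
    """Return the terms that start with the given prefix."""
    return range_with_prefix(sorted_terms, prefix)

def suffix_search(suffix):
    """Return the terms that end with the given suffix."""
    # A suffix of a term is a prefix of the reversed term.
    hits = range_with_prefix(sorted_reversed, suffix[::-1])
    return sorted(key[::-1] for key in hits)
```

Each matching term can then be used as a key into `postings` to fetch its document ids.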

Log Management: collecting and searching application and system logs
E-commerce Search: search relevant products fast
Site Search: enabling users to quickly find information within a website or application
Security Intelligence: detect and analyze security threats
Business Analytics: analyzing customer data, website traffic, and other business metrics to gain insights
Geospatial Data: searching and filtering by location