Elastic Search Cheatsheet

What is it?

  • a convenience wrapper around Lucene to allow for fast searching in a distributed system
  • it stores the data itself, spread across multiple nodes (horizontal scaling)
  • data are stored in JSON documents
  • excels at full-text search
  • near-real-time search results, making it suitable for applications that require up-to-the-minute information
  • analytics capabilities: aggregations, filtering, and visualizations
  • Searching through partitioned database shards
  • Elastic Search Caching
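
As a rough illustration of "JSON documents spread across nodes" and "searching through partitioned shards", the sketch below routes each document to a shard by hashing its id, then answers a query by asking every shard and merging the hits. Everything here is a toy under stated assumptions: `shard_for`, `index_document`, `search_all`, and `NUM_SHARDS` are hypothetical names, the shards are plain in-memory dicts, and `zlib.crc32` is only a stand-in for whatever hash a real cluster uses.

```python
import zlib

NUM_SHARDS = 3  # hypothetical number of shards (assumption for this sketch)


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard from a stable hash of the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_document(shards, doc_id, doc):
    """Store a JSON-like document on the shard its id routes to."""
    shards[shard_for(doc_id, len(shards))][doc_id] = doc


def search_all(shards, field, word):
    """Scatter the query to every shard, then gather and sort the hits."""
    hits = []
    for shard in shards:
        for doc_id, doc in shard.items():
            words = str(doc.get(field, "")).lower().split()
            if word.lower() in words:
                hits.append(doc_id)
    return sorted(hits)


# Each shard is just a dict here; in a real cluster it would live on a node.
shards = [{} for _ in range(NUM_SHARDS)]
index_document(shards, "p1", {"name": "Apple Computer", "price": 999})
index_document(shards, "p2", {"name": "Banana Bread", "price": 5})
index_document(shards, "p3", {"name": "Apple Pie", "price": 7})

print(search_all(shards, "name", "apple"))  # ['p1', 'p3']
```

The point of the design is that no single shard has to hold or scan all the data: writes touch exactly one shard, while a search fans out to all of them and merges the partial results.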

What is Lucene?

  • Full-text search (like searching for a product on Amazon)

  • traditional databases are a poor fit for this

    • wildcard searches like 'where ... like "%item%"' are simple to write, but they can't express complex queries, perform poorly (a leading wildcard usually forces a full scan), and lack advanced features like stemming, synonym expansion, and fuzzy matching
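  • Writes are first buffered in memory

  • buffered documents can't be read (searched) until they have been written out

  • the buffer is then flushed to an immutable, SSTable-like file on disk (Lucene calls it a segment), and these files are later compacted => LSM Tree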

  • When a document is added, it has to be tokenized

  • Inverted index: instead of mapping a document id to its contents, maintain a mapping from each token to the ids of the documents that contain it, where the tokens come from the documents. For example, if document 1 contains "Apple Computer", then the inverted index is: {apple: 1, computer: 1}

  • Search Index

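  • Prefix searching (terms kept in sorted order, so all terms sharing a prefix sit next to each other):

  {
    apple: 10, 15
    banana: 31, 6
    cantaloupe: 4, 67
    cherry: 3, 98
  }
  • Suffix searching (terms stored reversed and sorted, so a suffix lookup becomes a prefix lookup):
  {
    ananab: 31, 6
    elppa: 10, 15
    epuolatnac: 4, 67
    yrrehc: 3, 98
  }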
  • Lucene handles these and a lot of other complex search indexes
  • Lucene runs on a single node
  • Elastic Search builds on top of Lucene to achieve distributed search!
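
The Lucene notes above (buffered writes, tokenization, the inverted index, and sorted terms for prefix and suffix lookups) can be tied together in a small Python sketch. This is a toy, single-node illustration and not Lucene's actual API or file format: `ToyIndex`, `tokenize`, `prefix_search`, and `suffix_search` are hypothetical names, segments are plain dicts rather than files on disk, and the regex tokenizer is an assumed stand-in for a real analyzer.

```python
import bisect
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (stand-in for a real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical single-node index; not Lucene's real API."""

    def __init__(self):
        self.buffer = {}    # doc_id -> text, held in memory, not yet searchable
        self.segments = []  # flushed inverted indexes: {token: set of doc ids}

    def add(self, doc_id, text):
        self.buffer[doc_id] = text

    def refresh(self):
        """Tokenize buffered documents into a new searchable segment."""
        segment = {}
        for doc_id, text in self.buffer.items():
            for token in tokenize(text):
                segment.setdefault(token, set()).add(doc_id)
        if segment:
            self.segments.append(segment)
        self.buffer = {}

    def merge(self):
        """Compact all segments into one, LSM-style."""
        merged = {}
        for segment in self.segments:
            for token, doc_ids in segment.items():
                merged.setdefault(token, set()).update(doc_ids)
        self.segments = [merged] if merged else []

    def search(self, token):
        """Look up one token in every segment and union the document ids."""
        hits = set()
        for segment in self.segments:
            hits |= segment.get(token.lower(), set())
        return hits

    def terms(self):
        """All distinct tokens in sorted order (the term dictionary)."""
        return sorted({token for segment in self.segments for token in segment})


def prefix_search(sorted_terms, prefix):
    """Binary-search to the first candidate, then walk the adjacent matches."""
    i = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


def suffix_search(sorted_reversed_terms, suffix):
    """A suffix lookup is a prefix lookup over the reversed terms."""
    hits = prefix_search(sorted_reversed_terms, suffix[::-1])
    return [term[::-1] for term in hits]


index = ToyIndex()
index.add(1, "Apple Computer")
print(index.search("apple"))  # set(): still buffered, not searchable yet
index.refresh()
print(index.search("apple"))  # {1}

terms = index.terms()                         # ['apple', 'computer']
reversed_terms = sorted(t[::-1] for t in terms)
print(prefix_search(terms, "app"))            # ['apple']
print(suffix_search(reversed_terms, "ter"))   # ['computer']
```

Keeping the terms sorted is what makes the prefix case cheap: one binary search finds the start of the run, and the matches are contiguous from there. Reversing the terms reuses the same trick for suffixes at the cost of a second sorted list.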

Common Use Cases:

  • Log management
  • E-commerce Search: search relevant products fast
  • Site Search: enabling users to quickly find information within a website or application
  • Security Intelligence: detect and analyze security threats
  • Business Analytics: analyzing customer data, website traffic, and other business metrics to gain insights
  • Geospatial Data