Skip to main content

Definitions

Bloom Filter

  • Searching for a key in a cache system might take a long time in a LSM tree because it would require to look through all SSTables before finally realizing the key doesn't exists
  • Bloom filter uses multiple different hash values to determine if the key probably exists. Like 2 dimensional array, if all hash values of the key has a flag set to 1, then the key probably exists. "probably" because hash collision is possible, let's say 3 other keys combined have the same exist hash values flaged to 1, then this is a false positive.
  • It is not a perfect solution, but it does eliminate out some of the long-reads for non-existing keys.
  • Another draw back is that it is impossible to delete a key from the hash values. Because of hash collisions, unset those flags might messes up other keys.

Hashing - Consistent Hashing

  • Using a hash ring to divide hashing space into consistent servers
  • Goal is to minimize the impact when adding/removing servers due to expected or unexpected reasons.
  • Key => hash value => position in hash ring => move in clockwise until reaching the first avaialble server
  • in case of hot spots, dynamically adding more server into that section has limited impact to other servers
  • used in Memcached and Redis, Cassandra and Riak, CDN, Load Balancer

Tree - Inorder Traversal (DFS)

  • Left subtree => tree node => right subtree
  • in BST => prints order

Tree - Preorder Traversal

  • tree node => left subtree => right subtree
  • used in copy tree?

Tree - Postorder Traversal

  • left subtree => right subtree => tree node
  • used in delete the tree?

Tree - B Tree

  • Fast read, slow writes because it might involve updating multiple pages on disk

Tree - B+ Tree

Tree - LSM Tree (Log-Structured Merge-Tree)

  • used by NoSQL
  • fast writes

Tree - Levelorder Traversal (BFS)

  • level1 left to right => level2 left to right => ...