Skip to main content

Definitions

Bloom Filter

  • Searching for a key in a cache system might take a long time in a LSM tree because it would require to look through all SSTables before finally realizing the key doesn't exists
  • Bloom filter uses multiple different hash values to determine if the key probably exists. Like 2 dimensional array, if all hash values of the key has a flag set to 1, then the key probably exists. "probably" because hash collision is possible, let's say 3 other keys combined have the same exist hash values flaged to 1, then this is a false positive.
  • It is not a perfect solution, but it does eliminate out some of the long-reads for non-existing keys.
  • Another draw back is that it is impossible to delete a key from the hash values. Because of hash collisions, unset those flags might messes up other keys.

Hashing - Consistent Hashing

  • Using a hash ring to divide hashing space into consistent servers
  • Goal is to minimize the impact when adding/removing servers due to expected or unexpected reasons.
  • Key => hash value => position in hash ring => move in clockwise until reaching the first avaialble server
  • in case of hot spots, dynamically adding more server into that section has limited impact to other servers
  • used in Memcached and Redis, Cassandra and Riak, CDN, Load Balancer

Mutex lock

What is it
  • mutual exclusion
  • a synchronization primitive used in concurrent programming to ensure that only one thread can access a shared resource at a time
How it works
  • Locking:
    • a thread that needs to access a sahred resource acquires the mutex lock
    • if the mutex is already locked by another thread, the requesting thread is blocked until the lock is released
  • Critical Section
    • once the lock is acquired, the thread enters a critical section, where it accesses and modifies the shared resource
  • Unlocking
    • After the thread finishes with the shared resource, it releases the mutex lock
Key Points:
  • Atomic: mutexes ensures that operations on shared resources are atomic, meaning they are indivisible and cannot be interupted by other threads
  • Deadlocks: Improper use of mutexes can lead to deadlocks, where two or more threads are waiting for each other to release locks, resulting in a stalemate
  • Starvation: In some cases a thread may be repeatedly denied access to a shared resource, even if it's available, due to other threads continuously acquiring and releasing the lock.
Example:
package main
import (
  "fmt"
  "sync"
)

var counter int
var mu sync.Mutex

func increment() {
  mu.Lock()
  defer mu.Unlock()
  counter++
}

func main() {
  for i:=0; i<1000; i++ {
    go increment()
  }

  // wait for al lgoroutines to finish

  fmt.Println("Final counter value: ", counter)
}

Tree - Inorder Traversal (DFS)

  • Left subtree => tree node => right subtree
  • in BST => prints order

Tree - Preorder Traversal

  • tree node => left subtree => right subtree
  • used in copy tree?

Tree - Postorder Traversal

  • left subtree => right subtree => tree node
  • used in delete the tree?

Tree - B Tree

  • Fast read, slow writes because it might involve updating multiple pages on disk

Tree - B+ Tree

Tree - LSM Tree (Log-Structured Merge-Tree)

  • used by NoSQL
  • fast writes

Tree - Levelorder Traversal (BFS)

  • level1 left to right => level2 left to right => ...