Fundamentals
ACID Principle
-
Atomicity: the entire transaction happens at once, or doesn't happen at all
-
Consistency: database must be consistent before and after transactions (follow all rules, no violations)
-
Isoluation: multiple transactions can occur independently without inteference
-
Durability: changes of a successful transaction occurs even if the system failure occurs
-
Isolation Levels
- Dirty Read: read the updated value in a transaction that hasn't been commited yet
- Non-repeatable Read: during the course of a transaction, data in a row changes
- Phantom Read: two identical queries return different results (a new row has just been added by another transaction)
- MySQL is repeatable read, PostgreSQL is read commited
- https://www.youtube.com/watch?v=GAe5oB742dw
Bloom Filter
- in situations where read is not efficient (like a LSM tree), use bloom filter can eliminate some of the non-existing keys, thus return "no, key is not there" to save read time.
- it may produce falst positives (it says it's there but it is actually not there) because of hash collisions
- How to choose an appropriate size for bloom filter?
- for a chosen expected number of keys and acceptable false positive rates, use this formula:
-
m = -n * log2(p) / (log(2))^2 - n = number of keys
- p = acceptable false positive rates
- m = size of bloom filter in bits
-
- optimal number of hash functions (k):
-
k = (m / n) * log(2)
-
- for a chosen expected number of keys and acceptable false positive rates, use this formula:
C10K problem?
- the challenge of designing and implementing a server that can efficiently handle a large number of concurrent client connections, specifically 10,000.
- solution:
- asynchronous I/O
- Non-Blocking I/O
- Efficient Data Structures
- Connection Pooling
Change Data Capture
- captures changes made within a database
- Implmentations:
- Query based
- Trigger based
- Log based
- Proprietary (developed by database vendor) based
Idempotent
- some operation is idempotent if it always returns the same result.
- Ex, x=5 is idempotent while x++ is not
Thundering Herd problem?
- when a large number of requests try to access a resources in a small time frame due to reasons like cache eviction, hardware restart, etc.
- Flooded backend may further cause system failures
- To mitigate this issue
- randomized cache expiration time
- rate limiting
- backend sharding
- asynchronous cache updates
- backoff time on client side