Database Comparisons
SQL Databases
- relational normalized data (minimum duplicate data)
- Uses B Tree or B+ Tree indexing, where B+ Tree is more common
- Fast reads but slow writes
- good for situations where correctness is more important than performance, like banking, job scheduling, etc.
How does SQL handle unexpected system shutdowns?
- SQL uses WAL (write ahead log) to logs every commands. Insert/Update/Delete are first written to WAL before executed
- Only a "COMMIT" command in WAL indicates commit, others are ignored
- It is possible that there maybe inconsistencies between WAL and actual data state, for example WAL was written but actual data is not, this is why all play backs in WAL are done in an idempotent way, that is running the query
twicemultiple times willnot affecthave theactualsamestate.effect as if the query was run only once- for example: insert with an id, so insert it again will not succeed
- for example: record the state before the update, and only run update if the data state is identical to the before state
- for example: delete with an id
NoSQL Database
- Some uses B Tree (mongoDB)
- Mostly uses LSM Tree (Memtable + SSTable)
- Memtable is a balanced binary search tree in memory
- Fast writes to memory
- Search is slow because it may have to search many SSTables for the target key. Each SSTable requires O(Log(n)) to search (binary search because each SSTable is sorted), and O(n) for all the SSTables
MongoDB
- a NoSQL database with B+ Tree indexing
- document data model
- denomalized: data that need to be accessed together should be stored together, minor data duplication is ok
- supports transaction
- flexiby schema
- usually used in chats, but rarely makes sense in other system design situations. nothing is "special" about it.
- more like a SQL database without the need of table joins, or a "NoSQL" style of SQL database
Cassandra
- wide column store (like a spreadsheet)
- good for many situation
- multileader/leaderless replication (configurable)
- conflict write is resolved with "last one wins", this is not a very ideal conflict resolution choice
- uses LSM tree
- good for high write volume, where consistency is not as important, all writes and reads go to the same shard
- good for chat app
Riak
- Key-value store
- multileader/leaderless replication
- uses LSM tree
- CRDT for write conflict resolution
- good for high write volume, where consistency is not as important, all writes and reads go to the same shard
HBase
- wide column store
- single leader replication, no write conflicts
- uses LSM tree
- uses column wide storage (data in the same column are stored together)
- good for application that need fast column reads
- for example instagram, all images of the post are stored in the same column, so can be quickly queried, compared to row-wide storage where you need to read the image column from all rows
Memcahced and Redis
- Key-value stores in memory
- uses hash map
- good for extremely quick read and writes, like cache, or geo spatial index for Lyft
- but can be expensive, because storing in memory is more expensive than storing on disk
Neo4j
- graph database
Timeseries
- TimeScaleDB / Apache Druid
- IoT sensor data
- uses LSM tree
NewSQL
- VoltDB
- SQL in memory
- Spanner
- SQL with GPS clocks
What is partitioning and what is sharding?
- partitioning: divide a table into subsets within the same node
- sharding: a more advanced and distributed form of horizontal partitioning
- range-based sharding: good locality, but bad if hot spots exist
- hash-based sharding: evenly distributed, but no data locality for range queries
- to mitigate data locality issue, we can use a secondary index
- geo-based sharding:
What is secondary indexes?
-
additional index on a different column
-
local secondary index:
- an additional index on the column in each of the sharded nodes
- reads need to query from all nodes
- write is same as before, still write to one shard
-
global secondary index:
- copy all data, order them by the new index, and shard those
- two copies of the data, possibly belonging to two differents nodes
- faster reads, because no need to read from all nodes anymore,
- horrible write, because we need to maintain consistency between the two seconds of data possiblly located on two nodes. Distributed transaction and 2 phase commit is required in this case.
-
example of local secondary index:
- table is sharded based on user_id
- secondary index is based on age
- query users whose age is in (30, 60)
- since shard key was user_id, there is no guarantee where those users are distributed, so all shards needs to be queried, and results needs to be merged
-
example of global secondary index:
- "table1" is sharded based on user_id
- another duplicated table "table2" is sharded based on age
- merge(all_nodes("table1")) = merge(all_nodes("table2"))
- but on any specific node, table1 != table2 because they're indexed on diffrent columns
- reading is fast, but writing requires distributed transaction, or 2 phase commit
What is 2 phase commit?
- We need atomicity
- Two phases
- Growing Phase: the transaction acquires all the necessary locks on data items it needs to access
- Shrinking Phase: the transaction holds all acquired locks and begins releasing htem
- distributed transactions
- It prevents data anomalies like dirty reads, lost updates, and phantom reads
- but it may lead to deadlocks