Cassandra Cheatsheet
ref: Jordan
Key characteristics
- Wide-column store
- Eventual Consistency
- Tunable consistency (consistency level chosen per query), not full ACID
- LSM Tree Index
- Bloom filter
Designed for single partition read and write
- It's recommended that reads and writes of the same data go to the same partition (shard)
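The routing idea above can be sketched as a toy token ring. This is a minimal illustration, not Cassandra's implementation: `TokenRing`, the node names, and the use of MD5 are all assumptions made for the example (Cassandra's default partitioner is Murmur3, and real clusters use many virtual nodes per host).

```python
import bisect
import hashlib


class TokenRing:
    """Toy consistent-hash ring: maps a partition key to the nodes that own it."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = min(replication_factor, len(nodes))
        # One token per node for simplicity (hypothetical; real clusters use vnodes).
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _token(value):
        # MD5 is only a stand-in hash for this sketch.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def replicas(self, partition_key):
        # Walk clockwise from the key's token and take the next RF nodes.
        start = bisect.bisect_right(self.tokens, self._token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("sensor-42")  # every read/write for this key goes to these nodes
```

Because the owners depend only on the partition key, all rows sharing that key live on the same replicas, which is why single-partition queries are cheap and cross-partition queries are not.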
Index
- no global secondary index, only local indexes
Leaderless Replication
- Uses a quorum whose size is configurable per query; it can even be 1, in which case a read or write waits for only 1 replica
- Each write is sent to all replicas and is considered successful once at least a quorum of them acknowledges it
- Reads query a quorum of replicas; if the responses differ, the value with the latest timestamp wins
- Read repair: replicas holding outdated values are overwritten with the latest one
- This is not safe: last-write-wins can silently drop concurrent writes. Riak uses CRDTs (Conflict-free Replicated Data Types) for write conflict resolution
- There is also a background process called anti-entropy (Merkle trees) that syncs the differences between nodes
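The quorum write, timestamp-based read, and read repair described above can be sketched with in-memory replicas. This is a simplified model under stated assumptions: `Replica`, `quorum_write`, and `quorum_read` are hypothetical names, timestamps are plain integers supplied by the caller, and a downed node is simulated with a flag.

```python
class Replica:
    """In-memory stand-in for one replica; stores key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, timestamp, value):
        if not self.up:
            raise ConnectionError(self.name)
        current = self.data.get(key)
        # Last write wins: only a strictly newer timestamp replaces the cell.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        if not self.up:
            raise ConnectionError(self.name)
        return self.data.get(key)


def quorum_write(replicas, key, timestamp, value, w):
    """Send the write to every replica; succeed if at least w acknowledge."""
    acks = 0
    for replica in replicas:
        try:
            replica.write(key, timestamp, value)
            acks += 1
        except ConnectionError:
            pass  # replica is down and misses this write
    return acks >= w


def quorum_read(replicas, key, r):
    """Read from the first r reachable replicas, return the newest value, repair the rest."""
    responses = []
    for replica in replicas:
        if len(responses) == r:
            break
        try:
            responses.append((replica, replica.read(key)))
        except ConnectionError:
            pass
    if len(responses) < r:
        raise RuntimeError("not enough replicas reachable for the read quorum")
    cells = [cell for _, cell in responses if cell is not None]
    if not cells:
        return None
    newest = max(cells, key=lambda cell: cell[0])
    for replica, cell in responses:
        if cell != newest:
            replica.write(key, *newest)  # read repair of the stale replica
    return newest[1]


replicas = [Replica("a"), Replica("b"), Replica("c")]
quorum_write(replicas, "k", 1, "v1", w=2)
replicas[2].up = False                      # "c" misses the next write
wrote = quorum_write(replicas, "k", 2, "v2", w=2)
replicas[2].up = True
replicas[0].up = False                      # the read must use "b" and stale "c"
value = quorum_read(replicas, "k", r=2)
```

With 3 replicas, W = 2 and R = 2 overlap in at least one node, so the read sees "v2" even though one contacted replica was stale, and that replica is repaired as a side effect.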
Hinted Handoff
- if some replicas cannot handle writes, the coordinator node will store the write to be sent to them later
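As a rough illustration of hinted handoff, the sketch below has a coordinator queue writes for unreachable replicas and replay them later. `Node`, `Coordinator`, and `replay_hints` are hypothetical names for this example; a real system would also expire old hints and persist them to disk.

```python
from collections import defaultdict


class Node:
    """Toy replica that can be marked down."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        if not self.up:
            raise ConnectionError(self.name)
        self.data[key] = value


class Coordinator:
    """Stores a hint for every replica that could not accept a write."""

    def __init__(self):
        self.hints = defaultdict(list)  # node name -> list of (key, value)

    def write(self, replicas, key, value):
        acks = 0
        for node in replicas:
            try:
                node.apply(key, value)
                acks += 1
            except ConnectionError:
                # Keep the write so it can be delivered once the node is back.
                self.hints[node.name].append((key, value))
        return acks

    def replay_hints(self, node):
        if not node.up:
            return 0  # still unreachable; keep the hints
        pending = self.hints.pop(node.name, [])
        for key, value in pending:
            node.apply(key, value)
        return len(pending)


n1, n2, n3 = Node("n1"), Node("n2"), Node("n3")
coordinator = Coordinator()
n3.up = False
acks = coordinator.write([n1, n2, n3], "k", "v")   # n3 misses the write
n3.up = True
replayed = coordinator.replay_hints(n3)            # n3 catches up
```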
Gossip Protocol
- used to detect node failures
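A heartbeat-based gossip round can be sketched as follows. This is a deliberately simplified model: `GossipNode`, the fixed gossip order, and the fixed timeout in rounds are assumptions for illustration, and Cassandra itself uses an adaptive (phi accrual) failure detector rather than a fixed timeout.

```python
class GossipNode:
    """Each node tracks the highest heartbeat it has heard for every peer."""

    def __init__(self, name):
        self.name = name
        self.heartbeat = 0
        # peer name -> (highest heartbeat seen, round in which it last increased)
        self.view = {name: (0, 0)}

    def tick(self, round_no):
        # A live node bumps its own heartbeat every round.
        self.heartbeat += 1
        self.view[self.name] = (self.heartbeat, round_no)

    def gossip_to(self, peer, round_no):
        # The peer merges our view, keeping only strictly newer heartbeats.
        for name, (beat, _) in self.view.items():
            known = peer.view.get(name)
            if known is None or beat > known[0]:
                peer.view[name] = (beat, round_no)

    def suspected(self, round_no, timeout):
        # Peers whose heartbeat has not advanced for more than `timeout` rounds.
        return sorted(name for name, (_, seen) in self.view.items()
                      if name != self.name and round_no - seen > timeout)


a, b, c = GossipNode("a"), GossipNode("b"), GossipNode("c")
for round_no in range(1, 4):          # all three nodes are healthy
    for node in (a, b, c):
        node.tick(round_no)
    a.gossip_to(b, round_no)
    b.gossip_to(c, round_no)
    c.gossip_to(a, round_no)
for round_no in range(4, 9):          # "c" has crashed and stops ticking
    for node in (a, b):
        node.tick(round_no)
    a.gossip_to(b, round_no)
    b.gossip_to(a, round_no)
suspects = a.suspected(round_no=8, timeout=3)
```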
Single Node
- Memtable + SSTables: fast writes, slower reads
- only row-level atomicity and isolation, no multi-row ACID transactions
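To show why the memtable + SSTable layout makes writes cheap and reads more expensive, here is a minimal sketch. `TinyLSM` and its `memtable_limit` parameter are hypothetical; it omits the commit log, Bloom filters, tombstones, and compaction that a real engine relies on.

```python
import bisect


class TinyLSM:
    """Writes go to an in-memory memtable; full memtables are flushed as sorted, immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted tuple of (key, value)

    def put(self, key, value):
        # A write is just an in-memory update, which is why writes are fast.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # A read may have to check the memtable and then every SSTable, newest first.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable is full, flushed as the first SSTable
db.put("a", 3)   # newer value for "a" shadows the one on disk
db.put("c", 4)   # flushed as the second SSTable
```

Here `db.get("a")` finds the value in the newest SSTable, while `db.get("b")` has to fall through to the older one; that extra searching is the read cost that Bloom filters and compaction are meant to reduce.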
Use Cases
- Write heavy Applications
- if data is generally self contained, and only needs to be fetched with other data from its partition, then use Cassandra
- e.g. sensor readings, chat messages, user activity tracking, etc.
Pros
- good when data is generally self contained, and only needs to be fetched with other data from its partition (no joins)
- useful for write heavy applications like sensor readings, chat messages, user activity tracking, etc.
- designed for massive scale (sharded, with leaderless replication)
Cons
- lack of strong consistency (quorums aren't perfect: concurrent writes resolved by timestamp can be lost)
- lack of support for data relationships
- lack of global secondary indexes