Cassandra Cheatsheet

ref: Jordan

Key characteristics

Wide Column Store
Eventual Consistency
Tunable Consistency
LSM Tree Index
Bloom filter

Designed for single partition read and write

It's recommended that read and write of the same data go to the same shard
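
A minimal sketch of how a partition key can be routed to a fixed set of replicas, so that reads and writes for the same data always land on the same nodes. The `Ring` class, the node names, and the use of MD5 are assumptions for illustration (Cassandra's default partitioner uses Murmur3, and real clusters assign many virtual-node tokens per node).

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a ring position. MD5 is a stand-in for Murmur3."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical token ring with one token per node."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, partition_key: str):
        """Return the nodes owning this key, walking clockwise on the ring."""
        start = bisect.bisect_right(self.tokens, (token(partition_key), ""))
        count = min(self.rf, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("sensor-42")
```

Because the mapping depends only on the partition key, every read and write for `sensor-42` resolves to the same `owners` list.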

Index

no global secondary index, only local indexes

Leaderless Replication

Reads and writes use a configurable quorum; the consistency level can even be set to 1, in which case a read or write only needs a response from one replica
Each write is sent to all replicas and is considered successful once at least a quorum of them acknowledges it
Reads query a quorum of replicas; if the responses differ, the value with the latest timestamp wins
- read repair: replicas that returned outdated values are overwritten with the latest one
- This is not safe, since concurrent writes can be lost; Riak uses CRDTs (Conflict-free Replicated Data Types) for write conflict resolution
There is also a background process called anti-entropy (using Merkle trees) that syncs the differences between nodes
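
The quorum write, latest-timestamp read, and read repair steps above can be sketched roughly as follows. This is an in-memory toy, not Cassandra's implementation: `Replica`, `quorum_write`, and `quorum_read` are hypothetical names, and timestamps are plain integers supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, ts, value):
        if not self.up:
            return False
        current = self.data.get(key)
        # Last write wins: keep only the value with the newest timestamp.
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)
        return True

    def read(self, key):
        return self.data.get(key)


def quorum_write(replicas, key, ts, value, required):
    """Send the write to every replica; succeed if enough acknowledge."""
    acks = sum(r.write(key, ts, value) for r in replicas)
    return acks >= required


def quorum_read(replicas, key, required):
    """Read from `required` live replicas and repair stale ones."""
    responders = [r for r in replicas if r.up][:required]
    if len(responders) < required:
        raise RuntimeError("not enough replicas available")
    found = [v for v in (r.read(key) for r in responders) if v is not None]
    if not found:
        return None
    newest = max(found, key=lambda v: v[0])
    for r in responders:
        r.write(key, *newest)  # read repair: overwrite outdated copies
    return newest[1]


a, b, c = Replica("a"), Replica("b"), Replica("c")
nodes = [a, b, c]
quorum_write(nodes, "k", 1, "old", required=2)
c.up = False                                   # c misses the next write
ok = quorum_write(nodes, "k", 2, "new", required=2)
c.up, a.up = True, False                       # read hits b (fresh) and c (stale)
value = quorum_read(nodes, "k", required=2)
```

With 3 replicas and a quorum of 2 for both reads and writes, any read set overlaps any write set in at least one replica, which is why the read still sees `"new"` even though `c` missed the write.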

Hinted Handoff

if some replicas cannot handle writes, the coordinator node will store the write to be sent to them later
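
A rough sketch of hinted handoff, assuming a hypothetical `Coordinator` that keeps hints in memory. Real hints carry timestamps and expire after a while; this toy omits both, so a replayed hint here could overwrite a newer value.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = {}  # node name -> list of (key, value) writes it missed

    def write(self, key, value):
        """Write to live replicas; store a hint for each unavailable one."""
        acks = 0
        for node in self.replicas:
            if node.up:
                node.data[key] = value
                acks += 1
            else:
                self.hints.setdefault(node.name, []).append((key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to replicas that have come back up."""
        for node in self.replicas:
            if node.up:
                for key, value in self.hints.pop(node.name, []):
                    node.data[key] = value


n1, n2, n3 = Node("n1"), Node("n2"), Node("n3")
coordinator = Coordinator([n1, n2, n3])
n3.up = False
acks = coordinator.write("user:1", "online")   # n3 misses this write
n3.up = True
coordinator.replay_hints()                     # n3 catches up
```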

Gossip Protocol

used to detect node failures
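
A simplified sketch of gossip-based failure detection: each node bumps its own heartbeat, exchanges its heartbeat table with a peer, and suspects any peer whose heartbeat has not advanced for a few rounds. `GossipNode` and the fixed `timeout` are assumptions for illustration; Cassandra itself uses an adaptive (phi accrual) detector rather than a fixed threshold, and picks gossip peers at random.

```python
class GossipNode:
    def __init__(self, name, timeout=3):
        self.name = name
        self.timeout = timeout
        self.clock = 0       # local round counter
        self.heartbeat = 0
        self.table = {}      # peer -> (heartbeat, local round it last advanced)

    def tick(self):
        """Advance one round and bump our own heartbeat."""
        self.clock += 1
        self.heartbeat += 1
        self.table[self.name] = (self.heartbeat, self.clock)

    def merge(self, remote_table):
        """Adopt any heartbeat that is newer than what we already know."""
        for peer, (hb, _) in remote_table.items():
            known = self.table.get(peer)
            if known is None or hb > known[0]:
                self.table[peer] = (hb, self.clock)

    def gossip_with(self, other):
        other.merge(self.table)
        self.merge(other.table)

    def suspected(self):
        """Peers whose heartbeat has been stale for more than `timeout` rounds."""
        return sorted(p for p, (_, seen) in self.table.items()
                      if p != self.name and self.clock - seen > self.timeout)


a, b, c = GossipNode("a"), GossipNode("b"), GossipNode("c")
for _ in range(2):               # healthy rounds: everyone gossips
    for n in (a, b, c):
        n.tick()
    a.gossip_with(b)
    b.gossip_with(c)
for _ in range(5):               # c has crashed and stops ticking
    for n in (a, b):
        n.tick()
    a.gossip_with(b)
suspects = a.suspected()
```

Note that `a` never talks to `c` directly; it learns about `c` (and later notices its silence) only through `b`, which is the point of gossip.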

Single Node

Memtable + SSTables. Fast write and slow read
only row-level locking, no ACID transactions
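
A toy illustration of why the memtable + SSTable layout makes writes cheap and reads more expensive: a write only touches an in-memory dict, while a read may have to check the memtable and then every SSTable from newest to oldest. `LSMStore` and `memtable_limit` are hypothetical; real SSTables live on disk, are compacted in the background, and use Bloom filters to skip files that cannot contain the key.

```python
import bisect


class LSMStore:
    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        """Writes only touch memory; a full memtable is flushed as a new SSTable."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            # Binary search; (key,) sorts just before any (key, value) pair.
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


store = LSMStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # memtable full: flushed to the first SSTable
store.put("a", 3)   # newer value for "a" stays in the memtable
store.put("c", 4)   # flushed to the second SSTable
latest_a = store.get("a")
```

The old value of `a` still sits in the first SSTable; the read returns the newer one only because newer tables are searched first.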

Use Cases

Write heavy Applications
if data is generally self contained, and only needs to be fetched with other data from its partition, then use Cassandra
e.g. sensor readings, chat messages, user activity tracking, etc.

Pros

good when data is generally self contained, and only needs to be fetched with other data from its partition (no joins)
useful for write heavy applications like sensor readings, chat messages, user activity tracking, etc.
designed for massive scale (sharded with leaderless replication)

Cons

lack of strong consistency (quorums aren't perfect)
lack of support for data relationships
lack of global secondary indexes
