Cassandra Cheatsheet

ref: Jordan

Key characteristics

  • Wide-column store
  • Eventual Consistency
  • Tunable consistency (chosen per request; not ACID)
  • LSM Tree Index
  • Bloom filter

Designed for single partition read and write
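
To illustrate why single-partition reads and writes are the cheap case, here is a minimal sketch of how a partition key could map to a fixed set of replica nodes on a hash ring. The `Ring` class, the `token` helper, and the node names are hypothetical, and MD5 is only a stand-in hash for the sketch, not Cassandra's actual partitioner.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash for illustration only (assumption, not the real partitioner).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical hash ring: each node owns one token on the ring."""

    def __init__(self, nodes, replication_factor):
        self.rf = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key):
        # First node at or after the key's token, then the next rf - 1 nodes clockwise.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


ring = Ring(["n1", "n2", "n3", "n4"], replication_factor=3)
owners = ring.replicas("sensor-42")
```

Every row that shares the partition key `sensor-42` lands on the same three nodes, so a query restricted to that key never has to contact the rest of the cluster.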

Index

  • no global secondary index, only local indexes

Leaderless Replication

  • Reads and writes use a quorum; the consistency level is configurable, down to 1, which reads from or writes to a single replica
  • Each write is sent to all replicas and is considered successful once at least a quorum of them succeeds
  • Reads go to a quorum of nodes; if the responses differ, the value with the latest timestamp wins
    • Read repair: replicas holding outdated values are overwritten with the latest one
    • This is not safe, because concurrent writes or skewed clocks can silently drop an update. Riak uses CRDTs (Conflict-free Replicated Data Types) for write conflict resolution
  • There is also a background process called anti-entropy, which uses Merkle trees to find and sync the differences between nodes
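
The quorum read, latest-timestamp rule, and read repair described above can be sketched roughly as follows. This is a toy in-memory model, not Cassandra's implementation: `Replica`, `Versioned`, and `quorum_read` are hypothetical names, and the "quorum" is simply the first `r` replicas in the list.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int  # write timestamp; higher means newer


class Replica:
    """Hypothetical in-memory replica holding one versioned value per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, versioned):
        current = self.data.get(key)
        # Last write wins: keep the incoming value only if it is newer.
        if current is None or versioned.timestamp > current.timestamp:
            self.data[key] = versioned

    def read(self, key):
        return self.data.get(key)


def quorum_read(replicas, key, r):
    """Read from r replicas, return the newest value, repair stale replicas."""
    responses = [(rep, rep.read(key)) for rep in replicas[:r]]
    found = [v for _, v in responses if v is not None]
    if not found:
        return None
    newest = max(found, key=lambda v: v.timestamp)
    for rep, v in responses:
        if v is None or v.timestamp < newest.timestamp:
            rep.write(key, newest)  # read repair
    return newest


# N = 3 replicas, W = 2, R = 2: since R + W > N, read and write sets overlap.
a, b, c = Replica("a"), Replica("b"), Replica("c")
for rep in (a, b, c):
    rep.write("k", Versioned("old", 1))
for rep in (b, c):  # the newer write reaches only a quorum of two
    rep.write("k", Versioned("new", 2))

result = quorum_read([a, b, c], "k", r=2)  # contacts a (stale) and b (fresh)
```

After the read, replica `a` has been repaired to the newer value. The timestamp comparison is also where the unsafety comes from: whichever write carries the higher timestamp wins, regardless of which one the user actually issued last.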

Hinted Handoff

  • If some replicas cannot handle a write (for example, because they are down), the coordinator node stores the write as a hint and sends it to them later
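
A minimal sketch of that idea, assuming a toy `Coordinator` that keeps hints in a plain list; the class and method names are hypothetical, and details such as hint expiry are left out.

```python
class Node:
    """Hypothetical replica that rejects writes while it is down."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (target node, key, value) for writes that could not be delivered

    def write(self, key, value):
        acks = 0
        for node in self.replicas:
            try:
                node.write(key, value)
                acks += 1
            except ConnectionError:
                # Keep the write so it can be delivered once the node is back.
                self.hints.append((node, key, value))
        return acks

    def replay_hints(self):
        remaining = []
        for node, key, value in self.hints:
            try:
                node.write(key, value)
            except ConnectionError:
                remaining.append((node, key, value))
        self.hints = remaining


n1, n2, n3 = Node("n1"), Node("n2"), Node("n3")
coordinator = Coordinator([n1, n2, n3])
n3.up = False
acks = coordinator.write("k", "v")   # n3 misses the write; a hint is stored
hints_stored = len(coordinator.hints)
n3.up = True
coordinator.replay_hints()           # n3 catches up
```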

Gossip Protocol

  • used to detect node failures
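
As a rough illustration, the sketch below has each node keep a heartbeat table, merge the tables it receives from peers, and suspect any peer whose heartbeat counter has not advanced for a while. `GossipNode` and the fixed `timeout` are assumptions made for the sketch; Cassandra's real failure detector is adaptive rather than a fixed timeout.

```python
class GossipNode:
    """Hypothetical node that tracks peers' heartbeat counters."""

    def __init__(self, name):
        self.name = name
        # peer name -> (heartbeat counter, local time the counter last increased)
        self.table = {name: (0, 0)}

    def beat(self, now):
        counter, _ = self.table[self.name]
        self.table[self.name] = (counter + 1, now)

    def receive(self, remote_table, now):
        # Merge a peer's table: accept only counters newer than what we know.
        for peer, (counter, _) in remote_table.items():
            known = self.table.get(peer)
            if known is None or counter > known[0]:
                self.table[peer] = (counter, now)

    def suspected(self, now, timeout):
        return sorted(
            peer for peer, (_, seen) in self.table.items()
            if peer != self.name and now - seen > timeout
        )


a, b, c = GossipNode("a"), GossipNode("b"), GossipNode("c")
for node in (a, b, c):
    node.beat(now=1)
b.receive(c.table, now=1)  # b hears from c directly
a.receive(b.table, now=1)  # a learns about c only through b

for t in range(2, 11):     # c has crashed: only a and b keep beating
    a.beat(now=t)
    b.beat(now=t)
    a.receive(b.table, now=t)

down = a.suspected(now=10, timeout=5)
```

Node `a` never talks to `c` directly, yet it still notices that `c`'s heartbeat stopped advancing, because that information spreads through `b`.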

Single Node

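  • Memtable + SSTables. Writes are fast (in-memory insert plus sequential disk writes); reads are slower because they may have to check the memtable and several SSTables

  • only row-level atomicity and isolation, no general ACID transactions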
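
The memtable and SSTable write and read paths can be sketched as a tiny LSM-style store. `TinyLSM` and `memtable_limit` are hypothetical, everything stays in memory, and the commit log, compaction, and Bloom filters are all omitted.

```python
import bisect


class TinyLSM:
    """Hypothetical LSM-style store: a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # oldest first; each is a sorted tuple of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch the in-memory table, which is why they are fast.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable run (an "SSTable").
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Reads may have to probe every run, newest first.
        for table in reversed(self.sstables):
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)  # memtable full: flushed as the first SSTable
db.put("a", 3)
db.put("c", 4)  # flushed as the second SSTable; it shadows the old "a"
db.put("d", 5)  # still in the memtable
```

A lookup for a missing key has to search every run before giving up. A per-SSTable Bloom filter helps here, since it lets a read skip runs that definitely do not contain the key.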

Use Cases

  • Write-heavy applications
  • If data is generally self-contained and only needs to be fetched together with other data from its own partition, use Cassandra
  • Examples: sensor readings, chat messages, user activity tracking, etc.

Pros

  • good when data is generally self-contained and only needs to be fetched together with other data from its own partition (no joins)
  • useful for write-heavy applications like sensor readings, chat messages, user activity tracking, etc.
  • designed for massive scale (sharded, with leaderless replication)

Cons

  • lack of strong consistency (quorums aren't perfect)
  • lack of support for data relationships
  • lack of global secondary indexes