Skip to content

Latest commit

 

History

History
186 lines (110 loc) · 10.8 KB

File metadata and controls

186 lines (110 loc) · 10.8 KB

Consistency Models

What is a Consistency Model?

A consistency model defines the rules for how and when the changes made to one replica of data in a distributed system become visible to other replicas or users. It specifies how data reads and writes are managed across multiple nodes to maintain data consistency.

In distributed systems, achieving consistency while ensuring performance, availability, and fault tolerance is challenging. Various consistency models provide different levels of guarantees based on the specific needs of the system.


Why are Consistency Models Important?

Consistency models are crucial in distributed systems because they dictate how the system behaves under various failure scenarios, network delays, or node failures. Depending on the consistency model chosen, the system may prioritize strong guarantees (e.g., all users see the same data instantly) or performance (e.g., faster responses but with the possibility of seeing outdated data).


Types of Consistency Models

1. Strong Consistency

  • Definition: Every read returns the most recent write. After a write is performed, all subsequent reads will return that written value across all nodes in the system.

  • Analogy: Imagine writing in a shared notebook. As soon as you write something, everyone else who opens the notebook sees the latest update instantly.

  • Pros:

    • Provides the strongest guarantee of consistency.
    • Ideal for systems where accuracy is critical, such as banking or financial transactions.
  • Cons:

    • High latency: All replicas must agree on the latest value, which can be slow in geographically distributed systems.
    • Poor availability under network partitions (as per the CAP Theorem).
  • Use Case: Banking systems where account balances must always reflect the most recent transactions.

Example: In a strongly consistent system, if you withdraw money from your bank account, any attempt to check your balance afterward will always show the updated amount, even if that means waiting for all replicas to sync up.


2. Eventual Consistency

  • Definition: In an eventually consistent system, all replicas will eventually converge to the same value, but not necessarily immediately. After a write, different nodes may return outdated data until they eventually synchronize.

  • Analogy: Imagine multiple copies of a book distributed to different libraries. When a new chapter is added, it takes some time before all libraries receive the update, but eventually, all libraries will have the same version.

  • Pros:

    • Low latency: Allows fast reads and writes since the system doesn’t wait for all replicas to synchronize immediately.
    • Scales well in distributed systems and remains available during network partitions.
  • Cons:

    • Stale data: Reads may return outdated or stale data until replicas sync.
    • Complex conflict resolution if two nodes make different updates to the same data.
  • Use Case: Social media platforms where immediate consistency is not critical, and it is acceptable for users to see slightly outdated posts for a short time.

Example: In a social media app, when you post a new status, it may take a few seconds for all your friends to see it due to eventual consistency across data replicas.


3. Causal Consistency

  • Definition: In causal consistency, operations that are causally related (e.g., one operation depends on the result of another) are guaranteed to be seen by all replicas in the same order. Unrelated operations can be seen in different orders.

  • Analogy: Imagine that you post on social media, and then someone comments on your post. Causal consistency ensures that people will see your post before the comment. However, unrelated posts or comments might be seen in a different order.

  • Pros:

    • Captures the causal relationship between operations, preserving a natural flow of data.
    • Offers better performance than strong consistency while still preserving some ordering guarantees.
  • Cons:

    • More complex to implement than eventual consistency.
    • Can lead to inconsistencies for unrelated operations, as they might appear in a different order across nodes.
  • Use Case: Messaging applications where it's crucial that replies are seen after the original message but where unrelated messages can be seen in different orders.

Example: In a messaging app, if Alice sends "Hello" and then Bob replies "Hi", all users will see Alice's message before Bob's reply, even if they don’t receive the messages at the same time.


4. Read-Your-Writes Consistency

  • Definition: A system that ensures read-your-writes consistency guarantees that after you perform a write, all subsequent reads by you will reflect that write.

  • Analogy: Imagine writing a note on a whiteboard. After you write it, you will always see that note, but others might not see it immediately.

  • Pros:

    • Ideal for session-based applications where users expect to see their own updates immediately.
    • Improves user experience by ensuring consistency for individual users.
  • Cons:

    • Other users might see stale data, as the guarantee only applies to the user who performed the write.
  • Use Case: Profile update systems where users expect to see their changes immediately after they save them.

Example: If you update your profile picture on a website with read-your-writes consistency, you will always see the updated picture, even if others are temporarily seeing the old one.


5. Monotonic Reads Consistency

  • Definition: A system with monotonic reads ensures that once you read a value, all subsequent reads will return the same or more recent data, even if the system is eventually consistent.

  • Analogy: Imagine checking the stock price of a company. Once you see the price at $100, you will never see an earlier price (e.g., $95), even if the system is eventually consistent.

  • Pros:

    • Provides a stronger guarantee than eventual consistency by ensuring data doesn’t "jump back" to a previous state.
    • Reduces confusion by maintaining a consistent read history for users.
  • Cons:

    • Can increase latency compared to purely eventual consistency.
    • Does not guarantee immediate visibility of writes across replicas.
  • Use Case: Stock trading platforms where it’s important to ensure that users don’t see older prices after they’ve seen a more recent price.

Example: In a stock trading app, after you see that the stock price of Apple is $150, you won’t see a price lower than $150, even if the system is asynchronously updating prices.


6. Monotonic Writes Consistency

  • Definition: A system with monotonic writes consistency ensures that writes by the same client are applied in the order they were issued. This prevents writes from being applied out of order across replicas.

  • Analogy: Imagine submitting edits to a shared document. Monotonic writes ensure that the changes are applied in the order you made them, so no edits are "out of sequence."

  • Pros:

    • Prevents out-of-order writes, ensuring that changes are applied in the correct sequence.
    • Essential for systems where write order affects the correctness of data (e.g., incrementing a counter).
  • Cons:

    • Does not guarantee immediate propagation of writes across replicas.
    • More complex to implement in distributed environments.
  • Use Case: Financial transactions where it's critical that all debits and credits are applied in the correct order.

Example: If you deposit $100 into your bank account and then withdraw $50, the system will always ensure that the deposit happens before the withdrawal, even if there’s a delay in synchronizing replicas.


Choosing a Consistency Model

When designing a distributed system, selecting the right consistency model depends on the specific requirements of the application. Here are some key considerations:

  1. Data Freshness: If your application requires the freshest data at all times, you might prefer strong consistency. For less critical systems, eventual consistency can provide better performance.

  2. Latency Sensitivity: If low latency is critical, eventual consistency models may be more suitable, as they reduce the overhead of waiting for synchronization across nodes.

  3. Conflict Resolution: Systems with strong consistency handle conflicts by ensuring only one value is accepted. In eventually consistent systems, conflicts may arise, and conflict resolution strategies need to be in place.

  4. Use Case: If users expect to see their changes immediately (e.g., profile updates), read-your-writes consistency might be the right choice. If the order of operations matters (e.g., financial transactions), monotonic writes consistency is important.


Real-World Examples of Consistency Models

  1. Amazon DynamoDB:

    • Eventual Consistency: By default, DynamoDB offers eventual consistency for faster reads and writes.
    • Strong Consistency: DynamoDB also supports optional strong consistency if needed.
  2. Google Spanner:

    • Strong Consistency: Google Spanner uses TrueTime to ensure strong consistency across globally distributed databases, guaranteeing that all nodes see the same data simultaneously.
  3. Cassandra:

    • Eventual Consistency: Cassandra is designed to prioritize availability and partition tolerance (AP system) and provides eventual consistency with tunable consistency levels.

Important Terms

  • Consistency: Ensuring that all nodes in a distributed system see the same data at the same time.
  • Eventual Consistency: A form of consistency where all nodes will converge to the same value, but there may be temporary discrepancies.
  • Causal Consistency: A model where causally related operations are guaranteed to be seen in the correct order.
  • Read-Your-Writes Consistency: Guarantees that after a client writes data, they will always see their own update.
  • Monotonic Reads/Writes: Ensures that reads or writes are seen in a consistent order, without reverting to previous values.

Conclusion

Understanding consistency models is critical when designing distributed systems. Different models offer trade-offs between data freshness, latency, and availability. By selecting the appropriate consistency model for your system's requirements, you can balance performance and reliability in large-scale distributed environments.


Further Reading