The CAP theorem series

Let me introduce a new series of posts on the CAP theorem. CAP is a well known theorem conjectured and proven by recognized researchers in distributed systems, namely Eric Brewer, Seth Gilbert and Nancy Lynch. It is also widely used to categorize distributed applications.

A distributed system to be categorized with the CAP theorem

With CAP, do computer scientists actually have

a better classification tool than biologists?

Why you want to read this series on CAP theorem and databases

This series covers some points often forgotten or misunderstood and sometimes never mentioned, such as:

‘42’ is not a valid answer for an eventually consistent database;
Node failures and packet loss are not partitions;
Consistent in ACID and Consistent in CAP are two different things;
The CAP definition of availability is key but very different from what most people think;
There are some systems that do not fit into the three CA/CP/CA categories;
It’s possible to use the CA category, really possible,
There is no such thing as an eventually consistent and available big data database.

The posts, in recommended reading order

Comparing Eventually Consistent and CP-as-in-CAP stores: Introduces the definitions of “Consistent” and “Available” in the CAP theorem. By comparing them with the definition of “Eventual Consistency” we show the limits of these definitions.
The confusing ACID and CAP wording: Explains the terminologies of ACID and CAP, and how they overlap. Covers as well the difference between the theoretical definition of the ‘I’ in ACID and its actual implementation with multiple “isolation degrees”. It’s a good starting point if you know the “Consistency” definition of ACID but not the “Consistency” definition of CAP.
Don't use the CAP theorem for node failures: Looks at a common misunderstanding of the CAP theorem: node failures are not partitions. It’s also a pretext to look at the actual proof of CAP, and to go into the definition of “Partition” in CAP.
Don't use the CAP theorem for packet losses: Looks at another common misunderstanding. And it’s also a pretext to look at the network model used in the CAP theorem proof, and to show its difference with the TCP network protocol we know.
The unclear CP vs. CA case in CAP: Shows that there are some valid distributed systems that do not fit into any of the CAP categories. Explains what CA really means. Gives some examples on real-life systems, including the two-phase commit with heuristic decisions.
CAP, Availability, High-Availability and Big Data Databases in a World of Partitions: CAP and big data: shows that AP is impossible in an eventually consistent big data store. Explains the difference between high-availability and availability as defined by the CAP theorem. Contains a summary table with multiple types of distributed systems, their CAP-categories and their real availability.
CAP: if all you have is a timeout, everything looks like a partition: Explains how a ZooKeeper user can take advantage of the difference between a network partition and a process crash.
Real-time CAP: Delays can be included in the consistency/availability trade-off if we change the definition of availability to have timing requirements, like in a real-time system.
Forfeiting Partition Tolerance in Distributed Systems: Choosing CA makes sense... sometimes.

Copied from the storageMojo blog: courteous comments welcome, of course.

This long run

Tuesday, March 10, 2015

The CAP theorem series

Why you want to read this series on CAP theorem and databases

No comments:

Post a Comment