Last week I gave a presentation on the relevance of the CAP theorem in modern distributed system design. It is based on the article “Consistency Tradeoffs in Modern Distributed Database System Design” by Daniel J. Abadi of Yale University.
The CAP theorem is widely used in Distributed Database System (DDBS) design. In a nutshell, it says that when designing a modern DDBS, we can only choose two of three properties that are crucial for a DDBS: Consistency (C), Availability (A) and Partition Tolerance (P). The diagram below summarizes the possible CAP combinations:
Now, the questions are: is there something wrong with the CAP theorem, and is it still relevant to modern DDBS design? The answer to the second question is yes, of course it is still relevant. BUT the answer to the first question is also yes! CAP has a flaw in how it explains why modern DDBSs reduce consistency.
According to the CAP theorem, consistency is reduced because:
- a DDBS must have Partition Tolerance (P), and
- high Availability (A) is desired or is part of the database's requirements.
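To make the CAP argument concrete, here is a minimal Python sketch of a toy store with two replicas, `a` and `b`. The class name, the `mode` setting and the single-writer setup are hypothetical illustrations, not taken from the article. While no partition exists, a write simply reaches both replicas. Once the replicas are partitioned, the store has to pick: refuse the write and stay consistent (CP), or accept it on one side and let the other side go stale (AP).

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when the store refuses a request in order to stay consistent."""


@dataclass
class TwoReplicaStore:
    mode: str                               # "CP" or "AP" (hypothetical setting)
    a: dict = field(default_factory=dict)   # replica that receives writes
    b: dict = field(default_factory=dict)   # replica that serves the read below
    partitioned: bool = False               # True while a and b cannot talk

    def write(self, key, value):
        if not self.partitioned:
            # No partition: the write reaches both replicas.
            self.a[key] = value
            self.b[key] = value
        elif self.mode == "CP":
            # Partition + CP: refuse the write so a and b never diverge (C kept, A lost).
            raise Unavailable("replica b is unreachable")
        else:
            # Partition + AP: accept the write on a only (A kept); b is now stale (C lost).
            self.a[key] = value

    def read_from_b(self, key):
        return self.b.get(key)


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write("x", 1)
    store.partitioned = True
    try:
        store.write("x", 2)
    except Unavailable as err:
        print(mode, "write rejected:", err)
    print(mode, "a =", store.a["x"], "b =", store.read_from_b("x"))
# CP write rejected: replica b is unreachable
# CP a = 1 b = 1
# AP a = 2 b = 1
```

The choice between C and A only appears in the `partitioned` branches. That is exactly the point the next question raises.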
The next question is “what does partition tolerance really mean?”
The P in CAP has two elements:
- Partition tolerance itself, which is commonly used as the justification for reducing consistency
- The actual occurrence of a network partition, which is often forgotten in that justification
The second element of CAP leads to the following question:
To answer the Philosoraptor question above, we need to revisit the design goals of modern DDBSs. As Dynamo (used by Amazon), Cassandra (used by Facebook), Voldemort (used by LinkedIn) and PNUTS (used by Yahoo) show, those goals are Availability and Latency. To achieve Availability, these systems replicate data. Unfortunately, once data is replicated, a tradeoff between Latency and Consistency arises even when there is no partition. A write must either wait until the replicas have applied it (higher latency) or be acknowledged before that (lower latency, but a replica may briefly serve stale data).
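The Latency/Consistency tradeoff can be illustrated with a toy cost model for a single write to a primary with three replicas, assuming no partition. The function `write_cost`, the `acks_required` parameter and the delay numbers are all made-up illustrations, not measurements from any of the systems named above. Waiting for every replica gives consistent reads at the cost of the slowest round trip. Acknowledging immediately is fast but leaves every replica briefly out of date.

```python
from dataclasses import dataclass


@dataclass
class WriteResult:
    latency_ms: float    # time until the client receives an acknowledgement
    unconfirmed: int     # replicas that may still return the old value at that moment


def write_cost(replica_delays_ms, acks_required, local_write_ms=1.0):
    """Toy cost model of one write to a primary plus replicas (no partition).

    acks_required == len(replica_delays_ms): fully synchronous, consistent but slow.
    acks_required == 0: fully asynchronous, fast but replicas may serve stale reads.
    Values in between wait for only the fastest few replicas.
    """
    if not 0 <= acks_required <= len(replica_delays_ms):
        raise ValueError("acks_required must be between 0 and the number of replicas")
    ordered = sorted(replica_delays_ms)
    # The client waits until the acks_required fastest replicas have confirmed.
    wait = ordered[acks_required - 1] if acks_required else 0.0
    return WriteResult(local_write_ms + wait, len(ordered) - acks_required)


delays = [5.0, 40.0, 120.0]  # made-up round trips: same rack, same region, remote region
for k in (3, 1, 0):
    print(k, write_cost(delays, k))
# 3 WriteResult(latency_ms=121.0, unconfirmed=0)
# 1 WriteResult(latency_ms=6.0, unconfirmed=2)
# 0 WriteResult(latency_ms=1.0, unconfirmed=3)
```

Nothing in this model involves a partition. The tradeoff comes purely from replication, which is the gap in the CAP explanation that the article points out.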
To capture the Consistency/Latency tradeoff, the author proposes a new formulation for modern DDBS design called PACELC. It consists of two elements, PAC and ELC. PAC means that if there is a Partition (P), the tradeoff is between Availability (A) and Consistency (C). ELC means that Else (E), when there is no partition, the tradeoff is between Latency (L) and Consistency (C). Lastly, the table below shows how modern DDBSs are classified under PACELC:
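A PACELC label is just the pair of answers to two questions: what the system gives up during a partition, and what it gives up otherwise. The sketch below reduces each answer to one boolean. The function name and its two flags are hypothetical simplifications, since real systems expose many more knobs. For reference, Abadi's article places the default configurations of Dynamo and Cassandra in the PA/EL class.

```python
def pacelc(accepts_writes_during_partition: bool, waits_for_all_replicas: bool) -> str:
    """Label a (hypothetical, simplified) replication policy with its PACELC class.

    P branch: staying writable during a partition means choosing A over C.
    E branch: with no partition, acknowledging a write before every replica has
    applied it means choosing low Latency (L) over C.
    """
    pac = "PA" if accepts_writes_during_partition else "PC"
    elc = "EC" if waits_for_all_replicas else "EL"
    return f"{pac}/{elc}"


print(pacelc(True, False))   # PA/EL: availability in a partition, low latency otherwise
print(pacelc(False, True))   # PC/EC: consistency in both cases
```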
Overall, I believe the CAP theorem is still important for designing modern database systems, and we should not abandon it. Still, exploring other models that keep up with the latest technology and design considerations is good practice, and I think PACELC is worth considering and applying when designing modern database systems.
Here is the slide deck I used during the presentation: