Flume Event Scalability Analysis

This is a follow-up to the project described in this post. Our professor asked us to perform a scalability analysis of the technology that we used and to write a short report on it. I decided to analyze Flume scalability in terms of the number of events that a given Flume configuration can support. The project itself is inspired by Mike Percy’s Flume-NG performance measurement, and I reused some of his software components. There are two main differences between this project and Mike’s work:

  • This experiment introduces a one-to-one relationship between nodes and Flume load generators. That means each Flume load generator process runs on its own node (an Amazon EC2 medium instance).
  • This experiment introduces a cascading setup, to verify whether it improves scalability compared to a non-cascading setup.
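
To make the cascading idea concrete, here is a minimal sketch of what a two-tier Flume configuration could look like. It is not the configuration used in the experiment: the agent names (tier1, tier2), the host name, the ports and the channel sizes are all assumptions chosen for illustration. The first-tier agent receives events over Avro and forwards them through an Avro sink to a second-tier agent, which is what makes the setup cascading.

```properties
# --- First tier: receives events from a load generator and forwards them ---
# (agent name, ports and capacities are illustrative assumptions)
tier1.sources = in
tier1.channels = mem
tier1.sinks = forward

tier1.sources.in.type = avro
tier1.sources.in.bind = 0.0.0.0
tier1.sources.in.port = 41414
tier1.sources.in.channels = mem

tier1.channels.mem.type = memory
tier1.channels.mem.capacity = 10000
tier1.channels.mem.transactionCapacity = 1000

# The Avro sink points at the second-tier agent (hypothetical host name)
tier1.sinks.forward.type = avro
tier1.sinks.forward.hostname = collector.example.internal
tier1.sinks.forward.port = 41415
tier1.sinks.forward.channel = mem

# --- Second tier: collects events from one or more first-tier agents ---
tier2.sources = in
tier2.channels = mem
tier2.sinks = out

tier2.sources.in.type = avro
tier2.sources.in.bind = 0.0.0.0
tier2.sources.in.port = 41415
tier2.sources.in.channels = mem

tier2.channels.mem.type = memory
tier2.channels.mem.capacity = 10000
tier2.channels.mem.transactionCapacity = 1000

# Logger sink used here only as a placeholder destination
tier2.sinks.out.type = logger
tier2.sinks.out.channel = mem
```

In a non-cascading setup, the first-tier agent would write to its final destination directly instead of forwarding to a second agent.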

Flume-based Independent News Aggregator

It has been more than two weeks since my last post! 🙁 I was busy with exams, project reports and preparing for my trip. Finally, I managed to find time (on my journey from Warsaw to Prague) to update my blog :p

Well, I would like to cover our SDS project titled “Flume-based Independent News Aggregator”, but my project-mate, Mario, has already covered it on his blog. So in this case, I’ll just give you the link to Mario’s post, which is here.

As a follow-up to the project, our professor asked us to experiment with the system’s scalability. It is an individual project, and I plan to experiment with Flume scalability in terms of the number of events it can support. Mario will do something related to its reliability and fault tolerance. I plan to update this blog once I have finished the project 🙂

*Update: The post about the Flume scalability mini project can be found here.

Consistency Tradeoff in Modern Distributed DB

Last week I gave a presentation about the relevance of the CAP theorem in modern distributed system design. The presentation was based on an article titled “Consistency Tradeoffs in Modern Distributed Database System Design” by Daniel J. Abadi from Yale University.

The CAP theorem is widely used in Distributed Database System (DDBS) design. In a nutshell, it says that when designing a DDBS, we can choose only two out of three properties that are crucial for it: Consistency (C), Availability (A) and Partition Tolerance (P). The diagram below summarizes the available combinations:

CAP Diagram

Now, the questions are: is there something wrong with the CAP theorem? Is it still relevant to modern DDBS design?