This is the follow up of the project in this post. Our professor asked us to perform scalability analysis of the technology that we used, and write simple report on it. I decided to analyze Flume scalability in term of the number of events that can be supported by the Flume configuration. The project itself is inspired from Mike Percy’s Flume-NG performance measurement, and I re-used some of his software components. There are two main differences between this project and Mike’s work, which are:
- This experiment introduces one-to-one relationship between the nodes and Flume load generator. Tht means, each Flume load generator process exists in an independent node (which is Amazon EC2 medium instance).
- This experiment introduces cascading setup, which will verify whether there is improvement in scalability or not compared to non-cascading setup