Flume Event Scalability Analysis

This is the follow up of the project in this post. Our professor asked us to perform scalability analysis of the technology that we used, and write simple report on it. I decided to analyze Flume scalability in term of the number of events that can be supported by the Flume configuration. The project itself is inspired from Mike Percy’s Flume-NG performance measurement, and I re-used some of his software components. There are two main differences between this project and Mike’s work, which are:

  •  This experiment introduces one-to-one relationship between the nodes and Flume load generator. Tht means, each Flume load generator process exists in an independent node (which is Amazon EC2 medium instance).
  • This experiment introduces cascading setup, which will verify whether there is improvement in scalability or not compared to non-cascading setup

Flume-based Independent News Aggregator

It has been more than two weeks since my last post! 🙁 I was busy with exam, project reports and my trip preparation. Finally, I manage to find time (on my journey from Warsaw to Prague) to update my blog :p

Well, I would like to cover our SDS project titled “Flume-based Independent News Aggregator” but my project-mate, Mario, has covered as well in his blog. So in this case, I’ll just give you the link to Mario’s post, which is here.

As a follow up of the project, our professor asked us to experiment with the system with regards to its scalability. It is individual project, and I plan to experiment with Flume scalability in term of number of event it can support. Mario will do something related to its realiability and fault tolerance. I plan to update this blog once I finished the project 🙂

*Update: Post about the mini project of Flume Scalability can be found here.

Weekend with Flume – Part 2

After covering some of basic configurations of Flume in Part 1 of Weekend with Flume series, I’ll cover Avro Source, Avro Sink and HDFS sink in this post. Let’s pick a scenario from our school project below. We setup this configuration in Amazon Web Service (AWS), but I will not discuss about our experiences with AWS in this post.

Flume-based News Agregator
Flume-based News Aggregator

Avro Sink Configuration

We have Agent#1 connecting to Collector through a pair of Avro Sink and Avro Source. To achieve this configuration, we have this following flume configuration file in Agent#1. Continue reading Weekend with Flume – Part 2