Flume Event Scalability Analysis

This is the follow up of the project in this post. Our professor asked us to perform scalability analysis of the technology that we used, and write simple report on it. I decided to analyze Flume scalability in term of the number of events that can be supported by the Flume configuration. The project itself is inspired from Mike Percy’s Flume-NG performance measurement, and I re-used some of his software components. There are two main differences between this project and Mike’s work, which are:

  •  This experiment introduces one-to-one relationship between the nodes and Flume load generator. Tht means, each Flume load generator process exists in an independent node (which is Amazon EC2 medium instance).
  • This experiment introduces cascading setup, which will verify whether there is improvement in scalability or not compared to non-cascading setup

Setup 1: One-to-one relationship between nodes and Flume load generator

Figure below shows the diagram of this setup:

Setup 1
Setup1: One to one relationship between nodes and Flume
load generator

TCP Hammer sends TCP events with configurable size. In this case, we set TCP event size to 300 bytes. Flume node is configured with SyslogTcpSource which will listen to TCP event in configurable port and generate its own Flume event to be transmitted via Memory channel to the HDFS sink. The HDFS in this setup consists of three replicated nodes. We will check the relation between channel capacity and maximum rate of events that Flume node can support.

Setup 2: Cascading Setup

Figure below shows the diagram of Cascading Setup:

Setup 2
Setup 2: Cascading setup

We still used event size of 300 bytes. Two Flume scale nodes are aggregated into one single collector node before writing the data into HDFS cluster. Two separate load generators reside in different independent nodes. We will compare accumulated number of event-per-seconds of two Flume scale nodes with first setup with same settings of channel capacity.

Experiment Result

We inspected Flume log file in /var/log/flume-ng/flume.log to check whether the configuration is able to support given event rates. If Flume process can continue without Java exceptions that cause it to stop or without regularly generated exceptions, then we will treat it as able to handle the event rates applied to it.

Setup 1 Result

Setup 1 Result

Table 1 shows the result for Setup 1. In this experiment, we increased the channel capacity and we also observed that by doubling the channel capacity, the maximum events per seconds increase much less. The increment is only around 25% when we doubled from 100000 to 200000, and it’s only 10% when we increased the channel capacity from 200000 to 400000.

Setup 2 Result

In setup 2, we used channel capacity of 100000 in Scale1-1 and Scale1-2 nodes, and we used channel capacity of 200000 in Collector node.

Setup 2 Result

From Setup 1 experiment, we found that maximum events that can be supported is 200. In setup 2, we can easily see that the cumulative maximum event rates are 400. And it is very much expected because we increased the resource by adding new nodes with more processing resources. Note that when event arrives in node Scale1-1 or Scale1-2, the event is not directly forwarded into Collector node. Node Scale1-1 and Scale1-2 will hold the events until certain number of events are accumulated. This amount of accumulated events are configurable using flume.conf configuration file.

Well, actually there are some perl scripts using Pig in Mike Percy’s codes that can be used to process the data further. The scripts are utilized to analyze the data in HDFS and provide more insightful result such as total number of retrieved events and total number of retry in sending events. Unfortunately, due to limited time of experiment, we did not able to make scripts work. We have tried to hack the scripts and HDFS configuration, but we still receive error message when the scripts are executed against the HDFS data.

In conclusion, Flume is promising in term of scalability. By using cascading setup, we are able to improve the scalability of Flume. However, more analysis is needed especially using additional existing Pig script to process the data. :D.

Well that’s it for this project. Really wish to have more time to hack the Pig script and make the additional data analysis works! 🙂 For actual report, please refer to the link below:

5 thoughts on “Flume Event Scalability Analysis”

  1. Hey, I am interested with this project
    I want to do something similar, but when I build (using maven) flume-load-gen from mpercy github repository, I got some errors.
    Did you get it too when doing this?
    Thanks!

    1. Hi philemon,
      I’ve just cloned the project from Github and I’m able to build it perfectly using maven (mvn package). What’s the error that you encounter?

  2. I got error like this after : mvn clean package

    [ERROR] /home/phil/Documents/flume-load-gen-master/src/main/java/org/apache/flume/tcphammer/Hammer.java:[67,12] error: no suitable method found for info(String,String,int)
    [ERROR]
    [ERROR] method Logger.info(Marker,String,Throwable) is not applicable
    [ERROR] (actual argument String cannot be converted to Marker by method invocation conversion)
    [ERROR] method Logger.info(Marker,String,Object[]) is not applicable
    [ERROR] (actual argument String cannot be converted to Marker by method invocation conversion)
    [ERROR] method Logger.info(Marker,String,Object,Object) is not applicable
    [ERROR] (actual and formal argument lists differ in length)
    [ERROR] method Logger.info(Marker,String,Object) is not applicable
    [ERROR] (actual argument String cannot be converted to Marker by method invocation conversion)
    [ERROR] method Logger.info(Marker,String) is not applicable
    [ERROR] (actual and formal argument lists differ in length)
    [ERROR] method Logger.info(String,Throwable) is not applicable
    [ERROR] (actual and formal argument lists differ in length)
    [ERROR] method Logger.info(String,Object[]) is not applicable
    [ERROR] (actual and formal argument lists differ in length)
    [ERROR] method Logger.info(String,Object,Object) is not applicable
    [ERROR] (actual argument int cannot be converted to Object by method invocation conversion)
    [ERROR] method Logger.info(String,Object) is not applicable
    [ERROR] (actual and formal argument lists differ in length)
    [ERROR] method Logger.info(String) is not applicable
    [ERROR] (actual and formal argument lists differ in length)
    [ERROR] /home/phil/Documents/flume-load-gen-master/src/main/java/org/apache/flume/tcphammer/Hammer.java:[137,25] error: incompatible types
    [ERROR]
    [ERROR] could not parse error message: required: Object
    [ERROR] found: long
    [ERROR] /home/phil/Documents/flume-load-gen-master/src/main/java/org/apache/flume/tcphammer/Hammer.java:138: error: incompatible types
    [ERROR] curLogIntervalNanosSlept / 1000000,
    [ERROR] ^
    [ERROR]
    [ERROR] could not parse error message: required: Object
    [ERROR] found: long
    [ERROR] /home/phil/Documents/flume-load-gen-master/src/main/java/org/apache/flume/tcphammer/Hammer.java:139: error: cannot find symbol
    [ERROR] String.format(“%.2f”, (100 * (double) curLogIntervalNanosSlept / (double) logDelta)),
    [ERROR] ^
    [ERROR]
    [ERROR] could not parse error message: symbol: method format(String,double)
    [ERROR] location: class String
    [ERROR] /home/phil/Documents/flume-load-gen-master/src/main/java/org/apache/flume/tcphammer/Hammer.java:140: error: incompatible types
    [ERROR] curLogIntervalEventsSent,
    [ERROR] ^
    [ERROR]
    [ERROR] could not parse error message: required: Object
    [ERROR] found: long
    [ERROR] /home/phil/Documents/flume-load-gen-master/src/main/java/org/apache/flume/tcphammer/Hammer.java:141: error: cannot find symbol
    [ERROR] String.format(“%.2f”, (double) curLogIntervalEventsSent / ((double) logDelta / 1000000000D)),
    [ERROR] ^
    [ERROR]
    [ERROR] could not parse error message: symbol: method format(String,double)
    [ERROR] location: class String
    [ERROR] /home/phil/Documents/flume-load-gen-master/src/main/java/org/apache/flume/tcphammer/Hammer.java:142: error: incompatible types
    [ERROR] eventCount,
    [ERROR] ^
    [ERROR] -> [Help 1]

    I am using maven 3.04 with java 1.7.0_45
    I don’t know why I always get this error when trying to build the source.

  3. I’m using maven 3.0.5 with Java 1.7.0_51. Could you try to clean your local maven repository, i. e. delete $HOME/.m2 folder, and run again the maven?

Leave a Reply