This time, our group needed to prepare a presentation about Apache Flume for our EEDC homework. Flume is intended to solve the challenges of safely transferring huge sets of data from a node (for example, log files on a company's web servers) to a data store (for example, HDFS, Hive, HBase, or Cassandra).
Well, for a simple system with a relatively small data set, we usually build our own solution for this job, such as a script that transfers the logs to a database. However, this kind of ad-hoc solution is difficult to scale because it is usually tailored very closely to our system. It sometimes suffers from manageability problems, especially when the original programmer or engineer who created it leaves the company. It is also often difficult to extend, and it may have reliability problems due to bugs in the implementation.
And Apache Flume comes to the rescue!