Identity Management as a Service

I read an article titled “Architecting Cloud Scale Identity Fabric” about the concept of Identity Management as a Service as part of an EEDC assignment. The article was written by Eric Olden of Symplified. Its main point is the need for a service that manages user identity in the cloud. Well, I think this diagram from the Symplified website is worth more than a thousand words:

Symplified's Access Management providing

As our business grows, we may require multiple solutions from SaaS providers, private clouds or public clouds.

Consistency Tradeoff in Modern Distributed DB

Last week I gave a presentation about the relevance of the CAP theorem in modern distributed system design. The presentation was based on an article titled “Consistency Tradeoffs in Modern Distributed Database System Design” by Daniel J. Abadi from Yale University.

The CAP theorem is widely used in Distributed Database System (DDBS) design. In a nutshell, it says that when designing a modern DDBS we can choose only two of the three properties that are crucial for a DDBS: Consistency (C), Availability (A) and Partition Tolerance (P). The diagram below summarizes the available combinations of CAP:

CAP Diagram
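
The same "pick two out of three" reading can also be written down as a tiny sketch. This is only an illustration of the popular summary of CAP given above, not something taken from Abadi's article; the function name `cap_choices` and the two-letter labels are my own.

```python
from itertools import combinations

# The three properties named by the CAP theorem.
CAP_PROPERTIES = ("Consistency", "Availability", "Partition tolerance")


def cap_choices():
    """Return every pair of CAP properties, keyed by initials (e.g. 'CA')."""
    return {
        "".join(name[0] for name in pair): pair
        for pair in combinations(CAP_PROPERTIES, 2)
    }


# Print each two-property combination on its own line.
for label, pair in cap_choices().items():
    print(label, "->", " + ".join(pair))
```

The three entries it produces are the combinations the diagram is meant to show.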

Now, the questions are: is there something wrong with the CAP theorem? Is it still relevant to modern DDBS design?

Large-Scale Decentralized Storage Systems for Volunteer Computing Systems – Pilot

Well, I would like to write something about our Decentralized System (DS) project and what we are going to do in this project. Since it is the first post about the big picture of our DS project in this blog, I named it “Pilot” :p. Our group is G007, which consists of Julia, Diego, Enkhjin and myself.

After two weeks of paper reading and some brainstorming sessions, we are proposing the following directions for our project, Large-Scale Decentralized Storage Systems for Volunteer Computing Systems.

Wuala – P2P Distributed Storage

When I was looking for an example of a P2P distributed storage system, I came across a Google Tech Talk video about Wuala. Wuala is an example of a distributed peer-to-peer storage system. It used to be a startup company, but LaCie bought it in 2009. It allows you to store your data in the cloud, set up online backups, share your files easily with your friends, and access them from other computers.

Apache Flume

This time, our group needed to prepare a presentation about Apache Flume for EEDC homework. Flume is intended to solve the challenges of safely transferring huge sets of data from a node (for example, log files on company web servers) to a data store (for example, HDFS, Hive, HBase or Cassandra).

Apache Flume

Well, for a simple system with a relatively small data set, we usually build our own custom solution for this job, such as a script that transfers the logs to a database. However, this kind of ad-hoc solution is difficult to scale because it is usually tailored very closely to our own system. It sometimes suffers from manageability problems, especially when the programmer or engineer who originally created it has left the company. It is also often difficult to extend, and it may have reliability problems due to bugs introduced during implementation.
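
To make the "ad-hoc script" idea concrete, here is a minimal sketch of the kind of home-grown loader I mean. Everything in it is an assumption for illustration: the log line layout, the `access_log` table, and the function name `load_log_lines` are hypothetical, and SQLite stands in for whatever database a real system would use.

```python
import re
import sqlite3

# Hypothetical log line layout, e.g. "2012-03-01 12:00:00 GET /index.html 200".
LOG_PATTERN = re.compile(r"^(\S+ \S+) (\S+) (\S+) (\d{3})$")


def load_log_lines(lines, conn):
    """Insert every parseable log line into access_log; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log "
        "(ts TEXT, method TEXT, path TEXT, status INTEGER)"
    )
    rows = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:  # malformed lines are silently dropped
            ts, method, path, status = match.groups()
            rows.append((ts, method, path, int(status)))
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = ["2012-03-01 12:00:00 GET /index.html 200", "garbage line"]
    print(load_log_lines(sample, conn))
```

Notice how much is hard-wired: one log format, one table, one machine, no retries, and bad lines simply vanish. That is exactly the scalability, manageability and reliability trouble described above.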

And Apache Flume comes to the rescue!!!

How available are they?

Well, it’s not really about relationship status :p.. since “they” are referring to distributed systems 😀 #geek

But it is about my latest review of a paper titled “Exploiting Availability Prediction in Distributed Systems”, by James W. Mickens and Brian D. Noble. As we all know, availability is one of the important properties of distributed systems. Availability is concerned with the capability of a distributed system to serve its clients properly even when some components inside the system fail. The authors argue that availability modeling is crucial to (generally) make the system better, in terms of resource efficiency and understanding system-wide phenomena. Therefore, they propose a new way to predict availability, and they apply the predictor to three case studies. They found that their predictor works well on test data, and they successfully show that a good predictor can improve the systems.

Why file sharing network is dangerous?

Recently I finished a paper review assignment about file sharing networks. The paper is titled “Why File Sharing Networks Are Dangerous?”, by M. Eric Johnson, Dan McGuire and Nicholas D. Willey. And I present to you my summary:


The paper explains the security aspects of P2P file sharing networks in terms of confidential data exposure. P2P file sharing networks are widely used now, and according to the authors' research, the user base doubled from 2003 to 2007 (yes, it is a paper from 2007!.. so not really the latest and greatest paper :)). Its analysis is based on first-generation P2P file sharing networks that are used by P2P file sharing applications such as KaZaA, Frostwire and eMule. Those applications share a common characteristic: the user needs to explicitly share files and folders for downloading and uploading.