Over the past decades, the enterprise-level Information Technology (IT) infrastructure in many businesses has grown from a few servers to thousands, increasing the volume of digital footprints they produce. These digital footprints include access logs, which record events such as usage patterns, network activity, and hostile activity affecting the network. Apache Hadoop has become one of the standard frameworks and is used by many IT companies to analyze these log files in distributed batch mode using the MapReduce programming model. Since these access logs contain important information about security and usage patterns, companies are now looking for an architecture that allows analyzing these l...