Abstract

The number of applications running on Hadoop clusters is growing day by day, because organizations have found in Hadoop a simple and efficient model that works well in distributed environments. The model is built to run efficiently on thousands of commodity machines and to handle massive data sets. Together, HDFS and MapReduce form a scalable, fault-tolerant platform that hides the complexities of Big Data analytics. As Hadoop becomes increasingly popular, understanding its technical details becomes essential; this fact inspired us to explore Hadoop and its components in depth. Analysing, examining, and processing huge amounts of unstructured data to extract the required information has long been a challenge. In this paper...