An application wikipedia xml corpus deployed on the Hadoop platform and analysis of the output in HDFS
Deployed Hadoop(An open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models) on a cluster of three computers. One as master and two acting as a slave. And tested the Inverted Index in HDFS using MapReduce paradigm for Wikipedia xml corpus.