1) Submit a single encrypted file "username-se256-beta-src.tar.zip" to ~se256/submission-beta/ and send an email to Ravikant with the MD5 checksum password. The email timestamp will be counted as the submission time.

2) When the file is unpacked, the following must be the folder structure:

    username-se256-beta/pom.xml
    username-se256-beta/src/
    username-se256-beta/bin/
    username-se256-beta/build-index-pr.sh
    username-se256-beta/search-terms.sh <"search terms">
    username-se256-beta/train-classifier.sh
    username-se256-beta/classify-pages.sh

NOTE: The pom.xml file should be capable of building all the source files. The /src/ folder should have all necessary source files. Include the generated jars in the /bin/ folder, such that it is sufficient to run your MR programs directly using the provided scripts.

The build-index-pr.sh script should take a wildcard list of input CC files, run all relevant MR apps required to generate the index and PageRank files, and store them in the provided folder in HDFS. Similarly, the train-classifier.sh script should take the CC files to train upon and store a trained model in the given output folder.

The search-terms.sh script should take the location of the index/PR folder generated above and the search terms, and generate the search output in the folder provided. Each line of the web search output file should have the format "rank (1-100), PageRank (0.0-1.0), URL".

The classify-pages.sh script should take the location of the trained classifier folder generated above and the test CC files to classify, and generate the classification output in the folder provided. Each line of the classifier output file should have the format "URL, country".

Configure the scripts such that they work with qsub.

3) The evaluation of the program will be automated, so make sure that we can unzip your submission and run the following sample, which works on 10 CC files. If the program fails to automatically build/run with these commands, you will be penalized.

    cd username-se256-beta
    mvn clean compile package
    build-index-pr.sh '/SE256/CC/*-0003?-*.gz' 'beta-username/index_pr_out'
    search-terms.sh 'beta-username/index_pr_out' 'French crepes' 'beta-username/search_out'
    hadoop fs -cat beta-username/search_out/*
    train-classifier.sh '/SE256/CC/*-0006?-*.gz' 'beta-username/classifier_out'
    classify-pages.sh 'beta-username/classifier_out' '/SE256/CC/*-0002?-*.gz' 'beta-username/classify_out'
    hadoop fs -cat beta-username/classify_out/*

4) Equal weightage will be given to the program and the report. You will be evaluated on correctness of the program and its outputs, its speed/scalability, accuracy of the results, and analysis of the algorithm/results/scalability in the report.
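
Since the evaluation is automated, it may help to check the two output formats from item 2 before submitting. The following is only a sketch of such a check and is not one of the required scripts. The function names check_search_output and check_classifier_output are hypothetical. It also makes some assumptions that the instructions do not state: that a URL may itself contain commas, that fields may be padded with spaces, and that a PageRank printed in scientific notation (for example 1.0E-6) is acceptable.

```shell
#!/bin/sh
# Hypothetical pre-submission format checks (not part of the required scripts).

# Exit status 0 only if every line on stdin looks like "rank, PageRank, URL"
# with rank in 1-100 and PageRank in 0.0-1.0.
# Assumption: the line is split at the FIRST two commas, so the URL may contain commas.
check_search_output() {
    awk '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
        i = index($0, ",")
        rest = substr($0, i + 1)
        j = index(rest, ",")
        if (i == 0 || j == 0) { bad = 1; next }
        rank = trim(substr($0, 1, i - 1))
        pr   = trim(substr(rest, 1, j - 1))
        url  = trim(substr(rest, j + 1))
        # "+ 0" forces numeric comparison; substr() returns strings.
        if (rank !~ /^[0-9]+$/ || rank + 0 < 1 || rank + 0 > 100) bad = 1
        if (pr !~ /^[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/ || pr + 0 > 1) bad = 1
        if (url == "") bad = 1
    }
    END { exit (bad ? 1 : 0) }
    '
}

# Exit status 0 only if every line on stdin looks like "URL, country".
# Assumption: the line is split at the LAST comma, so the URL may contain commas.
check_classifier_output() {
    awk '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    {
        n = 0; s = $0
        # Walk forward to find the position of the last comma.
        while ((k = index(s, ",")) > 0) { n += k; s = substr(s, k + 1) }
        if (n == 0) { bad = 1; next }
        url     = trim(substr($0, 1, n - 1))
        country = trim(substr($0, n + 1))
        if (url == "" || country == "") bad = 1
    }
    END { exit (bad ? 1 : 0) }
    '
}
```

With these functions defined in the current shell, the sample outputs from item 3 could be checked with, for example:

    hadoop fs -cat beta-username/search_out/* | check_search_output && echo "search output OK"
    hadoop fs -cat beta-username/classify_out/* | check_classifier_output && echo "classifier output OK"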