Useful documentation:
* YARN commands: https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/YarnCommands.html
* Hadoop commands: https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/CommandsManual.html
* HDFS commands: https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html
* MapReduce commands: https://hadoop.apache.org/docs/r2.7.3/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html
* Examples for setting up MRUnit and running it for word count applications:
  https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial
  https://cwiki.apache.org/confluence/display/MRUNIT/Testing+Word+Count

Other important things to know:
* The Twitter data set is located at /user/hadoop27/twitter/twitter
* Each enrolled student has a directory in HDFS under /user/
* General Hadoop and YARN logs can be found at /scratch/hadoop27/logs
* Application-specific logs for a particular MapReduce job can be accessed using "yarn logs -applicationId <application ID>"
* Do not kill other people's jobs.
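The per-student directory above follows the usual HDFS convention of /user/<username>. As a minimal sketch, assuming your HDFS directory name matches your cluster login name (an assumption, not stated above), you can build the path from whoami:

```shell
# Sketch: derive your per-student HDFS directory from your login name.
# Assumption: the directory under /user/ matches the output of `whoami`.
MY_HDFS_DIR="/user/$(whoami)"
echo "$MY_HDFS_DIR"
# Then, on the cluster (not run here): hadoop fs -ls "$MY_HDFS_DIR"
```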
Contact either me or the professor if a job has been running for a very long time.

The following are the commands on the Turing cluster. (You can drop the /home/hadoop27/hadoop-2.7.3/bin/ prefix and type just hadoop, since I have added that bin directory to the PATH variable.)

Web interface for the NameNode:
http://turing.cds.iisc.ac.in:50070/

Web interface for YARN:
http://turing.cds.iisc.ac.in:8088/cluster

Check block locations on HDFS:
/home/hadoop27/hadoop-2.7.3/bin/hadoop fsck /user/hadoop27/input/capacity-scheduler.xml -files -locations -blocks

List files in HDFS:
/home/hadoop27/hadoop-2.7.3/bin/hadoop fs -ls /user/hadoop27/twitter/twitter/tweets-9_1476698103878.txt

Show HDFS file content:
/home/hadoop27/hadoop-2.7.3/bin/hadoop fs -cat /user/hadoop27/twitter/twitter/tweets-9_1476698103878.txt

Get an HDFS file to the local filesystem:
/home/hadoop27/hadoop-2.7.3/bin/hadoop fs -get /user/hadoop27/twitter/twitter/tweets-9_1476698103878.txt ./

Copy a file from one location to another (in HDFS):
/home/hadoop27/hadoop-2.7.3/bin/hadoop fs -cp /user/hadoop27/twitter/twitter/tweets-9_1476698103878.txt /user/jayanth/tweetfile

Put a local file into HDFS:
/home/hadoop27/hadoop-2.7.3/bin/hadoop fs -put input /user/jayanth/partitionerinput

Similar commands exist for mkdir, du, tail, find, mv, rm, rmdir.

List running jobs:
/home/hadoop27/hadoop-2.7.3/bin/mapred job -list

Kill a job using its job ID:
/home/hadoop27/hadoop-2.7.3/bin/mapred job -kill job_1484773908932_0008

View application logs:
/home/hadoop27/hadoop-2.7.3/bin/yarn logs -applicationId application_1484773908932_0009

View application logs for a specific container:
/home/hadoop27/hadoop-2.7.3/bin/yarn logs -applicationId application_1484773908932_0009 -containerId container_1484773908932_0009_01_000008 -nodeAddress node13.local
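The job ID and application ID in the commands above are directly related: a MapReduce job ID job_<cluster timestamp>_<sequence> corresponds to the YARN application ID application_<cluster timestamp>_<sequence>. A minimal sketch of deriving the yarn logs argument from a job ID (using the example ID from above):

```shell
# Sketch: convert a MapReduce job ID into the matching YARN application ID.
# The two IDs share the same cluster timestamp and sequence number;
# only the "job_" / "application_" prefix differs.
job_id="job_1484773908932_0008"    # example job ID from the commands above
app_id=$(printf '%s\n' "$job_id" | sed 's/^job_/application_/')
echo "$app_id"                     # application_1484773908932_0008
# Then, on the cluster (not run here): yarn logs -applicationId "$app_id"
```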