GIRAPH:

To submit Giraph jobs, do the following.

Compile your Java code:

    bin/hadoop com.sun.tools.javac.Main SimpleShortestPathsComputation.java

Create a jar:

    jar cf sspc.jar SimpleShortestPathsComputation*.class

Add this jar to your Hadoop classpath:

    export HADOOP_CLASSPATH="/home//sspc.jar"

Then run the following command:

    hadoop jar giraph.jar org.apache.giraph.GiraphRunner [ -D option ]* your.package.ComputationClass [ GiraphRunner option e.g. -vip -vif etc. ]*

The parts of this command are:

jar jarname (giraph.jar) - jarname is the path to the jar that has your compute code.

org.apache.giraph.GiraphRunner - Helper class that runs Giraph applications by taking the actual class names to use (i.e. vertex, vertex input/output format, combiner, etc.). It is provided in giraph-core.jar; you don't need to write this class.

-D options - Command-line parameters.

your.package.ComputationClass - The fully qualified name of your compute class, e.g. org.apache.giraph.examples.SimpleShortestPathsComputation.

GiraphRunnerOptions -
• -vif : Vertex input format (supported formats: https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/package-summary.html)
• -vip : Path in HDFS where the graph is stored, in the format specified by -vif
• -vof : Vertex output format. The most commonly used is org.apache.giraph.io.formats.IdWithValueTextOutputFormat, which writes the vertex ID followed by its value; for SSSP, this is the vertex ID followed by its distance from the source.
• -op : Output directory where the output is stored in the -vof format
• -w : Number of Giraph workers
• -ca giraph.userPartitionCount : Number of partitions to split the graph into. By default this is the square of the number of workers. (Note: the default partitioner is HashPartitioner; classes related to partitioning are at https://giraph.apache.org/apidocs/org/apache/giraph/partition/package-summary.html)
• -ca giraph.logLevel : Log level (INFO, WARN, ERROR, DEBUG, etc.)
• -ca giraph.checkpointFrequency : Checkpoint after the specified number of supersteps (0 disables checkpointing)
• -ca giraph.zkList : Always set this to turing.cds.iisc.ac.in:2181
• -yh : Heap memory, in MB, for a single Giraph worker
• -yj : Location of the jar file you created, so that it gets distributed to the YARN containers
• All other available options are listed at http://giraph.apache.org/options.html

For example, to run the SSSP code provided in the Giraph examples jar, use the command below. The examples jar and the Giraph core jar are at
/home/hadoop27/hadoop-2.7.3/share/hadoop/yarn/lib/giraph-examples-1.2.0-for-hadoop-2.7.3-jar-with-dependencies.jar
and
/home/hadoop27/hadoop-2.7.3/share/hadoop/yarn/lib/giraph-1.2.0-for-hadoop-2.7.3-jar-with-dependencies.jar

    hadoop jar giraph-examples-1.2.0-for-hadoop-2.7.3-jar-with-dependencies.jar \
        org.apache.giraph.GiraphRunner \
        -Dgiraph.metrics.enable=true \
        org.apache.giraph.examples.SimpleShortestPathsComputation \
        -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
        -vip /user/jayanth/tiny_graph.txt \
        -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
        -op /user/jayanth/output \
        -w 8 \
        -ca giraph.zkList=orion-00:2181 \
        -ca giraph.userPartitionCount=16 \
        -ca SimpleShortestPathsVertex.sourceId=1 \
        -ca giraph.logLevel=debug,giraph.checkpointFrequency=0 \
        -yj giraph-examples-1.2.0-for-hadoop-2.7.3-jar-with-dependencies.jar

GOFFISH:

The instructions to download and install GoFFish are at https://github.com/dream-lab/goffish_v3/tree/master/hama

As with Giraph, once you have the jar file to be executed, you can run it on the cluster using the following commands:

    export HAMA_CLASSPATH=/home/jayanth/goffish-sample-3.1.jar
    hama in.dream_lab.goffish.job.DefaultJob /home/jayanth/VertexCount.properties /user/jayanth/fb4/Job2/ /user/jayanth/output

The required GoFFish Hama jars have already been added to the turing cluster at /home/hadoop27/hama/lib/

For any questions regarding GoFFish, please subscribe to goffish-user@googlegroups.com
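
Because the Giraph submission command above is long and easy to mistype, it may help to assemble it in a small script and inspect it before running. The following is only a sketch: build_giraph_cmd is a hypothetical helper name (it is not part of Giraph or Hadoop), the input and output formats are the ones used in the SSSP example, the ZooKeeper address is the one given for giraph.zkList, and the paths in the sample call are placeholders. The function prints the command rather than executing it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Giraph or Hadoop): prints a GiraphRunner
# command for review instead of running it.

build_giraph_cmd() {
    giraph_jar="$1"   # jar passed to "hadoop jar" (contains GiraphRunner)
    job_jar="$2"      # -yj: jar with your computation class, shipped to YARN
    computation="$3"  # fully qualified computation class name
    input="$4"        # -vip: HDFS path of the input graph
    output="$5"       # -op: HDFS output directory
    workers="$6"      # -w: number of Giraph workers

    # Assumed defaults taken from the SSSP example and the zkList note above;
    # replace them with the formats your own job needs.
    vif="org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat"
    vof="org.apache.giraph.io.formats.IdWithValueTextOutputFormat"
    zk="turing.cds.iisc.ac.in:2181"

    echo "hadoop jar $giraph_jar org.apache.giraph.GiraphRunner $computation" \
         "-vif $vif -vip $input -vof $vof -op $output -w $workers" \
         "-ca giraph.zkList=$zk -yj $job_jar"
}

# Sample call with placeholder paths; it only prints the command.
build_giraph_cmd \
    giraph-examples-1.2.0-for-hadoop-2.7.3-jar-with-dependencies.jar \
    giraph-examples-1.2.0-for-hadoop-2.7.3-jar-with-dependencies.jar \
    org.apache.giraph.examples.SimpleShortestPathsComputation \
    /user/jayanth/tiny_graph.txt /user/jayanth/output 8
```

Once the printed command looks right, it can be pasted into the shell on the cluster. Further -ca or -D options can be added to the echo line in the same way.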