STEPS:
1. Run make in this directory.
2. Edit sample-script and change the paths wherever indicated.
3. Submit the job: qsub sample-script
4. When the job finishes, a file named cudamatvectmulp.csv is created in your HOME directory.
5. Start the profiler: /usr/local/cuda/computeprof/bin/computeprof
6. In the profiler, choose File -> Import and open cudamatvectmulp.csv.
7. Use the buttons in the GUI to display the GPU time bar chart for the various methods (please refer to the attached pictures).
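
Because qsub returns as soon as the job is queued, the CSV file may not exist yet when you start the profiler. The following is a minimal sketch of a helper that waits for the file before launching the GUI. The function name wait_for_file and the 600-second default timeout are illustrative assumptions, not part of this package, and the script is written for sh rather than csh.

```shell
#!/bin/sh
# Sketch only: wait for the profiler output written by the batch job.
# Assumption: the job writes $HOME/cudamatvectmulp.csv, as described in step 4.

# wait_for_file PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists and is non-empty.
# Returns 0 when the file is ready, 1 if the timeout expires first.
# The name and the default timeout are hypothetical choices.
wait_for_file() {
    path=$1
    timeout=${2:-600}
    elapsed=0
    while [ ! -s "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Possible usage, following steps 1 to 5 above:
#   make && qsub sample-script \
#     && wait_for_file "$HOME/cudamatvectmulp.csv" \
#     && /usr/local/cuda/computeprof/bin/computeprof
```

The helper checks only that the file exists and is non-empty. If the job is still writing to it, the import may show partial data, so checking the job status with your queue tools first is the safer option.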

NOTE:
1. Log in to tesla2 (10.16.28.50) with ssh -X so that the profiler GUI can be displayed on your machine.
2. Add /usr/local/cuda/computeprof/bin to LD_LIBRARY_PATH in your .cshrc file, then run source .cshrc.
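
One possible way to write the .cshrc entry is sketched below. It assumes your login shell is csh or tcsh. The check for an existing value is there because csh reports an error when an undefined variable is expanded.

```csh
# ~/.cshrc fragment (sketch): put the profiler directory on LD_LIBRARY_PATH
if ( $?LD_LIBRARY_PATH ) then
    # Variable already set: prepend the profiler directory
    setenv LD_LIBRARY_PATH /usr/local/cuda/computeprof/bin:${LD_LIBRARY_PATH}
else
    # Variable not set yet: define it
    setenv LD_LIBRARY_PATH /usr/local/cuda/computeprof/bin
endif
```

After saving the file, run source .cshrc from your HOME directory, or log in again, so that the change takes effect.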
