CUDA Optimizations
Illustrate the performance benefits of the following optimizations on a GPU with a matrix-vector multiplication program:
1. block sizes that are a multiple of the warp size (32 threads), i.e. all warps fully populated, and
2. coalesced memory access.
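Before timing anything, a small back-of-envelope model can predict what each optimization should change. The sketch below is plain host-side C, not a CUDA kernel. It assumes 32-byte global-memory sectors as the transaction unit and a 32-byte-aligned base address, and every function name in it is hypothetical. The first two functions show that a block of, say, 100 threads still occupies 4 full warps (128 lanes), so 28 lanes sit idle. The third counts the sectors one warp touches in a single load step of a row-per-thread matrix-vector kernel (thread i computes y[i]). If A is stored row-major, neighbouring threads are N floats apart (stride N, uncoalesced). If A is stored column-major (or transposed), they are adjacent (stride 1, coalesced).

```c
#include <assert.h>

/* Warp size on NVIDIA GPUs. */
#define WARP_SIZE 32
/* Assumed global-memory transaction granularity: 32-byte sectors. */
#define SECTOR_BYTES 32L

/* Warps scheduled for one block; a partial warp still occupies a full warp slot. */
static int warps_per_block(int block_size) {
    assert(block_size > 0);
    return (block_size + WARP_SIZE - 1) / WARP_SIZE;
}

/* Fraction of scheduled lanes that map to real threads (1.0 = no idle lanes). */
static double lane_utilization(int block_size) {
    return (double)block_size / (double)(warps_per_block(block_size) * WARP_SIZE);
}

/* Distinct 32-byte sectors touched when lane k of one warp loads the float at
   element offset k * stride from a 32-byte-aligned base address.
   stride == 1 -> column-major A, one thread per row (coalesced)
   stride == N -> row-major A, one thread per row (uncoalesced) */
static int sectors_per_warp_load(long stride) {
    long seen[WARP_SIZE];
    int count = 0;
    for (int lane = 0; lane < WARP_SIZE; ++lane) {
        long sector = (long)lane * stride * (long)sizeof(float) / SECTOR_BYTES;
        int duplicate = 0;
        for (int i = 0; i < count; ++i) {
            if (seen[i] == sector) { duplicate = 1; break; }
        }
        if (!duplicate) seen[count++] = sector;
    }
    return count;
}
```

With this model, `lane_utilization(256)` is 1.0 while `lane_utilization(100)` is 0.78125. `sectors_per_warp_load(1)` is 4 (one 128-byte request), while `sectors_per_warp_load(1024)` is 32 (one sector per lane). Before any cache reuse, the uncoalesced layout therefore moves about 8 times as many bytes per useful float. Caches may soften this in practice, which is what the measured runs should reveal.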
Show the CUDA profiling output.
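Matrix-vector multiplication is memory-bound, so one way to compare the profiled variants is to convert each kernel's reported duration into an effective bandwidth. That figure can then be compared with the device's peak. The sketch below is an illustration under stated assumptions, not a profiler feature. It assumes float data and counts only the minimum DRAM traffic: A and x read once, y written once. The kernel time is assumed to be copied by hand from the profiler's per-kernel summary (for example Nsight Systems, or the legacy nvprof on older GPUs). The function names are hypothetical.

```c
#include <assert.h>

/* Minimum DRAM traffic in bytes for y = A x with float data:
   read A (m * n) and x (n), write y (m). Assumes ideal caching of x. */
static double matvec_min_bytes(long m, long n) {
    assert(m > 0 && n > 0);
    return ((double)m * (double)n + (double)n + (double)m) * (double)sizeof(float);
}

/* Effective bandwidth in GB/s (1 GB = 1e9 bytes) from a kernel time in milliseconds,
   e.g. the average duration the profiler reports for the matvec kernel. */
static double effective_bandwidth_gbs(long m, long n, double kernel_ms) {
    assert(kernel_ms > 0.0);
    return matvec_min_bytes(m, n) / (kernel_ms * 1e-3) / 1e9;
}
```

For a 1000 x 1000 matrix, the minimum traffic is 4,008,000 bytes, so a 1.0 ms kernel corresponds to roughly 4.0 GB/s. A coalesced kernel with full warps should land much closer to peak bandwidth than the uncoalesced or partially populated variants.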