Sparse Matrix-Vector Multiplication on GPUs

Implement parallel sparse matrix-vector multiplication (SpMV), y = Ax, on a GPU. The sparse matrix A must be stored in Compressed Sparse Row (CSR) format.
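For orientation, here is a minimal sketch of the CSR layout together with a sequential product y = Ax. The struct and function names (csr_matrix, csr_spmv) are hypothetical and not prescribed by the assignment. The sketch assumes zero-based indices, 32-bit int indices, and double-precision values. In a GPU version, one common starting point is to assign each iteration of the outer row loop to its own thread, but that is only one of several possible strategies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR container; field names are illustrative only. */
typedef struct {
    int n_rows;
    int n_cols;
    const int *row_ptr;    /* length n_rows + 1; row i is stored in [row_ptr[i], row_ptr[i+1]) */
    const int *col_idx;    /* length nnz; column index of each stored value (zero-based) */
    const double *values;  /* length nnz; nonzero entries, stored row by row */
} csr_matrix;

/* Sequential reference: computes y = A * x for a CSR matrix A. */
void csr_spmv(const csr_matrix *A, const double *x, double *y)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (int i = 0; i < A->n_rows; ++i) {
        double sum = 0.0;
        /* Walk only the nonzeros of row i. */
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k) {
            sum += A->values[k] * x[A->col_idx[k]];
        }
        y[i] = sum;
    }
}
```

As an example, the 3 x 3 matrix with rows (10, 0, 2), (0, 3, 0), (1, 0, 4) would be stored as row_ptr = {0, 2, 3, 5}, col_idx = {0, 2, 1, 0, 2}, values = {10, 2, 3, 1, 4}. Note that Harwell-Boeing files store matrices column by column with 1-based indices, so a conversion step is needed before data can be used in this layout.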

Compare the performance (in terms of execution time) of your implementation with that of a sequential CPU implementation, using the largest matrix that fits in GPU device memory. For the test matrices, use matrices from the Harwell-Boeing collection.
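To judge whether a given matrix fits in device memory, a rough footprint estimate can help. The sketch below is an assumption-laden illustration, not a required method: the function names (csr_spmv_bytes, fits_on_device) are hypothetical, and it counts only the three CSR arrays plus the input and output vectors, using int indices and double values. Any extra buffers your implementation allocates would need to be added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Estimated bytes for CSR SpMV data: row_ptr, col_idx, values, x, and y.
   Assumes int indices and double values (an illustrative choice). */
size_t csr_spmv_bytes(size_t n_rows, size_t n_cols, size_t nnz)
{
    /* Coarse guard against size_t overflow for very large nnz. */
    assert(nnz <= (SIZE_MAX / 2) / (sizeof(int) + sizeof(double)));
    size_t matrix = (n_rows + 1) * sizeof(int)   /* row_ptr */
                  + nnz * sizeof(int)            /* col_idx */
                  + nnz * sizeof(double);        /* values  */
    size_t vectors = (n_cols + n_rows) * sizeof(double); /* x and y */
    return matrix + vectors;
}

/* Returns nonzero if the estimated footprint fits in free_bytes. */
int fits_on_device(size_t n_rows, size_t n_cols, size_t nnz, size_t free_bytes)
{
    return csr_spmv_bytes(n_rows, n_cols, nnz) <= free_bytes;
}
```

With CUDA, the free_bytes argument could come from the runtime call cudaMemGetInfo, which reports free and total device memory. The estimate is a lower bound, since the runtime and any temporary allocations also consume memory.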

Prepare a report describing your algorithm and strategies, your results, and your observations.