Please use this identifier to cite or link to this item:
http://10.1.7.192:80/jspui/handle/123456789/5584
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Shah, Monika | - |
dc.contributor.author | Patel, Vibha | - |
dc.contributor.author | Chaudhari, M.B. | - |
dc.date.accessioned | 2015-07-14T07:22:27Z | - |
dc.date.available | 2015-07-14T07:22:27Z | - |
dc.date.issued | 2011-11-11 | - |
dc.identifier.citation | National Level Conference on Information Processing & Computing. Proceedings of the National Conference in association with Techno Forum Group, Department of Information Technology, PSG Polytechnic, Coimbatore, November 17-18, 2011 | en_US |
dc.identifier.isbn | 78-81-920575-9-0 | - |
dc.identifier.uri | http://hdl.handle.net/123456789/5584 | - |
dc.description.abstract | In today's era of rapid technological change, we are witnessing increasing demand for parallel devices, such as modern microprocessors, for high-performance scientific computing. GPUs have attracted attention at the leading edge of this trend, and their computational power has increased at a much higher rate than that of CPUs. Large sparse matrix-vector multiplication (SpMV) is one of the most important operations in scientific and engineering computing. Implementing SpMV is complex because of the indirection used in its storage representation to exploit matrix sparsity; designing a parallel algorithm for SpMV therefore presents an interesting challenge for achieving higher performance, making SpMV an important and challenging GPU kernel. Many storage formats and efficient SpMV implementations have been proposed in the past. The sparse matrix problem has two main challenges: (i) an optimized storage format, and (ii) an efficient method to access nonzero elements from low-latency and low-bandwidth memory. Implementing SpMV on a GPU poses additional challenges: (i) best resource (thread) utilization to achieve high parallelization, (ii) avoidance of thread divergence, and (iii) coalesced memory access. In this paper, we analyze various storage formats proposed for optimized storage of sparse matrices, such as Coordinate format (COO), Compressed Row Storage (CRS), Block Compressed Row Storage (BCRS), ELLPACK, and Sparse Block Compressed Row Storage (SBCRS). We then propose an alternative optimized storage format, BCRS-combo, and its SpMV implementation, to achieve high performance for the SpMV kernel on NVIDIA GPUs, considering parameters such as excessive padding, storage compaction, memory bandwidth, reloading of the input and output vectors, time to reorder elements, and execution time. We show that BCRS-combo not only improves performance over these methods, but also improves storage space management. | en_US |
dc.relation.ispartofseries | ITFCE012-4; | - |
dc.subject | GPU | en_US |
dc.subject | Sparse Matrix Storage Format | en_US |
dc.subject | Parallelization | en_US |
dc.subject | Matrix-Vector Multiplication | en_US |
dc.subject | Computer Faculty Paper | en_US |
dc.subject | Faculty Paper | en_US |
dc.subject | ITFCE012 | en_US |
dc.title | Optimizing sparse Matrix-Vector Multiplication on GPU | en_US |
dc.type | Faculty Papers | en_US |
Appears in Collections: | Faculty Papers, CE |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ITFCE012-4.pdf | ITFCE012-4 | 320.44 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.