Please use this identifier to cite or link to this item:
http://10.1.7.192:80/jspui/handle/123456789/5584
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Shah, Monika | - |
dc.contributor.author | Patel, Vibha | - |
dc.contributor.author | Chaudhari, M.B. | - |
dc.date.accessioned | 2015-07-14T07:22:27Z | - |
dc.date.available | 2015-07-14T07:22:27Z | - |
dc.date.issued | 2011-11-11 | - |
dc.identifier.citation | National Level Conference on Information Processing & Computing. Proceedings of the National Conference in association with Techno Forum Group, Department of Information Technology, PSG Polytechnic, Coimbatore, November 17-18, 2011 | en_US |
dc.identifier.isbn | 78-81-920575-9-0 | - |
dc.identifier.uri | http://hdl.handle.net/123456789/5584 | - |
dc.description.abstract | In today's era of rapid technological change, we are witnessing increasing demand for parallel devices, such as modern microprocessors, for high-performance scientific computing. GPUs have attracted attention at the leading edge of this trend, and their computational power has increased at a much higher rate than that of CPUs. Large sparse matrix-vector multiplication (SpMV) is one of the most important operations in scientific and engineering computing. Implementing SpMV is complex because of the indirection used in its storage representation to exploit matrix sparsity; designing a parallel algorithm for SpMV therefore presents an interesting challenge for achieving higher performance, making SpMV an important and challenging GPU kernel. Many storage formats and efficient SpMV implementations have been proposed in the past. The sparse matrix problem has two main challenges: (i) an optimized storage format, and (ii) an efficient method to access nonzero elements from low-latency and low-bandwidth memory. Implementing SpMV on a GPU poses additional challenges: (i) best resource (thread) utilization to achieve high parallelization, (ii) avoidance of thread divergence, and (iii) coalesced memory access. In this paper, we analyze various storage formats proposed for optimized storage of sparse matrices, such as Coordinate format (COO), Compressed Row Storage (CRS), Block Compressed Row Storage (BCRS), ELLPACK, and Sparse Block Compressed Row Storage (SBCRS). We then propose an alternative optimized storage format, BCRS-combo, and its SpMV implementation, to achieve high performance for the SpMV kernel on NVIDIA GPUs, considering parameters such as excessive padding, storage compaction, memory bandwidth, reloading of the input and output vectors, time to reorder elements, and execution time. We show that BCRS-combo not only improves performance over these methods, but also improves storage space management. | en_US |
dc.relation.ispartofseries | ITFCE012-4; | - |
dc.subject | GPU | en_US |
dc.subject | Sparse Matrix Storage Format | en_US |
dc.subject | Parallelization | en_US |
dc.subject | Matrix-Vector Multiplication | en_US |
dc.subject | Computer Faculty Paper | en_US |
dc.subject | Faculty Paper | en_US |
dc.subject | ITFCE012 | en_US |
dc.title | Optimizing sparse Matrix-Vector Multiplication on GPU | en_US |
dc.type | Faculty Papers | en_US |
Appears in Collections: | Faculty Papers, CE |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ITFCE012-4.pdf | ITFCE012-4 | 320.44 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.