Regu2D: Accelerating Vectorization of SpMV on Intel Processors through 2D-partitioning and Regular Arrangement
Conference Paper

Fei, Zhang

Sparse matrix-vector multiplication (SpMV) is an elementary kernel of many high-performance computing (HPC) applications, and it is often one of their performance bottlenecks. Accelerating SpMV on vector processors usually faces several issues, including irregular data accesses, memory bandwidth limitation, and the short vector problem. Based on a detailed analysis of the effects and interactions of various techniques introduced by state-of-the-art studies (ALBUS, CVR, CSR5, SELL-C-σ, etc.), we propose Regu2D, a comprehensive solution to accelerate vectorization of SpMV through three methods: adaptive 2D-partitioning, the regular arrangement of matrix elements, and index compression. Dynamic programming algorithms are used to optimize the first two methods. We conduct experiments on Intel Xeon processors (Skylake architecture), which support AVX-512 SIMD instructions, using sparse matrices from the University of Florida Sparse Matrix Collection. Experiments show that Regu2D achieves average speedups of 1.69X, 1.93X, 1.40X, and 1.20X over ALBUS, CVR, CSR5, and SELL-C-σ, respectively, for 30 scale-free sparse matrices. For 16 HPC sparse matrices, Regu2D achieves average speedups of 1.34X, 1.89X, 1.34X, and 1.50X over the same four methods, respectively.
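
To make the issues named in the abstract concrete, the following is a minimal sketch of a baseline SpMV over the standard CSR (compressed sparse row) format. It is not Regu2D and not taken from the paper; the function name `spmv_csr` and the small example matrix are assumptions chosen for illustration. The indirect read `x[col_idx[k]]` is the irregular data access, and the per-row trip counts (2, 1, and 3 here) show the short vector problem: a 512-bit AVX-512 register holds 8 double-precision values, so rows this short would leave most lanes idle.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The inner loop length equals the number of nonzeros in row i,
        # which can be far shorter than the SIMD width.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect (gather-style) read of x: the irregular access.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x4 matrix (an assumption, not data from the paper):
# [[1, 0, 2, 0],
#  [0, 0, 0, 3],
#  [4, 5, 0, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 3, 0, 1, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

y = spmv_csr(row_ptr, col_idx, values, x)
row_lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
```

Formats such as those compared in the abstract rearrange or regroup the nonzeros so that vector lanes stay full; the sketch above only shows the starting point they improve on.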
