BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210808T235336Z
LOCATION:Room A
DTSTART;TZID=America/Chicago:20210812T104500
DTEND;TZID=America/Chicago:20210812T110000
UID:icpp_ICPP 2021_sess116_pap229@linklings.com
SUMMARY:Regu2D: Accelerating Vectorization of SpMV on Intel Processors thr
 ough 2D-partitioning and Regular Arrangement
DESCRIPTION:Conference Paper\n\nRegu2D: Accelerating Vectorization of SpM
 V on Intel Processors through 2D-partitioning and Regular Arrangement\n\nF
 ei, Zhang\n\nSparse matrix-vector multiplication (SpMV) is an elementary k
 ernel of many high-performance computing (HPC) applications, and it is oft
 en one of their performance bottlenecks. Accelerating SpMV on vector proce
 ssors usually faces several issues, including irregular data accesses, mem
 ory bandwidth limitations, and the short vector problem. Based on a detail
 ed analysis of the effects and interactions of various technologies introd
 uced by state-of-the-art studies (ALBUS, CVR, CSR5, SELL-C-σ, etc.), we p
 ropose Regu2D, a comprehensive solution to accelerate vectorization of SpM
 V through three methods: adaptive 2D-partitioning, the regular arrangemen
 t of matrix elements, and index compression. Dynamic programming algorithm
 s are used to optimize the first two methods. We conduct experiments on In
 tel Xeon processors (Skylake architecture), which support AVX-512 SIMD ins
 tructions, and use sparse matrices from the University of Florida Spars
 e Matrix Collection. Experiments show that Regu2D achieves average speedup
 s of 1.69X, 1.93X, 1.40X, and 1.20X over ALBUS, CVR, CSR5, and SELL-C-σ
 , respectively, for 30 scale-free sparse matrices. For 16 HPC sparse matri
 ces, Regu2D achieves average speedups of 1.34X, 1.89X, 1.34X, and 1.50X ov
 er the same four methods, respectively.
END:VEVENT
END:VCALENDAR