BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210808T235336Z
LOCATION:Room A
DTSTART;TZID=America/Chicago:20210811T120000
DTEND;TZID=America/Chicago:20210811T121500
UID:icpp_ICPP 2021_sess124_pap258@linklings.com
SUMMARY:Tridiagonal GPU Solver with Scaled Partial Pivoting at Maximum Ban
dwidth
DESCRIPTION:Conference Paper\n\nTridiagonal GPU Solver with Scaled Partial
Pivoting at Maximum Bandwidth\n\nKlein, Strzodka\n\nPartial pivoting is t
he method of choice to ensure stability in matrix factorizations performed
on CPUs. For sparse matrices, this has not been implemented on GPUs so fa
r because of problems with data-dependent execution flow. This work incorp
orates scaled partial pivoting into a tridiagonal GPU solver in such a fas
hion that despite the data-dependent decisions no SIMD divergence occurs.
The cost of the computation is completely hidden behind the data movement
which itself runs at maximum bandwidth. Therefore, the cost of the tridiag
onal GPU solver is no more than the minimally required data movement. For
 large single precision systems with 2^25 unknowns, speedups of 5 are r
 eported in comparison to the numerically stable tridiagonal solver (gtsv2
 ) of cuSPARSE. The proposed tridiagonal solver is also evaluated a
s a preconditioner for Krylov solvers of large sparse linear equation syst
ems. As expected it performs best for problems with strong anisotropies.
END:VEVENT
END:VCALENDAR