BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210808T235332Z
LOCATION:Room A
DTSTART;TZID=America/Chicago:20210811T114500
DTEND;TZID=America/Chicago:20210811T124500
UID:icpp_ICPP 2021_sess124@linklings.com
SUMMARY:5A: Linear Algebra Algorithms
DESCRIPTION:Conference Paper\n\nEfficiently Parallelizable Strassen-Bas
 ed Multiplication of a Matrix by its Transpose\n\nArrigoni, Maggioli, Ma
 ssini, Rodolà\n\nThe multiplication of a matrix by its transpose, A^TA
 , appears as an intermediate operation in the solution of a wide se
 t of problems.\nIn this paper, we propose a new cache-oblivious algorit
 hm (ATA) for computing this product, based upon the classical Strasse
 n algorithm as a sub-routine. In particul...\n\n---------------------\n
 Tridiagonal GPU Solver with Scaled Partial Pivoting at Maximum Bandwid
 th\n\nKlein, Strzodka\n\nPartial pivoting is the method of choice to en
 sure stability in matrix factorizations performed on CPUs. For spars
 e matrices, this has not been implemented on GPUs so far because of pr
 oblems with data-dependent execution flow. This work incorporates scal
 ed partial pivoting into a tridiagonal GPU sol...\n\n
 ---------------------\nProcessor-Aware Cache-Oblivious Algorithms\n\n
 Tang, Gao\n\nFrigo et al. proposed an ideal cache model and a recursiv
 e technique to design sequential cache-efficient algorithms in a cache-
 oblivious fashion. Ballard et al. pointed out that it is a fundamenta
 l open problem to extend the technique to an arbitrary architecture. B
 allard et al. raised another ...\n\n---------------------\nFast and Sc
 alable Sparse Triangular Solver for Multi-GPU Based HPC Architectures
 \n\nXIE, Chen, Firoz, Li, Song...\n\nDesigning efficient and scalable sp
 arse linear algebra kernels on modern multi-GPU based HPC systems i
 s a daunting task due to significant irregular memory references and w
 orkload imbalance across the GPUs. This is particularly the case for S
 parse Triangular Solver (SpTRSV) which introduces...\n
END:VEVENT
END:VCALENDAR