Conference Paper

Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose

Arrigoni, Maggioli, Massini, Rodolà

The multiplication of a matrix by its transpose, A^T A, appears as an intermediate operation in the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm (ATA) for computing this product, based upon the classical Strassen algorithm as a sub-routine. In particul...

---------------------
Tridiagonal GPU Solver with Scaled Partial Pivoting at Maximum Bandwidth

Klein, Strzodka

Partial pivoting is the method of choice to ensure stability in matrix factorizations performed on CPUs. For sparse matrices, this has not been implemented on GPUs so far because of problems with data-dependent execution flow. This work incorporates scaled partial pivoting into a tridiagonal GPU sol...

---------------------
Processor-Aware Cache-Oblivious Algorithms

Tang, Gao

Frigo et al. proposed an ideal cache model and a recursive technique to design sequential cache-efficient algorithms in a cache-oblivious fashion. Ballard et al. pointed out that it is a fundamental open problem to extend the technique to an arbitrary architecture. Ballard et al. raised another ...

---------------------
Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures

XIE, Chen, Firoz, Li, Song...

Designing efficient and scalable sparse linear algebra kernels on modern multi-GPU based HPC systems is a daunting task due to significant irregular memory references and workload imbalance across the GPUs. This is particularly the case for Sparse Triangular Solver (SpTRSV) which introduces...
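
For readers unfamiliar with the kernel named in the last abstract, the following is a minimal sequential sketch of a sparse lower-triangular solve. It is not the multi-GPU method of the paper. It only illustrates why each unknown depends on earlier ones, which is what makes the access pattern irregular and hard to balance. The function name `sptrsv_lower_csr`, the CSR array names, and the small test matrix are assumptions made for illustration.

```python
def sptrsv_lower_csr(indptr, indices, data, b):
    """Solve L x = b by forward substitution.

    L is lower triangular and stored in CSR form (indptr, indices, data).
    Assumption: every row stores a nonzero diagonal entry.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j == i:
                diag = data[k]
            elif j < i:
                # Row i needs x[j] from an earlier row: this data
                # dependency is what limits parallelism in SpTRSV.
                acc -= data[k] * x[j]
            else:
                raise ValueError("matrix is not lower triangular")
        if diag is None or diag == 0:
            raise ValueError(f"missing or zero diagonal in row {i}")
        x[i] = acc / diag
    return x


# Hypothetical 3x3 example: L = [[2, 0, 0], [1, 3, 0], [0, 4, 5]]
indptr = [0, 1, 3, 5]
indices = [0, 0, 1, 1, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]
x = sptrsv_lower_csr(indptr, indices, data, [2.0, 7.0, 23.0])
print(x)
```

Rows whose dependencies are already resolved could in principle be processed concurrently, but how many such rows exist at each step depends entirely on the sparsity pattern.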
