BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210808T235336Z
LOCATION:Room C
DTSTART;TZID=America/Chicago:20210811T111500
DTEND;TZID=America/Chicago:20210811T113000
UID:icpp_ICPP 2021_sess125_pap188@linklings.com
SUMMARY:Parallel Tucker Decomposition with Numerically Accurate SVD
DESCRIPTION:Conference Paper\n\nParallel Tucker Decomposition with Numeric
 ally Accurate SVD\n\nLi\, Fang\, Ballard\n\nTucker decomposition is a lo
 w-rank tensor approximation that generalizes a truncated matrix singul
 ar value decomposition (SVD). Existing parallel software has shown tha
 t Tucker decomposition is particularly effective at compressing teraby
 te-sized multidimensional scientific simulation datasets\, computing re
 duced representations that satisfy a specified approximation error. Th
 e general approach is to get a low-rank approximation of the input dat
 a by performing a sequence of matrix SVDs of tensor unfoldings\, which t
 end to be short-fat matrices. In the existing approach\, the SVD is per
 formed by computing the eigendecomposition of the Gram matrix of the unf
 olding. This method sacrifices some numerical stability in exchange fo
 r lower computation costs and easier parallelization. We propose usin
 g a numerically more stable though more computationally expensive way t
 o compute the SVD by preprocessing with a QR decomposition step and comp
 uting an SVD of only the small triangular factor. The more numerical
 ly stable approach allows us to achieve nearly the same accuracy with h
 alf the working precision (for example\, single rather than double prec
 ision). We demonstrate that our method scales as well as the existing ap
 proach\, and the use of lower precision leads to an overall reduction i
 n running time when using 10s to 1000s of processors. Using the same wor
 king precision\, we are also able to compute Tucker decompositions of t
 he scientific datasets with much smaller approximation error.
END:VEVENT
END:VCALENDAR