BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20180810T143148Z
LOCATION:Erb Memorial Union (EMU) Ballroom\, 2nd Floor
DTSTART;TZID=America/Los_Angeles:20180814T201800
DTEND;TZID=America/Los_Angeles:20180814T202200
UID:icpp_ICPP 2018_sess141_pos136@linklings.com
SUMMARY:Fast and generic concurrent message-passing
DESCRIPTION:Fast and generic concurrent message-passing\n\nDang\, Snir\n
 \n\nCommunication hardware and software have a significant impact on the
  performance of clusters and supercomputers. The Message-Passing Interfa
 ce (MPI) has become a de facto standard API for communication in HPC. Ho
 wever\, it recently faces a new challenge due to the emergence of many-c
 ore nodes and of programming models that provide dynamic task parallelis
 m and assume large numbers of concurrent\, lightweight threads. Using MP
 I atop these languages/runtimes is inefficient because MPI implementatio
 ns do not perform well with threads. Using MPI as a communication middle
 ware is also inefficient\, since MPI must provide many abstractions that
  many of these frameworks do not need\, adding extra overhead. Our resea
 rch focuses on improving the communication of applications and framework
 s on three fronts. First\, although MPI performance is lagging behind\, 
 we show that it can be improved using more advanced thread-synchronizati
 on techniques and appropriate semantic relaxations. Our proposals and te
 chniques are being incorporated into the MPICH implementation. Second\, 
 we develop a generic\, low-level communication interface (LCI) that targ
 ets emerging applications\, such as asynchronous task models\, for which
  the current MPI semantics are less than ideal. The LCI design allows ge
 neric\, low-overhead producer/consumer matching\, better thread integrat
 ion\, and a more direct mapping to the network interface. In previous wo
 rk\, LCI has improved the state-of-the-art performance of graph analytic
 s frameworks. Lastly\, we develop FULT\, a fast user-level thread schedu
 ling technique that uses bit-vectors to further lower the overhead of si
 gnal/wait operations and improve interoperation between the threading an
 d communication runtimes.
END:VEVENT
END:VCALENDAR