BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
X-LIC-LOCATION:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20180810T143148Z
LOCATION:Erb Memorial Union (EMU) Ballroom\, 2nd Floor
DTSTART;TZID=America/Los_Angeles:20180814T201800
DTEND;TZID=America/Los_Angeles:20180814T202200
UID:icpp_ICPP 2018_sess141_pos136@linklings.com
SUMMARY:Fast and generic concurrent message-passing
DESCRIPTION:Fast and generic concurrent message-passing\n\nDang\, Snir\n
 \n\nCommunication hardware and software have a significant impact on the
  performance of clusters and supercomputers. The Message-Passing Interfa
 ce (MPI) has become a de facto standard API for communication in HPC. Ho
 wever\, it recently faces a new challenge due to the emergence of many-c
 ore nodes and of programming models that provide dynamic task parallelis
 m and assume large numbers of concurrent\, lightweight threads. Using MP
 I atop these languages/runtimes is inefficient because MPI implementatio
 ns do not perform well with threads. Using MPI as a communication middle
 ware is also inefficient\, since MPI must provide many abstractions that
  many of these frameworks do not need\, adding extra overhead. Our resea
 rch focuses on improving the communication of applications and framework
 s on three fronts. First\, although MPI performance is lagging behind\, 
 we show that it can be improved using more advanced thread-synchronizati
 on techniques and appropriate semantic relaxations. Our proposals and te
 chniques are being incorporated into the MPICH implementation. Second\, 
 we develop a generic\, low-level communication interface (LCI) that targ
 ets emerging applications\, such as asynchronous task models\, for which
  the current MPI semantics are less than ideal. The LCI design allows ge
 neric\, low-overhead producer/consumer matching\, better thread integrat
 ion\, and a more direct mapping to the network interface. In previous wo
 rk\, LCI has improved the state-of-the-art performance of graph analytic
 s frameworks. Lastly\, we develop FULT\, a fast user-level thread schedu
 ling technique that uses bit-vectors to further lower the overhead of si
 gnal/wait operations and improve interoperation between the threading an
 d communication runtimes.
END:VEVENT
END:VCALENDAR