ICPP 2021 Program
All times are in CDT (Chicago time).



Monday, August 9th


1A: International Workshop on Deployment and Use of Accelerators (DUAC) - Presentations
Room A
Carlos Reaño
Welcome Remarks
Warp-centric K-Nearest Neighbor Graphs construction on GPU
Explaining the Classification Performance of Supervised and Semi-Supervised Methods for Automated Sparse Matrix Format Selection
An Intelligent Parallel Distributed Streaming Framework for Near Real-time Science Sensors and High Resolution Medical Images
Tangled: A Conventional Processor Integrating A Quantum-Inspired Coprocessor
Enabling Real-Time Irregular Data-Flow Pipelines on SIMD Devices
TurboBC: A Memory Efficient and Scalable GPU Based Betweenness Centrality (BC) Algorithm in the Language of Linear Algebra
DUAC Workshop


1B: International Workshop on Embedded Multicore Systems (ICPP-EMS) - Presentations
Room B
YungChia Lin
Welcome Remarks
ArchViMP – a Framework for Automatic Extraction of Concurrency-related Software Architectural Properties
Dual-KV: Improving Performance of Key-value Caches on Multilevel Cell Non-volatile Memory
Ghostwriter: A Cache Coherence Protocol for Error-Tolerant Applications
Hyperchaining Optimizations for an LLVM-Based Binary Translator on x86-64 and RISC-V Platforms
Intra- and Inter-Layer Transformation to Reduce Memory Traffic for CNN Computation
Accelerate Binarized Neural Networks with Processing-in-Memory Enabled by RISC-V Custom Instructions
Accelerating Neural Network Training using Arbitrary Precision Approximating Matrix Multiplication Algorithms
Support Convolution of CNN with Compression Sparse Matrix Multiplication Flow in TVM
EMS Workshop

1C: International Workshop on Parallel Programming Models and Systems Software for High-End Compu...
Room C
John Leidel
Welcome Remarks
DYFLOW: A flexible framework for orchestrating scientific workflows on supercomputers
Transparent Resource Elasticity for Task-Based Cluster Environments with Work Stealing
Impact of AVX-512 Instructions on Graph Partitioning Problems
Evaluating the Performance of Integer Sum Reduction in SYCL
Design of a Portable Implementation of Partitioned Point-to-Point Communication Primitives
Assessing Resource Provisioning and Allocation of Ensembles of In Situ Workflows
FMSM: A Fuzzy Multi-keyword Search Scheme for Encrypted Cloud Data based on Multi-chain Network
Implementing Arbitrary/Common Concurrent Writes of CRCW PRAM
P2S2 Workshop


2A: International Workshop on Applications of Wireless Ad hoc and Sensor Networks (AWASN) - Prese...
Room A
Kazuya Sakai
Welcome Remarks
Self-Stabilization with Selfish Agents
Automated Arrhythmia Detection using Hilbert-Huang Transform based Convolutional Neural Network
New Evacuation Guidance Using Augmented Reality for Emergency Rescue Evacuation Support System (ERESS)
Analysis on Nursing Care Activity Related Stress Level for Reduction of Caregiving Workload
AWASN Workshop


2B: Workshop on LLVM in Parallel Processing (LLPP) - Presentations
Room B
Johannes Doerfert
Welcome Remarks
Shared Memory Remote Procedure Calls
Advancing OpenMP Offload Debugging Capabilities in LLVM
Loop Transformations using Clang's Abstract Syntax Tree
Adapting SYCL’s SIMT Programming Paradigm for Accelerators via Program Reconstruction
Towards Compile-Time-Reducing Compiler Optimization Selection via Machine Learning
A Virtual GPU as Developer-Friendly OpenMP Offload Target
LLPP Workshop

2C: International Workshop on Parallel and Distributed Algorithms for Decision Sciences (PDADS)...
Room C
Sudip Seal
Welcome Remarks
Domain Decomposition Preconditioners for Unstructured Network Problems in Parallel Vector Architectures
Constraint Solving by Quantum Annealing
Towards Faster Execution of Ensemble ML Bootstrap Based Techniques
Design Considerations for GPU-based Mixed Integer Programming on Parallel Computing Platforms
GPU Accelerated SL0 for Multidimensional Signals
Cache-Aware Data Management for Memory-Mapped Forests
PDADS Workshop


Room A
Anne Benoit; Anthony Kougkas
Boosting Compaction Performance of LSM-tree-based KV Stores in Multi-Near-Data Processing Systems
A Virtualization Platform Designed for Irregular Multi-Process Applications
A Log-Free and Consistent Chained Hashing for Non-volatile Memory
Postmortem Graph Analysis on the Temporal Graph
XHYPRE: A high-precision numerical software package for solving large-scale sparse linear equations

Tuesday, August 10th


Opening Remarks
Opening Remark


Keynote: Rick L. Stevens, DOE-ANL, Exascale and then what?: The next decade for HPC and AI
Xian-He Sun


1A: Memory Systems and NVM
Room A
Xi Wang
Matryoshka: A Coalesced Delta Sequence Prefetcher
Fast and Consistent Remote Direct Access to Non-volatile Memory
Crash-Consistency-Aware Encryption for Non-Volatile Memories
Wave-PIM: Accelerating Wave Simulation Using Processing-in-Memory
Conference Paper

1B: GPU Computing and Task-based Programming Models
Room B
Tobias Weinzierl
Efficient GPU-Implementation for Integer Sorting Based on Histogram and Prefix-Sums
Combining Dynamic Concurrency Throttling with Voltage and Frequency Scaling on Task-based Programming Models
CuART - a CUDA-based, scalable Radix-Tree lookup and update engine
BGPQ: A Heap-Based Priority Queue Design for GPUs
Conference Paper

1C: Resource Management and Infrastructure
Room C
Yang You
CERES: Container-Based Elastic Resource Management System for Mixed Workloads
PREP: Predicting Job Runtime with Job Running Path on Supercomputers
BitX: Empower Versatile Inference with Hardware Runtime Pruning
AMPS-Inf: Automatic Model Partitioning for Serverless Inference with Cost Efficiency
Conference Paper


2A: Storage Systems and Parallel I/O
Room A
Suren Byna
A Graph-Assisted Out-of-Place Update Scheme for Erasure Coded Storage Systems
Multi-level Forwarding and Scheduling Recovery Technique in Heterogeneous Network for Erasure-coded Clusters
ASLDP: An Active Semi-supervised Learning method for Disk Failure Prediction
Coupling Right-Provisioned Cold Storage Data Centers with Deduplication
Conference Paper

2B: Scheduling Algorithms and Optimizations
Room B
Antonino Tumeo
Multi-Resource List Scheduling of Moldable Parallel Jobs under Precedence Constraints
HiPa: Hierarchical Partitioning for Fast PageRank on NUMA Multicore Systems
GVT-Guided Demand-Driven Scheduling in Parallel Discrete Event Simulation
Joint Optimization of DNN Partition and Scheduling for Mobile Cloud Computing
Conference Paper

2C: GPU-Accelerated Applications
Room C
Taisuke Boku
Accelerating Sequence-to-Graph Alignment on Heterogeneous Processors
Exploring HW/SW Co-Optimizations for Accelerating Large-scale Texture Identification on Distributed GPUs
MetaCache-GPU: Ultra-Fast Metagenomic Classification
Fourth-Order Exhaustive Epistasis Detection for the xPU Era
Conference Paper


3A: Performance Modeling and Evaluation
Room A
Doru Thom Popovici
Efficient Modeling of Random Sampling-Based LRU
An Evaluation of Task-Parallel Frameworks for Sparse Solvers on Multicore and Manycore CPU Architectures
gem5+RTL: A Framework to Enable RTL Models Inside a Full-System Simulator
Interferences between Communications and Computations in Distributed HPC Systems
Conference Paper

3B: Parallelization and Code Generation
Room B
Barbara Chapman
Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors
Tool-Supported Mini-App Extraction to Facilitate Program Analysis and Parallelization
Automatic Generation of High-Performance Inference Kernels for Graph Neural Networks on Multi-Core Systems
Optimizing Work Stealing Communication with Structured Atomic Operations
Conference Paper

3C: Applications with Machine Learning
Room C
Xingfu Wu
CNN+LSTM Accelerated Turbulent Flow Simulation with Link-Wise Artificial Compressibility Method
ComputeCOVID19+: Accelerating COVID-19 Diagnosis and Monitoring via High-Performance Deep Learning
IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads
Multi-Agent Reinforcement Learning based Distributed Renewable Energy Matching for Datacenters
Conference Paper


Bonfire Gathering: Remember the Old Tradition Virtually!
Wu Feng
ICPP Bonfire

Wednesday, August 11th


Best Paper Candidates
Felix Wolf
FastPSO: Towards Efficient Swarm Intelligence Algorithm on GPUs
SPMFS: A Scalable Persistent Memory File System on Optane Persistent Memory
Exploiting system level heterogeneity to improve the performance of a GeoStatistics multi-phase task-based application
Context-aware Data Operation Strategies in Edge Systems for High Application Performance
Conference Paper


Panel: Celebrating 50 Years of ICPP
Room A
Rudolf Eigenmann


4A: Graph Computing
Room A
Erika Parsons
Ascetic: Enhancing Cross-Iterations Data Efficiency in Out-of-Memory Graph Processing on GPUs
Communication Avoiding All-Pairs Shortest Paths Algorithm for Sparse Graphs
An Edge-Fencing Strategy for Optimizing SSSP Computations on Large-Scale Graphs
Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing
Conference Paper

4B: Storage Software and Optimizations
Room B
Anthony Kougkas
Fast Reconstruction for Large Disk Enclosures Based on RAID2.0
Parallel Multi-split Extendible Hashing for Persistent Memory
Intra-page Cache Update in SLC-mode with Partial Programming in High Density SSDs
HDNH: a read-efficient and write-optimized hashing scheme for hybrid DRAM-NVM memory
Conference Paper

4C: Algorithms and Applications
Room C
Zhou Jin
Generalized Skyline Interval Coloring and Dynamic Geometric Bin Packing Problems
Accelerating DBSCAN Algorithm with AI Chips for Large Datasets
Efficient Parallel Algorithms for String Comparison
Parallel Tucker Decomposition with Numerically Accurate SVD
Conference Paper


5A: Linear Algebra Algorithms
Room A
Matthew Knepley
Processor-Aware Cache-Oblivious Algorithms
Tridiagonal GPU Solver with Scaled Partial Pivoting at Maximum Bandwidth
Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose
Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures
Conference Paper

5B: Data Analytics Systems and Runtime
Room B
Rong Ge
Using Vectorized Execution to Improve SQL Query Performance on Spark
Sparker: Efficient Reduction for More Scalable Machine Learning with Spark
NoStop: A Novel Configuration Optimization Scheme for Spark Streaming
ROBOTune: High-Dimensional Configuration Tuning for Cluster-Based Data Analytics
Conference Paper

5C: Applications and Performance
Room C
Valerie Taylor
ADA: An Application-Conscious Data Acquirer for Visual Molecular Dynamics
Teddy: An Efficient SIMD-based Literal Matching Engine for Scalable Deep Packet Inspection
Enabling Efficient SIMD Acceleration for Virtual Radio Access Network
Scaling Generalized N-Body Problems, A Case Study from Genomics
Conference Paper

Thursday, August 12th


Keynote: Manish Parashar, U. Utah, NSF, Harnessing Advanced Cyberinfrastructure for Urgent Science
Sameer Shende


6A: Networking and Routing
Room A
Kevin A. Brown
Receiver-Driven Congestion Control for InfiniBand
Optimizing Flow Completion Time via Adaptive Buffer Management in Data Center Networks
sRouting: Towards a Better Flow Size Estimation Performance through Routing and Sketch Configuration
Distributed Game-Theoretical Route Navigation for Vehicular Crowdsensing
Conference Paper

6B: Machine Learning and Acceleration
Room B
Riyadh Baghdadi
Prophet: Speeding up Distributed DNN Training with Predictable Communication Scheduling
Accelerated Device Placement Optimization with Contrastive Learning
Optimizing Massively Parallel Winograd Convolution on ARM Processor
Hippie: A Data-Paralleled Pipeline Approach to Improve Memory-Efficiency and Scalability for Large DNN Training
Conference Paper

6C: Data Structures and Applications
Room C
Wei Xue
A Universal Construction to implement Concurrent Data Structure for NUMA-multicore
A Novel Multi-CPU/GPU Collaborative Computing Framework for SGD-based Matrix Factorization
FedCav: Contribution-aware Model Aggregation on Distributed Heterogeneous Data in Federated Learning
A Fast, General System for Buffered Persistent Data Structures
Conference Paper


7A: Performance Optimization
Room A
Michael Gerndt
Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme
Regu2D: Accelerating Vectorization of SpMV on Intel Processors through 2D-partitioning and Regular Arrangement
CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation
Recursion Brings Speedup to Out-of-Core TensorCore-based Linear Algebra Algorithms: A Case Study of Classic Gram-Schmidt QR Factorization
Conference Paper

7B: Machine Learning Algorithms
Room B
Leonid Oliker
Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection
Optimizing Winograd-Based Convolution with Tensor Cores
LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs
FIFL: A Fairness Incentive Framework for Federated Learning
Conference Paper

7C: Virtualization and Stream Processing
Room C
Ali Jannesari
Progressive Memory Adjustment with Performance Guarantee in Virtualized Systems
Best VM Selection for Big Data Applications across Multiple Frameworks by Transfer Learning
Efficient Complete Event Trend Detection over High-Velocity Streams
Paratick: Reducing Timer Overhead in Virtual Machines
Conference Paper


Closing and Conference Awards


Created 2021-8-8 18:53