ICPP 2018 Program

A

A. Rodrigues, Luiz · more

A Communication-Efficient Causal Broadcast Protocol · pdf, pdf

Adhinarayanan, Vignesh · more

Models and Techniques for Green High-Performance Computing · pdf, pdf, pdf, pdf

Afsahi, Ahmad · more

The Case for Semi-Permanent Cache Occupancy · pdf, pdf

Ahn, Dong H. · more

PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf

Ajwani, Deepak · more

An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf

Aktulga, H. M. · more

Optimization of the Spherical Harmonics Transform based Tree Traversals in the Helmholtz FMM Algorithm · pdf, pdf

Al-Mamun, Abdullah · more

Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf

Anandakrishnan, Ramu · more

Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf

Antoškin, Vjatsešlav · more

A Computational Investigation of Redistricting Using Simulated Annealing · pdf, pdf, pdf, pdf

Ao, Yulong · more

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf

Arantes, Luciana · more

A Communication-Efficient Causal Broadcast Protocol · pdf, pdf

Arima, Eishi · more

Toward Footprint-Aware Power Shifting for Hybrid Memory Based Systems · pdf, pdf, pdf, pdf

Arnold, Dorian · more

Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf

B

Balaji, Pavan · more

I/O Bottleneck Investigation in Deep Learning Systems · pdf, pdf, pdf, pdf

Ballard, Grey · more

Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization · pdf, pdf

Barreto Goes Perez, Tiago · more

Leveraging Resource Bottleneck Awareness and Optimizations for Data Analytics Performance · pdf, pdf, pdf, pdf
Reference-distance Eviction and Prefetching for Cache Management in Spark · pdf, pdf

Baskiyar, Sanjeev · more

Resource and Service Management in Fog Computing · pdf, pdf, pdf, pdf

Beckstein, Oliver · more

Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf

Benoit, Anne · more

A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures · pdf, pdf

Benson, Jeremy · more

KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf

Berry, Jonathan W. · more

Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM · pdf, pdf

Bhatele, Abhinav · more

Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf

Bhattacharjee, Mrinal · more

Efficient Search for Free Blocks in the WAFL File System · pdf, pdf

Blanco, Zachary · more

CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms · pdf, pdf
Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf

Brandt, Jim · more

Integrating Low-latency Analysis into HPC System Monitoring · pdf, pdf

Brew, Justin A. · more

Toward a Multi-GPU Implementation of the Modular Integer GCD Algorithm: Extended Abstract · pdf, pdf, pdf, pdf

Bridges, Patrick · more

Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf

Bridges, Patrick G. · more

The Case for Semi-Permanent Cache Occupancy · pdf, pdf

Brown, Kevin A. · more

Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf

Brown, Laura E. · more

Utilization of Random Profiling for System Modeling and Dynamic Configuration · pdf, pdf, pdf, pdf
Constructing Dynamic Policies for Paging Mode Selection · pdf, pdf

Buluc, Aydin · more

Push-Pull on Graphs is Column- and Row-based SpMV Plus Masks · pdf, pdf, pdf, pdf
Implementing Push-Pull Efficiently in GraphBLAS · pdf, pdf

Butcher, Neil A. · more

Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM · pdf, pdf

C

Cai, Binlei · more

Less Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees · pdf, pdf

Cai, Wentong · more

Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf

Canon, Louis-Claude · more

A Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf

Cao, Liangliang · more

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf

Cao, Qiang · more

FFS-VA: A Fast Filtering System for Large-scale Video Analytics · pdf, pdf

Cao, Xuan · more

Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf

Chan, Yuandong · more

SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf

Chang, Chun-Kai · more

Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection · pdf, pdf

Chang, Shiyu · more

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf

Chantzialexiou, George · more

Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf

Che, Yulin · more

Parallelizing Pruning-based Graph Structural Clustering · pdf, pdf

Cheatham, Thomas E. · more

Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf

Chen, Fei · more

Dual-Paradigm Stream Processing · pdf, pdf

Chen, Guihai · more

Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf
IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling · pdf, pdf
Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf

Chen, Hong · more

Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf

Chen, Jianxi · more

A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf

Chen, Ren · more

C-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf

Chen, Xinyu · more

KeyBin2: Distributed Clustering for Scalable and In-situ Analysis · pdf, pdf, pdf, pdf
KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf

Chen, Yang · more

NFV Middlebox Placement with Balanced Set-up Cost and Bandwidth Consumption · pdf, pdf

Chen, Yifeng · more

Delta-Stepping Synchronous Parallel Model · pdf, pdf, pdf, pdf

Chen, Yong · more

Exploring Memory Coalescing for 3D-Stacked Hybrid Memory Cube · pdf, pdf, pdf, pdf
Memory Coalescing for Hybrid Memory Cube · pdf, pdf

Cheng, Bin · more

Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf

Cheng, Dazhao · more

Reference-distance Eviction and Prefetching for Cache Management in Spark · pdf, pdf
Joint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Clouds · pdf, pdf

Cheng, Wenxue · more

Power Efficient High Performance Packet I/O · pdf, pdf

Cheng, Yongli · more

HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf

Childers, Bruce · more

CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf

Colella, Phillip · more

A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids · pdf, pdf, pdf, pdf

Colombet, Laurent · more

Combining Task-based Parallelism and Adaptive Mesh Refinement Techniques in Molecular Dynamics Simulations · pdf, pdf

Cotton, Ronald · more

WebNN: A Distributed Framework for Deep Learning · pdf, pdf, pdf, pdf

Cui, Chang · more

Delta-Stepping Synchronous Parallel Model · pdf, pdf, pdf, pdf

Curtis-Maury, Matthew · more

Efficient Search for Free Blocks in the WAFL File System · pdf, pdf

D

Dai, Haipeng · more

Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf
Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf
Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf

Dang, Hoang-Vu · more

Fast and generic concurrent message-passing · pdf, pdf, pdf, pdf
FULT: Fast User-Level Thread Scheduling Using Bit-Vectors · pdf, pdf

Dash, Sajal · more

Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf

Davis, Eddie C. · more

Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf

Davis, Timothy A. · more

A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf

de Araujo, João Paulo · more

A Communication-Efficient Causal Broadcast Protocol · pdf, pdf

DeBardeleben, Nathan · more

Modeling Application Resilience in Large-scale Parallel Execution · pdf, pdf

Dechev, Damian · more

Integrating Low-latency Analysis into HPC System Monitoring · pdf, pdf

Dehnavi, Maryam Mehri · more

CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms · pdf, pdf

Demmel, James · more

ImageNet Training in Minutes · pdf, pdf
Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf

Devarakonda, Aditya · more

Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf

Devine, Thomas R. · more

Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy · pdf, pdf

Dinan, James · more

Efficient Runtime Support for a Partitioned Global Logical Address Space · pdf, pdf

Dong, Wenqian · more

Modeling Application Resilience in Large-scale Parallel Execution · pdf, pdf

Dosanjh, Matthew · more

Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf

Dosanjh, Matthew G. F. · more

The Case for Semi-Permanent Cache Occupancy · pdf, pdf

Dou, Wanchun · more

Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf

Duriakova, Erika · more

An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf

E

Erez, Mattan · more

Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection · pdf, pdf

Estrada, Trilce · more

KeyBin2: Distributed Clustering for Scalable and In-situ Analysis · pdf, pdf, pdf, pdf
KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf

Eyraud-Dubois, Lionel · more

Using Static Allocation Algorithms for Matrix Matrix Multiplication on Multicores and GPUs · pdf, pdf

F

Faizian, Peyman · more

Load-Balanced Slim Fly Networks · pdf, pdf

Feng, Dan · more

HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf
A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf

Feng, Guangbo · more

Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf

Feng, Wu-chun · more

Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf
I/O Bottleneck Investigation in Deep Learning Systems · pdf, pdf, pdf, pdf
A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes · pdf, pdf

Feng, Yangde · more

Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf

Fernandez, Alvaro · more

Performance Improvements of an Event Index Distributed System · pdf, pdf, pdf, pdf

Figiela, Kamil · more

Performance evaluation of parallel cloud functions · pdf, pdf, pdf, pdf

Fong, Liana · more

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf

Fox, Geoffrey C. · more

Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf

Franchetti, Franz · more

Algorithm Design for Large Scale FFT-Based Simulations on CPU-GPU Platforms · pdf, pdf, pdf, pdf

Fu, Hao · more

GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf

Fu, Haohuan · more

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf

Fu, Mandi · more

A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf

Fu, Song · more

In-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems · pdf, pdf, pdf, pdf

G

Gadou, Mohamed · more

A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf

Gamblin, Todd · more

PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf

Gan, Lin · more

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf

Gao, Chuansong · more

Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf

Gao, Ping · more

SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf

Gao, Xiaofeng · more

IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling · pdf, pdf

Garner, Harold · more

Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf

Geng, Guanhui · more

Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf

Geng, Xin · more

Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster · pdf, pdf

Gentile, Ann · more

Integrating Low-latency Analysis into HPC System Monitoring · pdf, pdf

Gerndt, Michael · more

Exploiting Inter-Phase Application Dynamism to Auto-Tune HPC Applications for Energy-Efficiency · pdf, pdf, pdf, pdf

Ghazimirsaeed, S. Mahdieh · more

The Case for Semi-Permanent Cache Occupancy · pdf, pdf

Glantz, Roland · more

Topology-induced Enhancement of Mappings · pdf, pdf

Glick, Ben · more

An Extensible Ecosystem of Tools Providing User Friendly HPC Access and Supporting Jupyter Notebooks · pdf, pdf, pdf, pdf

Goin, Aaron · more

WebNN: A Distributed Framework for Deep Learning · pdf, pdf, pdf, pdf

Gonzalez, Santiago · more

Performance Improvements of an Event Index Distributed System · pdf, pdf, pdf, pdf

Goseva-Popstojanova, Katerina · more

Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy · pdf, pdf

Grant, Ryan · more

Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf

Grant, Ryan E. · more

The Case for Semi-Permanent Cache Occupancy · pdf, pdf

Groves, Taylor · more

Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf

Gu, Lin · more

Dual-Paradigm Stream Processing · pdf, pdf

Guan, Qiang · more

Modeling Application Resilience in Large-scale Parallel Execution · pdf, pdf

Guo, Deke · more

DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf

Guo, Hui · more

DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf

Guo, Song · more

ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf

Gurbuzbalaban, Mert · more

Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf

H

Hall, Mary · more

Bringing Sparse Computations into the Optimization Light · view
Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf

Hammond, Simon D. · more

Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM · pdf, pdf

Han, Li · more

A Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf

Hanawa, Toshihiro · more

Toward Footprint-Aware Power Shifting for Hybrid Memory Based Systems · pdf, pdf, pdf, pdf

Hassan, Ahmed · more

Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory · pdf, pdf

He, Anping · more

Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf

He, Bingsheng · more

Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters · pdf, pdf
GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf

Hedden, Brandon · more

A Comprehensive Study on Bugs in Actor Systems · pdf, pdf

Hei, Yong · more

Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf

Herbein, Stephen · more

PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf

Hiebel, Jason · more

Utilization of Random Profiling for System Modeling and Dynamic Configuration · pdf, pdf, pdf, pdf
Constructing Dynamic Policies for Paging Mode Selection · pdf, pdf

Hjelm, Nathan · more

Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf

Hoffmann, Henry · more

Energy-efficient Application Resource Scheduling using Machine Learning Classifiers · pdf, pdf
Performance & Energy Tradeoffs for Dependent Distributed Applications Under System-wide Power Caps · pdf, pdf

Hofmeyr, Steven · more

Energy-efficient Application Resource Scheduling using Machine Learning Classifiers · pdf, pdf

Hovland, Paul · more

Vectorised Computation of Diverging Ensembles · pdf, pdf

Hsieh, Cho-Jui · more

ImageNet Training in Minutes · pdf, pdf

Hu, Changjun · more

Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf

Hu, Kan · more

Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf

Hua, Yu · more

A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf

Huang, Libo · more

DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf
CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf

Huang, Ping · more

Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf

Huang, Zhenyu · more

PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf

Hughey, Stephen M. · more

Optimization of the Spherical Harmonics Transform based Tree Traversals in the Helmholtz FMM Algorithm · pdf, pdf

Hurley, Neil · more

An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf

Hückelheim, Jan · more

Vectorised Computation of Diverging Ensembles · pdf, pdf

I

Ibrahim, Shadi · more

Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters · pdf, pdf
Dual-Paradigm Stream Processing · pdf, pdf

Imes, Connor · more

Energy-efficient Application Resource Scheduling using Machine Learning Classifiers · pdf, pdf

Izadpanah, Ramin · more

Integrating Low-latency Analysis into HPC System Monitoring · pdf, pdf

J

Jain, Nikhil · more

Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf

Jannesari, Ali · more

Unveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics · pdf, pdf

Javidi Kishi, Masoomeh · more

Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory · pdf, pdf

Jessup, Elizabeth · more

Iterative Solver Selection Techniques for Sparse Linear Systems · pdf, pdf, pdf, pdf

Jha, Shantenu · more

Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf

Jia, Xiaoying · more

Revisiting Multi-pass Scatter and Gather on GPUs · pdf, pdf

Jiang, Hong · more

HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf
Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf

Jiang, Linhua · more

Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf

Jin, Hai · more

Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf
Dual-Paradigm Stream Processing · pdf, pdf

Jin, Yibo · more

ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf

K

Kalikar, Saurabh · more

Interval based Framework for Locking in Hierarchies · pdf, pdf, pdf, pdf
NumLock: Towards Optimal Multi-Granularity Locking in Hierarchies · pdf, pdf

Kannan, Ramakrishnan · more

Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization · pdf, pdf

Karavanic, Karen L. · more

Performance Analysis of DroughtHPC and Holistic HPC Workflows · pdf, pdf, pdf, pdf

Kavouklis, Christos · more

A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids · pdf, pdf, pdf, pdf

Kaya, Oguz · more

Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization · pdf, pdf

Kerola, Teemu · more

Linear Time Sorting for Large Data Sets with Specialized Processor · pdf, pdf, pdf, pdf

Kesavan, Ram · more

Efficient Search for Free Blocks in the WAFL File System · pdf, pdf

Keutzer, Kurt · more

ImageNet Training in Minutes · pdf, pdf

Khoshlessan, Mahzad · more

Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf

Kinney, Nick · more

Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf

Kobus, Robin · more

SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf

Kocoloski, Brian · more

Varbench: an Experimental Framework to Measure and Characterize Performance Variability · pdf, pdf

Kogge, Peter M. · more

Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM · pdf, pdf

Kovacevic, Jelena · more

Algorithm Design for Large Scale FFT-Based Simulations on CPU-GPU Platforms · pdf, pdf, pdf, pdf

Krishnamoorthy, Sriram · more

Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection · pdf, pdf

Krommydas, Konstantinos · more

A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes · pdf, pdf

Kulkarni, Anuva · more

Algorithm Design for Large Scale FFT-Based Simulations on CPU-GPU Platforms · pdf, pdf, pdf, pdf

Kumar, Nalini · more

Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf

Kumaraswamy, Madhura · more

Exploiting Inter-Phase Application Dynamism to Auto-Tune HPC Applications for Energy-Efficiency · pdf, pdf, pdf, pdf

L

Lai, Zhuohang · more

Revisiting Multi-pass Scatter and Gather on GPUs · pdf, pdf

Lam, Herman · more

Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf

Lambert, Thomas · more

Using Static Allocation Algorithms for Matrix Matrix Multiplication on Multicores and GPUs · pdf, pdf

Lange, John · more

Varbench: an Experimental Framework to Measure and Characterize Performance Variability · pdf, pdf

Larkins, D. Brian · more

Efficient Runtime Support for a Partitioned Global Logical Address Space · pdf, pdf

Le Fèvre, Valentin · more

A Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf

Lee, Patrick P. C. · more

Cross-Rack-Aware Updates in Erasure-Coded Data Centers · pdf, pdf

Leidel, John D. · more

Exploring Memory Coalescing for 3D-Stacked Hybrid Memory Cube · pdf, pdf, pdf, pdf
Memory Coalescing for Hybrid Memory Cube · pdf, pdf

Levenhagen, Michael J. · more

The Case for Semi-Permanent Cache Occupancy · pdf, pdf

Li, Cheng · more

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf

Li, Dong · more

Modeling Application Resilience in Large-scale Parallel Execution · pdf, pdf

Li, Jianjiang · more

Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf

Li, Jin cai · more

PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf

Li, Kenli · more

UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf

Li, Keqin · more

UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf

Li, Keqiu · more

Less Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees · pdf, pdf

Li, Kun · more

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf

Li, Leisheng · more

Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf

Li, Minghui · more

Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf

Li, Pengfei · more

Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf

Li, Qi · more

Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf

Li, Qiong · more

Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf

Li, Shigang · more

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf
Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf

Li, Tonglin · more

Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf

Li, Xiaoyong · more

PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf

Li, Xueqi · more

Accelerating FM-index Search for Genomic Data Processing · pdf, pdf

Li, Xuesong · more

Power Efficient High Performance Packet I/O · pdf, pdf

Li, Yusen · more

Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf

Li, Zhenhua · more

H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf

Li, Zhenyu · more

H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf

Liang, Shuwen · more

In-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems · pdf, pdf, pdf, pdf

Liao, Longlong · more

UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf

Lim, Robert · more

Efficient Matching of GPU Kernel Subgraphs · pdf, pdf, pdf, pdf

Lin, Xu · more

DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf

Lingg, Michael P. · more

Optimization of the Spherical Harmonics Transform based Tree Traversals in the Helmholtz FMM Algorithm · pdf, pdf

Liu, Alex X. · more

Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf
Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf

Liu, Bangtian · more

CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms · pdf, pdf

Liu, Qian · more

Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf

Liu, Weiguo · more

SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf

Liu, Xiaoguang · more

Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf

Liu, Zhiyi · more

Dual-Paradigm Stream Processing · pdf, pdf

Lu, Sanglu · more

ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf

Luckow, Andre · more

Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf

Luo, Qiong · more

Parallelizing Pruning-based Graph Structural Clustering · pdf, pdf
Revisiting Multi-pass Scatter and Gather on GPUs · pdf, pdf

Lv, Yashuai · more

DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf

M

Ma, Huadong · more

Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster · pdf, pdf

Ma, Sheng · more

DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf

Mache, Jens · more

An Extensible Ecosystem of Tools Providing User Friendly HPC Access and Supporting Jupyter Notebooks · pdf, pdf, pdf, pdf

Malawski, Maciej · more

Performance evaluation of parallel cloud functions · pdf, pdf, pdf, pdf

Malony, Allen · more

Welcome and Introduction · view

Marquet, Kevin · more

NumaMMA: NUMA MeMory Analyzer · pdf, pdf

Matsuoka, Satoshi · more

Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf

Mazaheri, Arya · more

Unveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics · pdf, pdf

McCorquodale, Peter · more

A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids · pdf, pdf, pdf, pdf

Mehri Dehnavi, Maryam · more

Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf

Meng, Xiangxu · more

SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf

Meyer, Ulrich · more

An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf

Meyerhenke, Henning · more

Topology-induced Enhancement of Mappings · pdf, pdf
Balanced k-means for Parallel Geometric Partitioning · pdf, pdf

Mills, Richard T. · more

Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf

Mohamedin, Mohamed · more

Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory · pdf, pdf

Mohammadi, Mahdi S. · more

Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf

Mollah, Md Atiqul · more

Load-Balanced Slim Fly Networks · pdf, pdf

Monil, Mohammad Alaul Haque · more

Adaptive auto-tuning in HPX using APEX · pdf, pdf, pdf, pdf

Monsalve Diaz, Jose Manuel · more

OpenMP 4.5 Implementations: Evaluation & Verification of Offloading Features · pdf, pdf, pdf, pdf

Moody, Adam · more

PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf

Moradkhani, Hamid · more

Performance Analysis of DroughtHPC and Holistic HPC Workflows · pdf, pdf, pdf, pdf

Morel, Lionel · more

NumaMMA: NUMA MeMory Analyzer · pdf, pdf

Muite, Benson · more

A Computational Investigation of Redistricting Using Simulated Annealing · pdf, pdf, pdf, pdf

N

Naksinehaboon, Nichamon · more

Integrating Low-latency Analysis into HPC System Monitoring · pdf, pdf

Namyst, Raymond · more

Combining Task-based Parallelism and Adaptive Mesh Refinement Techniques in Molecular Dynamics Simulations · pdf, pdf

Nandy, Payal · more

Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf

Narayanan, Sri Hari Krishna · more

Vectorised Computation of Diverging Ensembles · pdf, pdf

Nasre, Rupesh · more

Interval based Framework for Locking in Hierarchies · pdf, pdf, pdf, pdf
NumLock: Towards Optimal Multi-Granularity Locking in Hierarchies · pdf, pdf

Neelakantan, Aravind · more

Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf

Nesterenko, Brandon · more

Improving Resource Utilization through Demand Aware Process Scheduling · pdf, pdf

Nie, Ningming · more

Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf

Norris, Boyana · more

Iterative Solver Selection Techniques for Sparse Linear Systems · pdf, pdf, pdf, pdf

O

Olivier, Stephen L. · more

Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM · pdf, pdf

Olschanowsky, Catherine · more

Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf

Orduña, Juan · more

Performance Improvements of an Event Index Distributed System · pdf, pdf, pdf, pdf

Owens, John D. · more

Push-Pull on Graphs is Column- and Row-based SpMV Plus Masks · pdf, pdf, pdf, pdf
Implementing Push-Pull Efficiently in GraphBLAS · pdf, pdf

P

P. Duarte Júnior, Elias · more

A Communication-Efficient Causal Broadcast Protocol · pdf, pdf

Palmieri, Roberto · more

Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory · pdf, pdf

Pang, Di · more

Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy · pdf, pdf

Panja, Rintu · more

MND-MST: A Multi-Node Multi-Device Parallel Boruvka's MST Algorithm · pdf, pdf

Parashar, Manish · more

Transforming Science through Cyberinfrastructure · view

Paraskevakos, Ioannis · more

Middleware for Data Intensive Analytics on HPC · pdf, pdf, pdf, pdf
Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf

Paudel, Anmol · more

A HPC Framework for Big Spatial Data Processing and Analytics · pdf, pdf, pdf, pdf
MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data · pdf, pdf

Pawlik, Maciej · more

Performance evaluation of parallel cloud functions · pdf, pdf, pdf, pdf

Peluso, Sebastiano · more

Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory · pdf, pdf

Perarnau, Swann · more

A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures · pdf, pdf

Peterson, Matt · more

KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf

Phan, Tien-Dat · more

Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters · pdf, pdf

Pottier, Loïc · more

A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures · pdf, pdf

Prat, Raphaël · more

Combining Task-based Parallelism and Adaptive Mesh Refinement Techniques in Molecular Dynamics Simulations · pdf, pdf

Predari, Maria · more

Topology-induced Enhancement of Mappings · pdf, pdf

Pumma, Sarunya · more

I/O Bottleneck Investigation in Deep Learning Systems · pdf, pdf, pdf, pdf

Puri, Satish · more

A HPC Framework for Big Spatial Data Processing and Analytics · pdf, pdf, pdf, pdf
MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data · pdf, pdf

Q

Qian, Chen · more

H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf

Qian, Cheng · more

CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf

Qian, Zhuzhong · more

ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf

Qiao, Zhi · more

In-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems · pdf, pdf, pdf, pdf

Qiu, Kun · more

ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf

R

Rafique, Muhammad · more

CAMPS: Conflict-Aware Memory-Side Prefetching Scheme for Hybrid Memory Cube · pdf, pdf

Rahman, Md Shafayat · more

Topologies and Adaptive Routing on Large-Scale Interconnects · pdf, pdf, pdf, pdf
Load-Balanced Slim Fly Networks · pdf, pdf

Ramaswamy, Ajay · more

Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf

Rang, Wei · more

Joint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Clouds · pdf, pdf

Ranka, Sanjay · more

A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf

Rao, Jia · more

Improving Resource Utilization through Demand Aware Process Scheduling · pdf, pdf

Rathnayake, Sunimal · more

Cost-Time Performance of Scaling Applications on the Cloud · pdf, pdf, pdf, pdf

Ren, Bangbang · more

DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf

Ren, Fengyuan · more

Power Efficient High Performance Packet I/O · pdf, pdf

Ren, Xiao li · more

PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf

Rennich, Steven · more

A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf

Robert, Yves · more

A Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf
A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures · pdf, pdf

Robins, Mark · more

AI and HPC: Challenges and Opportunities · view

Rupp, Karl · more

Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf

S

Sasanka, Ruchira · more

A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes · pdf, pdf

Sathre, Paul · more

A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes · pdf, pdf

Savas, Suleyman · more

Designing Domain-Specific Heterogeneous Manycores from Dataflow Programs · pdf, pdf, pdf, pdf

Schickedanz, Alexander · more

An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf

Schmidt, Bertil · more

SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf
Massively Parallel Huffman Decoding on GPUs · pdf, pdf

Schonbein, Whit · more

The Case for Semi-Permanent Cache Occupancy · pdf, pdf

Schulz, Martin · more

Toward Footprint-Aware Power Shifting for Hybrid Memory Based Systems · pdf, pdf, pdf, pdf
Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf

Selva, Manuel · more

NumaMMA: NUMA MeMory Analyzer · pdf, pdf

Sens, Pierre · more

A Communication-Efficient Causal Broadcast Protocol · pdf, pdf

Seth, Sharad · more

Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf

Shaik, Shehenaz · more

Resource and Service Management in Fog Computing · pdf, pdf, pdf, pdf

Shen, Yulong · more

DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf

Shen, Zhirong · more

Cross-Rack-Aware Updates in Erasure-Coded Data Centers · pdf, pdf

Shi, Weisong · more

In-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems · pdf, pdf, pdf, pdf

Si, Min · more

I/O Bottleneck Investigation in Deep Learning Systems · pdf, pdf, pdf, pdf

Smith, Barry F. · more

Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf

Snir, Marc · more

Fast and generic concurrent message-passing · pdf, pdf, pdf, pdf
FULT: Fast User-Level Thread Scheduling Using Bit-Vectors · pdf, pdf

Snyder, John · more

Efficient Runtime Support for a Partitioned Global Logical Address Space · pdf, pdf

Song, Jun qiang · more

PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf

Sood, Kanika · more

Iterative Solver Selection Techniques for Sparse Linear Systems · pdf, pdf, pdf, pdf

Soori, Saeed · more

Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf

Srisa-an, Witawas · more

Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf

Stitt, Greg · more

Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf

Strout, Michelle · more

Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf

Subasi, Omer · more

Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection · pdf, pdf

Sun, Jizhou · more

GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf

Sun, Ke · more

Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf

Sun, Ninghui · more

Accelerating FM-index Search for Genomic Data Processing · pdf, pdf

Sun, Qiao · more

Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf

Sun, Shixuan · more

Parallelizing Pruning-based Graph Structural Clustering · pdf, pdf

Suriyakumar, Yasodha · more

Performance Analysis of DroughtHPC and Holistic HPC Workflows · pdf, pdf, pdf, pdf

T

Tan, Guangming · more

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf
Accelerating FM-index Search for Genomic Data Processing · pdf, pdf

Tan, Wei · more

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf

Tan, Yujuan · more

Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf

Tang, Bingchang · more

Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster · pdf, pdf

Tang, Guoming · more

DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf

Tang, Meng · more

A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf

Tang, Shanjiang · more

GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf

Tang, Xueyan · more

Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf

Taufer, Michela · more

KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf
PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf

Teo, Yong Meng · more

Cost-Time Performance of Scaling Applications on the Cloud · pdf, pdf, pdf, pdf

Teodorescu, Radu · more

C-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf

Tian, Qi · more

UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf

Tong, Jiancong · more

Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf

Trahay, François · more

NumaMMA: NUMA MeMory Analyzer · pdf, pdf

Tyson, Gareth · more

H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf

Tzovas, Charilaos · more

Balanced k-means for Parallel Geometric Partitioning · pdf, pdf

V

Vadhiyar, Sathish · more

MND-MST: A Multi-Node Multi-Device Parallel Boruvka's MST Algorithm · pdf, pdf

Van Straalen, Brian · more

A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids · pdf, pdf, pdf, pdf

Varghese, Robin · more

Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf

Velesko, Paulius · more

Vectorised Computation of Diverging Ensembles · pdf, pdf

Vivien, Frédéric · more

A Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf

von Looz, Moritz · more

Balanced k-means for Parallel Geometric Partitioning · pdf, pdf

W

Wang, Fang · more

HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf

Wang, Fei · more

IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling · pdf, pdf

Wang, Gang · more

Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf

Wang, Hua · more

Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf

Wang, Jiayao · more

Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf

Wang, Jue · more

Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf

Wang, Weijun · more

Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf

Wang, Xi · more

Exploring Memory Coalescing for 3D-Stacked Hybrid Memory Cube · pdf, pdf, pdf, pdf
Memory Coalescing for Hybrid Memory Cube · pdf, pdf

Wang, Xiangmeng · more

Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf

Wang, Xiaoliang · more

ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf

Wang, Xiaoyu · more

Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf
Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf

Wang, Xin · more

ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf

Wang, Xinliang · more

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf

Wang, Yuanrong · more

Accelerating FM-index Search for Genomic Data Processing · pdf, pdf

Wang, Zhenlin · more

Utilization of Random Profiling for System Modeling and Dynamic Configuration · pdf, pdf, pdf, pdf
Constructing Dynamic Policies for Paging Mode Selection · pdf, pdf

Wang, Zhiying · more

DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf
CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf

Wang, Zijun · more

Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf

Weber, Kenneth · more

Toward a Multi-GPU Implementation of the Modular Integer GCD Algorithm: Extended Abstract · pdf, pdf, pdf, pdf

Wei, Dengping · more

Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf

Wei, Yanjie · more

SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf

Weissenberger, Andre · more

Massively Parallel Huffman Decoding on GPUs · pdf, pdf

Wernsman, Robert · more

Improving First Level Cache Efficiency for GPUs Using Dynamic Line Protection · pdf, pdf

Wolf, Felix · more

Unveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics · pdf, pdf

Wolf, Tilman · more

ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf

Wood, Chad · more

SOSflow: A Scalable Observation System for Introspection and In Situ Analytics · pdf, pdf, pdf, pdf

Wu, Baodong · more

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf
Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf

Wu, Changmao · more

Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf

Wu, Jie · more

NFV Middlebox Placement with Balanced Set-up Cost and Bandwidth Consumption · pdf, pdf

Wu, Kai · more

Modeling Application Resilience in Large-scale Parallel Execution · pdf, pdf

Wu, Song · more

Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf
Dual-Paradigm Stream Processing · pdf, pdf

Wu, Xiaobing · more

Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf

Wyatt, Michael R. · more

PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf

X

Xia, Yinglong · more

C-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf

Xiao, Jiang · more

Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf

Xiao, Junmin · more

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf

Xiao, Liquan · more

Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf

Xie, Jing · more

Power Efficient High Performance Packet I/O · pdf, pdf

Xie, Xuchao · more

Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf

Xiong, Zhuang · more

Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf

Xu, Kai · more

SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf

Xu, Ping · more

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf

Xu, Xianghao · more

HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf

Xue, Wei · more

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf

Y

Yan, Zhichao · more

Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf

Yang, Bailong · more

Power Efficient High Performance Packet I/O · pdf, pdf

Yang, Canqun · more

UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf

Yang, Carl · more

Push-Pull on Graphs is Column- and Row-based SpMV Plus Masks · pdf, pdf, pdf, pdf
Implementing Push-Pull Efficiently in GraphBLAS · pdf, pdf

Yang, Chao · more

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf

Yang, Donglin · more

Joint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Clouds · pdf, pdf

Yang, Guangwen · more

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf

Yang, Tianye · more

Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf

Yao, Erlin · more

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf

Ye, Jun · more

IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling · pdf, pdf

Yi, Qing · more

Improving Resource Utilization through Demand Aware Process Scheduling · pdf, pdf

Yi, Xinbo · more

Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf

Yi, Yusheng · more

Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf

You, Yang · more

ImageNet Training in Minutes · pdf, pdf

Yu, Ce · more

GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf

Yu, Hongfeng · more

A Distributed Infomap Algorithm for Scalable and High-Quality Community Detection · pdf, pdf

Yu, Qi · more

DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf
CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf

Yuan, Jing · more

ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf

Yuan, Xin · more

Load-Balanced Slim Fly Networks · pdf, pdf

Z

Zambreno, Joseph · more

Improving First Level Cache Efficiency for GPUs Using Dynamic Line Protection · pdf, pdf

Zang, Dawei · more

Accelerating FM-index Search for Genomic Data Processing · pdf, pdf

Zeng, Jianping · more

A Distributed Infomap Algorithm for Scalable and High-Quality Community Detection · pdf, pdf

Zhai, Ennan · more

H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf

Zhang, Changyou · more

Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf

Zhang, Chen · more

FFS-VA: A Fast Filtering System for Large-scale Video Analytics · pdf, pdf

Zhang, Haitao · more

Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster · pdf, pdf

Zhang, He · more

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf

Zhang, Hong · more

Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf

Zhang, Huazhe · more

Performance & Energy Tradeoffs for Dependent Distributed Applications Under System-wide Power Caps · pdf, pdf

Zhang, Jiajia · more

Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf

Zhang, Jiling · more

Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf

Zhang, Lijun · more

Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf

Zhang, Qifei · more

Delta-Stepping Synchronous Parallel Model · pdf, pdf, pdf, pdf

Zhang, Rongqi · more

Less Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees · pdf, pdf

Zhang, Sheng · more

ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf

Zhang, Tong · more

Power Efficient High Performance Packet I/O · pdf, pdf

Zhang, Weidong · more

Delta-Stepping Synchronous Parallel Model · pdf, pdf, pdf, pdf

Zhang, Xiaoyi · more

A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf

Zhang, Yongxuan · more

HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf

Zhang, Yunquan · more

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf
Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf

Zhang, Zhao · more

ImageNet Training in Minutes · pdf, pdf

Zhao, Dongfang · more

Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf

Zhao, Jin · more

ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf

Zhao, Juan · more

PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf

Zhao, Laiping · more

Less Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees · pdf, pdf

Zhao, Leiyu · more

H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf

Zhao, Minghao · more

H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf

Zhao, Xinghui · more

WebNN: A Distributed Framework for Deep Learning · pdf, pdf, pdf, pdf
A Comprehensive Study on Bugs in Actor Systems · pdf, pdf

Zheng, Jiaqi · more

Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf
Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf
Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf

Zheng, Weimin · more

A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf

Zhou, Amelie Chi · more

Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters · pdf, pdf

Zhou, Ke · more

Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf

Zhou, Li · more

C-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf

Zhou, Xiaobo · more

Leveraging Resource Bottleneck Awareness and Optimizations for Data Analytics Performance · pdf, pdf, pdf, pdf
Reference-distance Eviction and Prefetching for Cache Management in Spark · pdf, pdf

Zhu, Min · more

PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf

Zhu, Xian · more

Improving First Level Cache Efficiency for GPUs Using Dynamic Line Protection · pdf, pdf

Zhu, Yuanyang · more

ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf

Zhu, Zhichun · more

CAMPS: Conflict-Aware Memory-Side Prefetching Scheme for Hybrid Memory Cube · pdf, pdf
