ICPP 2018 Program


Author Index

A
A. Rodrigues, Luiz · A Communication-Efficient Causal Broadcast Protocol
Adhinarayanan, Vignesh · Models and Techniques for Green High-Performance Computing
Afsahi, Ahmad · The Case for Semi-Permanent Cache Occupancy
Ahn, Dong H. · PRIONN: Predicting Runtime and IO using Neural Networks
Ajwani, Deepak · An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores
Aktulga, H. M. · Optimization of the Spherical Harmonics Transform based Tree Traversals in the Helmholtz FMM Algorithm
Al-Mamun, Abdullah · Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks
Anandakrishnan, Ramu · Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm
Antoškin, Vjatsešlav · A Computational Investigation of Redistricting Using Simulated Annealing
Ao, Yulong · A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010
Arantes, Luciana · A Communication-Efficient Causal Broadcast Protocol
Arima, Eishi · Toward Footprint-Aware Power Shifting for Hybrid Memory Based Systems
Arnold, Dorian · Improving MPI Multi-threaded RMA Communication Performance

B
Balaji, Pavan · I/O Bottleneck Investigation in Deep Learning Systems
Ballard, Grey · Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization
Barreto Goes Perez, Tiago · Leveraging Resource Bottleneck Awareness and Optimizations for Data Analytics Performance
Reference-distance Eviction and Prefetching for Cache Management in Spark
Baskiyar, Sanjeev · Resource and Service Management in Fog Computing
Beckstein, Oliver · Task-parallel Analysis of Molecular Dynamics Trajectories
Benoit, Anne · A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures
Benson, Jeremy · KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis
Berry, Jonathan W. · Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM
Bhatele, Abhinav · Interference between I/O and MPI Traffic on Fat-tree Networks
Bhattacharjee, Mrinal · Efficient Search for Free Blocks in the WAFL File System
Blanco, Zachary · CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms
Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems
Brandt, Jim · Integrating Low-latency Analysis into HPC System Monitoring
Brew, Justin A. · Toward a Multi-GPU Implementation of the Modular Integer GCD Algorithm: Extended Abstract
Bridges, Patrick · Improving MPI Multi-threaded RMA Communication Performance
Bridges, Patrick G. · The Case for Semi-Permanent Cache Occupancy
Brown, Kevin A. · Interference between I/O and MPI Traffic on Fat-tree Networks
Brown, Laura E. · Utilization of Random Profiling for System Modeling and Dynamic Configuration
Constructing Dynamic Policies for Paging Mode Selection
Buluc, Aydin · Push-Pull on Graphs is Column- and Row-based SpMV Plus Masks
Implementing Push-Pull Efficiently in GraphBLAS
Butcher, Neil A. · Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM

C
Cai, Binlei · Less Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees
Cai, Wentong · Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines
Canon, Louis-Claude · A Generic Approach to Scheduling and Checkpointing Workflows
Cao, Liangliang · Matrix Factorization on GPUs with Memory Optimization and Approximate Computing
Cao, Qiang · FFS-VA: A Fast Filtering System for Large-scale Video Analytics
Cao, Xuan · Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines
Chan, Yuandong · SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures
Chang, Chun-Kai · Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection
Chang, Shiyu · Matrix Factorization on GPUs with Memory Optimization and Approximate Computing
Chantzialexiou, George · Task-parallel Analysis of Molecular Dynamics Trajectories
Che, Yulin · Parallelizing Pruning-based Graph Structural Clustering
Cheatham, Thomas E. · Task-parallel Analysis of Molecular Dynamics Trajectories
Chen, Fei · Dual-Paradigm Stream Processing
Chen, Guihai · Charging Task Scheduling for Directional Wireless Charger Networks
IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling
Heterogeneous Wireless Charger Placement with Obstacles
Chen, Hong · Click-Based Asynchronous Mesh Network with Bounded Bundled Data
Chen, Jianxi · A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory
Chen, Ren · C-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework
Chen, Xinyu · KeyBin2: Distributed Clustering for Scalable and In-situ Analysis
KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis
Chen, Yang · NFV Middlebox Placement with Balanced Set-up Cost and Bandwidth Consumption
Chen, Yifeng · Delta-Stepping Synchronous Parallel Model
Chen, Yong · Exploring Memory Coalescing for 3D-Stacked Hybrid Memory Cube
Memory Coalescing for Hybrid Memory Cube
Cheng, Bin · Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning
Cheng, Dazhao · Reference-distance Eviction and Prefetching for Cache Management in Spark
Joint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Clouds
Cheng, Wenxue · Power Efficient High Performance Packet I/O
Cheng, Yongli · HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy
Childers, Bruce · CGAcc: CSR-based Graph Traversal Accelerator on HMC
Colella, Phillip · A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids
Colombet, Laurent · Combining Task-based Parallelism and Adaptive Mesh Refinement Techniques in Molecular Dynamics Simulations
Cotton, Ronald · WebNN: A Distributed Framework for Deep Learning
Cui, Chang · Delta-Stepping Synchronous Parallel Model
Curtis-Maury, Matthew · Efficient Search for Free Blocks in the WAFL File System

D
Dai, Haipeng · Charging Task Scheduling for Directional Wireless Charger Networks
Cache Assisted Randomized Sharing Counters in Network Measurement
Heterogeneous Wireless Charger Placement with Obstacles
Dang, Hoang-Vu · Fast and generic concurrent message-passing
FULT: Fast User-Level Thread Scheduling Using Bit-Vectors
Dash, Sajal · Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm
Davis, Eddie C. · Abstractions for Specifying Sparse Matrix Data Transformations
Davis, Timothy A. · A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization
de Araujo, João Paulo · A Communication-Efficient Causal Broadcast Protocol
DeBardeleben, Nathan · Modeling Application Resilience in Large-scale Parallel Execution
Dechev, Damian · Integrating Low-latency Analysis into HPC System Monitoring
Dehnavi, Maryam Mehri · CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms
Demmel, James · ImageNet Training in Minutes
Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems
Devarakonda, Aditya · Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems
Devine, Thomas R. · Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy
Dinan, James · Efficient Runtime Support for a Partitioned Global Logical Address Space
Dong, Wenqian · Modeling Application Resilience in Large-scale Parallel Execution
Dosanjh, Matthew · Improving MPI Multi-threaded RMA Communication Performance
Dosanjh, Matthew G. F. · The Case for Semi-Permanent Cache Occupancy
Dou, Wanchun · Heterogeneous Wireless Charger Placement with Obstacles
Duriakova, Erika · An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores

E
Erez, Mattan · Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection
Estrada, Trilce · KeyBin2: Distributed Clustering for Scalable and In-situ Analysis
KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis
Eyraud-Dubois, Lionel · Using Static Allocation Algorithms for Matrix Matrix Multiplication on Multicores and GPUs

F
Faizian, Peyman · Load-Balanced Slim Fly Networks
Feng, Dan · HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy
A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory
Feng, Guangbo · Click-Based Asynchronous Mesh Network with Bounded Bundled Data
Feng, Wu-chun · Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm
I/O Bottleneck Investigation in Deep Learning Systems
A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes
Feng, Yangde · Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer
Fernandez, Alvaro · Performance Improvements of an Event Index Distributed System
Figiela, Kamil · Performance evaluation of parallel cloud functions
Fong, Liana · Matrix Factorization on GPUs with Memory Optimization and Approximate Computing
Fox, Geoffrey C. · Task-parallel Analysis of Molecular Dynamics Trajectories
Franchetti, Franz · Algorithm Design for Large Scale FFT-Based Simulations on CPU-GPU Platforms
Fu, Hao · GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs
Fu, Haohuan · A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010
Fu, Mandi · A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory
Fu, Song · In-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems

G
Gadou, Mohamed · A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization
Gamblin, Todd · PRIONN: Predicting Runtime and IO using Neural Networks
Gan, Lin · A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010
Gao, Chuansong · Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines
Gao, Ping · SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures
Gao, Xiaofeng · IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling
Garner, Harold · Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm
Geng, Guanhui · Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines
Geng, Xin · Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster
Gentile, Ann · Integrating Low-latency Analysis into HPC System Monitoring
Gerndt, Michael · Exploiting Inter-Phase Application Dynamism to Auto-Tune HPC Applications for Energy-Efficiency
Ghazimirsaeed, S. Mahdieh · The Case for Semi-Permanent Cache Occupancy
Glantz, Roland · Topology-induced Enhancement of Mappings
Glick, Ben · An Extensible Ecosystem of Tools Providing User Friendly HPC Access and Supporting Jupyter Notebooks
Goin, Aaron · WebNN: A Distributed Framework for Deep Learning
Gonzalez, Santiago · Performance Improvements of an Event Index Distributed System
Goseva-Popstojanova, Katerina · Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy
Grant, Ryan · Improving MPI Multi-threaded RMA Communication Performance
Grant, Ryan E. · The Case for Semi-Permanent Cache Occupancy
Groves, Taylor · Improving MPI Multi-threaded RMA Communication Performance
Gu, Lin · Dual-Paradigm Stream Processing
Guan, Qiang · Modeling Application Resilience in Large-scale Parallel Execution
Guo, Deke · DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs
Guo, Hui · DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU
Guo, Song · ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges
Gurbuzbalaban, Mert · Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems

H
Hall, Mary · Bringing Sparse Computations into the Optimization Light
Abstractions for Specifying Sparse Matrix Data Transformations
Hammond, Simon D. · Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM
Han, Li · A Generic Approach to Scheduling and Checkpointing Workflows
Hanawa, Toshihiro · Toward Footprint-Aware Power Shifting for Hybrid Memory Based Systems
Hassan, Ahmed · Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory
He, Anping · Click-Based Asynchronous Mesh Network with Bounded Bundled Data
He, Bingsheng · Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters
GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs
Hedden, Brandon · A Comprehensive Study on Bugs in Actor Systems
Hei, Yong · Click-Based Asynchronous Mesh Network with Bounded Bundled Data
Herbein, Stephen · PRIONN: Predicting Runtime and IO using Neural Networks
Hiebel, Jason · Utilization of Random Profiling for System Modeling and Dynamic Configuration
Constructing Dynamic Policies for Paging Mode Selection
Hjelm, Nathan · Improving MPI Multi-threaded RMA Communication Performance
Hoffmann, Henry · Energy-efficient Application Resource Scheduling using Machine Learning Classifiers
Performance & Energy Tradeoffs for Dependent Distributed Applications Under System-wide Power Caps
Hofmeyr, Steven · Energy-efficient Application Resource Scheduling using Machine Learning Classifiers
Hovland, Paul · Vectorised Computation of Diverging Ensembles
Hsieh, Cho-Jui · ImageNet Training in Minutes
Hu, Changjun · Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer
Hu, Kan · Disk Failure Prediction in Data Centers via Online Learning
Hua, Yu · A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory
Huang, Libo · DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU
CGAcc: CSR-based Graph Traversal Accelerator on HMC
Huang, Ping · Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning
Huang, Zhenyu · PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems
Hughey, Stephen M. · Optimization of the Spherical Harmonics Transform based Tree Traversals in the Helmholtz FMM Algorithm
Hurley, Neil · An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores
Hückelheim, Jan · Vectorised Computation of Diverging Ensembles

I
Ibrahim, Shadi · Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters
Dual-Paradigm Stream Processing
Imes, Connor · Energy-efficient Application Resource Scheduling using Machine Learning Classifiers
Izadpanah, Ramin · Integrating Low-latency Analysis into HPC System Monitoring

J
Jain, Nikhil · Interference between I/O and MPI Traffic on Fat-tree Networks
Jannesari, Ali · Unveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics
Javidi Kishi, Masoomeh · Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory
Jessup, Elizabeth · Iterative Solver Selection Techniques for Sparse Linear Systems
Jha, Shantenu · Task-parallel Analysis of Molecular Dynamics Trajectories
Jia, Xiaoying · Revisiting Multi-pass Scatter and Gather on GPUs
Jiang, Hong · HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy
Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability
Jiang, Linhua · Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks
Jin, Hai · Disk Failure Prediction in Data Centers via Online Learning
Dual-Paradigm Stream Processing
Jin, Yibo · ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges

K
Kalikar, Saurabh · Interval based Framework for Locking in Hierarchies
NumLock: Towards Optimal Multi-Granularity Locking in Hierarchies
Kannan, Ramakrishnan · Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization
Karavanic, Karen L. · Performance Analysis of DroughtHPC and Holistic HPC Workflows
Kavouklis, Christos · A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids
Kaya, Oguz · Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization
Kerola, Teemu · Linear Time Sorting for Large Data Sets with Specialized Processor
Kesavan, Ram · Efficient Search for Free Blocks in the WAFL File System
Keutzer, Kurt · ImageNet Training in Minutes
Khoshlessan, Mahzad · Task-parallel Analysis of Molecular Dynamics Trajectories
Kinney, Nick · Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm
Kobus, Robin · SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures
Kocoloski, Brian · Varbench: an Experimental Framework to Measure and Characterize Performance Variability
Kogge, Peter M. · Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM
Kovacevic, Jelena · Algorithm Design for Large Scale FFT-Based Simulations on CPU-GPU Platforms
Krishnamoorthy, Sriram · Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection
Krommydas, Konstantinos · A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes
Kulkarni, Anuva · Algorithm Design for Large Scale FFT-Based Simulations on CPU-GPU Platforms
Kumar, Nalini · Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit
Kumaraswamy, Madhura · Exploiting Inter-Phase Application Dynamism to Auto-Tune HPC Applications for Energy-Efficiency

L
Lai, Zhuohang · Revisiting Multi-pass Scatter and Gather on GPUs
Lam, Herman · Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit
Lambert, Thomas · Using Static Allocation Algorithms for Matrix Matrix Multiplication on Multicores and GPUs
Lange, John · Varbench: an Experimental Framework to Measure and Characterize Performance Variability
Larkins, D. Brian · Efficient Runtime Support for a Partitioned Global Logical Address Space
Le Fèvre, Valentin · A Generic Approach to Scheduling and Checkpointing Workflows
Lee, Patrick P. C. · Cross-Rack-Aware Updates in Erasure-Coded Data Centers
Leidel, John D. · Exploring Memory Coalescing for 3D-Stacked Hybrid Memory Cube
Memory Coalescing for Hybrid Memory Cube
Levenhagen, Michael J. · The Case for Semi-Permanent Cache Occupancy
Li, Cheng · Matrix Factorization on GPUs with Memory Optimization and Approximate Computing
Li, Dong · Modeling Application Resilience in Large-scale Parallel Execution
Li, Jianjiang · Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer
Li, Jin cai · PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems
Li, Kenli · UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters
Li, Keqin · UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters
Li, Keqiu · Less Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees
Li, Kun · Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model
Li, Leisheng · Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform
Li, Minghui · Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines
Li, Pengfei · Click-Based Asynchronous Mesh Network with Bounded Bundled Data
Li, Qi · Cache Assisted Randomized Sharing Counters in Network Measurement
Li, Qiong · Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems
Li, Shigang · Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model
Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer
Li, Tonglin · Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks
Li, Xiaoyong · PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems
Li, Xueqi · Accelerating FM-index Search for Genomic Data Processing
Li, Xuesong · Power Efficient High Performance Packet I/O
Li, Yusen · Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines
Li, Zhenhua · H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud
Li, Zhenyu · H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud
Liang, Shuwen · In-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems
Liao, Longlong · UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters
Lim, Robert · Efficient Matching of GPU Kernel Subgraphs
Lin, Xu · DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs
Lingg, Michael P. · Optimization of the Spherical Harmonics Transform based Tree Traversals in the Helmholtz FMM Algorithm
Liu, Alex X. · Charging Task Scheduling for Directional Wireless Charger Networks
Cache Assisted Randomized Sharing Counters in Network Measurement
Liu, Bangtian · CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms
Liu, Qian · Cache Assisted Randomized Sharing Counters in Network Measurement
Liu, Weiguo · SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures
Liu, Xiaoguang · Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines
Liu, Zhiyi · Dual-Paradigm Stream Processing
Lu, Sanglu · ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges
Luckow, Andre · Task-parallel Analysis of Molecular Dynamics Trajectories
Luo, Qiong · Parallelizing Pruning-based Graph Structural Clustering
Revisiting Multi-pass Scatter and Gather on GPUs
Lv, Yashuai · DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU

M
Ma, Huadong · Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster
Ma, Sheng · DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU
Mache, Jens · An Extensible Ecosystem of Tools Providing User Friendly HPC Access and Supporting Jupyter Notebooks
Malawski, Maciej · Performance evaluation of parallel cloud functions
Malony, Allen · Welcome and Introduction
Marquet, Kevin · NumaMMA: NUMA MeMory Analyzer
Matsuoka, Satoshi · Interference between I/O and MPI Traffic on Fat-tree Networks
Mazaheri, Arya · Unveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics
McCorquodale, Peter · A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids
Mehri Dehnavi, Maryam · Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems
Meng, Xiangxu · SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures
Meyer, Ulrich · An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores
Meyerhenke, Henning · Topology-induced Enhancement of Mappings
Balanced k-means for Parallel Geometric Partitioning
Mills, Richard T. · Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512
Mohamedin, Mohamed · Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory
Mohammadi, Mahdi S. · Abstractions for Specifying Sparse Matrix Data Transformations
Mollah, Md Atiqul · Load-Balanced Slim Fly Networks
Monil, Mohammad Alaul Haque · Adaptive auto-tuning in HPX using APEX
Monsalve Diaz, Jose Manuel · OpenMP 4.5 Implementations: Evaluation & Verification of Offloading Features
Moody, Adam · PRIONN: Predicting Runtime and IO using Neural Networks
Moradkhani, Hamid · Performance Analysis of DroughtHPC and Holistic HPC Workflows
Morel, Lionel · NumaMMA: NUMA MeMory Analyzer
Muite, Benson · A Computational Investigation of Redistricting Using Simulated Annealing

N
Naksinehaboon, Nichamon · Integrating Low-latency Analysis into HPC System Monitoring
Namyst, Raymond · Combining Task-based Parallelism and Adaptive Mesh Refinement Techniques in Molecular Dynamics Simulations
Nandy, Payal · Abstractions for Specifying Sparse Matrix Data Transformations
Narayanan, Sri Hari Krishna · Vectorised Computation of Diverging Ensembles
Nasre, Rupesh · Interval based Framework for Locking in Hierarchies
NumLock: Towards Optimal Multi-Granularity Locking in Hierarchies
Neelakantan, Aravind · Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit
Nesterenko, Brandon · Improving Resource Utilization through Demand Aware Process Scheduling
Nie, Ningming · Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer
Norris, Boyana · Iterative Solver Selection Techniques for Sparse Linear Systems

O
Olivier, Stephen L. · Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM
Olschanowsky, Catherine · Abstractions for Specifying Sparse Matrix Data Transformations
Orduña, Juan · Performance Improvements of an Event Index Distributed System
Owens, John D. · Push-Pull on Graphs is Column- and Row-based SpMV Plus Masks
Implementing Push-Pull Efficiently in GraphBLAS

P
P. Duarte Júnior, Elias · A Communication-Efficient Causal Broadcast Protocol
Palmieri, Roberto · Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory
Pang, Di · Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy
Panja, Rintu · MND-MST: A Multi-Node Multi-Device Parallel Boruvka's MST Algorithm
Parashar, Manish · Transforming Science through Cyberinfrastructure
Paraskevakos, Ioannis · Middleware for Data Intensive Analytics on HPC
Task-parallel Analysis of Molecular Dynamics Trajectories
Paudel, Anmol · A HPC Framework for Big Spatial Data Processing and Analytics
MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data
Pawlik, Maciej · Performance evaluation of parallel cloud functions
Peluso, Sebastiano · Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory
Perarnau, Swann · A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures
Peterson, Matt · KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis
Phan, Tien-Dat · Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters
Pottier, Loïc · A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures
Prat, Raphaël · Combining Task-based Parallelism and Adaptive Mesh Refinement Techniques in Molecular Dynamics Simulations
Predari, Maria · Topology-induced Enhancement of Mappings
Pumma, Sarunya · I/O Bottleneck Investigation in Deep Learning Systems
Puri, Satish · A HPC Framework for Big Spatial Data Processing and Analytics
MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data

Q
Qian, Chen · moreH2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf
Qian, Cheng · moreCGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf
Qian, Zhuzhong · moreran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf
Qiao, Zhi · moreIn-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems · pdf, pdf, pdf, pdf
Qiu, Kun · moreParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf

R
Rafique, Muhammad · moreCAMPS: Conflict-Aware Memory-Side Prefetching Scheme for Hybrid Memory Cube · pdf, pdf
Rahman, Md Shafayat · moreTopologies and Adaptive Routing on Large-Scale Interconnects · pdf, pdf, pdf, pdf
Load-Balanced Slim Fly Networks · pdf, pdf
Ramaswamy, Ajay · moreScalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf
Rang, Wei · moreJoint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Clouds · pdf, pdf
Ranka, Sanjay · moreA Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf
Rao, Jia · moreImproving Resource Utilization through Demand Aware Process Scheduling · pdf, pdf
Rathnayake, Sunimal · moreCost-Time Performance of Scaling Applications on the Cloud · pdf, pdf, pdf, pdf
Ren, Bangbang · moreDAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf
Ren, Fengyuan · morePower Efficient High Performance Packet I/O · pdf, pdf
Ren, Xiao li · morePBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf
Rennich, Steven · moreA Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf
Robert, Yves · moreA Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf
A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures · pdf, pdf
Robins, Mark · moreAI and HPC: Challenges and Opportunities · view
Rupp, Karl · moreVectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf

S
Sasanka, Ruchira · moreA Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes · pdf, pdf
Sathre, Paul · moreA Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes · pdf, pdf
Savas, Suleyman · moreDesigning Domain-Specific Heterogeneous Manycores from Dataflow Programs · pdf, pdf, pdf, pdf
Schickedanz, Alexander · moreAn Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf
Schmidt, Bertil · moreSPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf
Massively Parallel Huffman Decoding on GPUs · pdf, pdf
Schonbein, Whit · moreThe Case for Semi-Permanent Cache Occupancy · pdf, pdf
Schulz, Martin · moreToward Footprint-Aware Power Shifting for Hybrid Memory Based Systems · pdf, pdf, pdf, pdf
Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf
Selva, Manuel · moreNumaMMA: NUMA MeMory Analyzer · pdf, pdf
Sens, Pierre · moreA Communication-Efficient Causal Broadcast Protocol · pdf, pdf
Seth, Sharad · moreLeverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf
Shaik, Shehenaz · moreResource and Service Management in Fog Computing · pdf, pdf, pdf, pdf
Shen, Yulong · moreDAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf
Shen, Zhirong · moreCross-Rack-Aware Updates in Erasure-Coded Data Centers · pdf, pdf
Shi, Weisong · moreIn-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems · pdf, pdf, pdf, pdf
Si, Min · moreI/O Bottleneck Investigation in Deep Learning Systems · pdf, pdf, pdf, pdf
Smith, Barry F. · moreVectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf
Snir, Marc · moreFast and generic concurrent message-passing · pdf, pdf, pdf, pdf
FULT: Fast User-Level Thread Scheduling Using Bit-Vectors · pdf, pdf
Snyder, John · moreEfficient Runtime Support for a Partitioned Global Logical Address Space · pdf, pdf
Song, Jun qiang · morePBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf
Sood, Kanika · moreIterative Solver Selection Techniques for Sparse Linear Systems · pdf, pdf, pdf, pdf
Soori, Saeed · moreReducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf
Srisa-an, Witawas · moreLeverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf
Stitt, Greg · moreScalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf
Strout, Michelle · moreAbstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf
Subasi, Omer · moreCharacterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection · pdf, pdf
Sun, Jizhou · moreGLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf
Sun, Ke · moreCharging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf
Sun, Ninghui · moreAccelerating FM-index Search for Genomic Data Processing · pdf, pdf
Sun, Qiao · moreBandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf
Sun, Shixuan · moreParallelizing Pruning-based Graph Structural Clustering · pdf, pdf
Suriyakumar, Yasodha · morePerformance Analysis of DroughtHPC and Holistic HPC Workflows · pdf, pdf, pdf, pdf

T
Tan, Guangming · moreCommunication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf
Accelerating FM-index Search for Genomic Data Processing · pdf, pdf
Tan, Wei · moreMatrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf
Tan, Yujuan · moreLeverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf
Tang, Bingchang · moreLearning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster · pdf, pdf
Tang, Guoming · moreDAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf
Tang, Meng · moreA Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf
Tang, Shanjiang · moreGLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf
Tang, Xueyan · moreIndex Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf
Taufer, Michela · moreKeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf
PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf
Teo, Yong Meng · moreCost-Time Performance of Scaling Applications on the Cloud · pdf, pdf, pdf, pdf
Teodorescu, Radu · moreC-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf
Tian, Qi · moreUHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf
Tong, Jiancong · moreIndex Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf
Trahay, François · moreNumaMMA: NUMA MeMory Analyzer · pdf, pdf
Tyson, Gareth · moreH2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf
Tzovas, Charilaos · moreBalanced k-means for Parallel Geometric Partitioning · pdf, pdf

V
Vadhiyar, Sathish · moreMND-MST: A Multi-Node Multi-Device Parallel Boruvka's MST Algorithm · pdf, pdf
Van Straalen, Brian · moreA Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids · pdf, pdf, pdf, pdf
Varghese, Robin · moreIdentifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf
Velesko, Paulius · moreVectorised Computation of Diverging Ensembles · pdf, pdf
Vivien, Frédéric · moreA Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf
von Looz, Moritz · moreBalanced k-means for Parallel Geometric Partitioning · pdf, pdf

W
Wang, Fang · moreHUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf
Wang, Fei · moreIS-ASGD: Accelerating Asynchronous SGD using Importance Sampling · pdf, pdf
Wang, Gang · moreIndex Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf
Wang, Hua · moreEfficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf
Wang, Jiayao · moreToward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf
Wang, Jue · moreMassively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf
Wang, Weijun · moreHeterogeneous Wireless Charger Placement with Obstacles · pdf, pdf
Wang, Xi · moreExploring Memory Coalescing for 3D-Stacked Hybrid Memory Cube · pdf, pdf, pdf, pdf
Memory Coalescing for Hybrid Memory Cube · pdf, pdf
Wang, Xiangmeng · moreMassively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf
Wang, Xiaoliang · moreran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf
Wang, Xiaoyu · moreCache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf
Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf
Wang, Xin · moreParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf
Wang, Xinliang · moreA Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf
Wang, Yuanrong · moreAccelerating FM-index Search for Genomic Data Processing · pdf, pdf
Wang, Zhenlin · moreUtilization of Random Profiling for System Modeling and Dynamic Configuration · pdf, pdf, pdf, pdf
Constructing Dynamic Policies for Paging Mode Selection · pdf, pdf
Wang, Zhiying · moreDSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf
CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf
Wang, Zijun · moreMatrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf
Weber, Kenneth · moreToward a Multi-GPU Implementation of the Modular Integer GCD Algorithm: Extended Abstract · pdf, pdf, pdf, pdf
Wei, Dengping · moreDuchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf
Wei, Yanjie · moreSPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf
Weissenberger, Andre · moreMassively Parallel Huffman Decoding on GPUs · pdf, pdf
Wernsman, Robert · moreImproving First Level Cache Efficiency for GPUs Using Dynamic Line Protection · pdf, pdf
Wolf, Felix · moreUnveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics · pdf, pdf
Wolf, Tilman · moreParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf
Wood, Chad · moreSOSflow: A Scalable Observation System for Introspection and In Situ Analytics · pdf, pdf, pdf, pdf
Wu, Baodong · moreCommunication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf
Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf
Wu, Changmao · moreBandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf
Wu, Jie · moreNFV Middlebox Placement with Balanced Set-up Cost and Bandwidth Consumption · pdf, pdf
Wu, Kai · moreModeling Application Resilience in Large-scale Parallel Execution · pdf, pdf
Wu, Song · moreDisk Failure Prediction in Data Centers via Online Learning · pdf, pdf
Dual-Paradigm Stream Processing · pdf, pdf
Wu, Xiaobing · moreHeterogeneous Wireless Charger Placement with Obstacles · pdf, pdf
Wyatt, Michael R. · morePRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf

X
Xia, Yinglong · moreC-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf
Xiao, Jiang · moreDisk Failure Prediction in Data Centers via Online Learning · pdf, pdf
Xiao, Junmin · moreCommunication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf
Xiao, Liquan · moreDuchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf
Xie, Jing · morePower Efficient High Performance Packet I/O · pdf, pdf
Xie, Xuchao · moreDuchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf
Xiong, Zhuang · moreDisk Failure Prediction in Data Centers via Online Learning · pdf, pdf
Xu, Kai · moreSPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf
Xu, Ping · moreA Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf
Xu, Xianghao · moreHUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf
Xue, Wei · moreA Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf

Y
Yan, Zhichao · moreLeverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf
Yang, Bailong · morePower Efficient High Performance Packet I/O · pdf, pdf
Yang, Canqun · moreUHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf
Yang, Carl · morePush-Pull on Graphs is Column- and Row-based SpMV Plus Masks · pdf, pdf, pdf, pdf
Implementing Push-Pull Efficiently in GraphBLAS · pdf, pdf
Yang, Chao · moreA Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf
Yang, Donglin · moreJoint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Clouds · pdf, pdf
Yang, Guangwen · moreA Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf
Yang, Tianye · moreDuchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf
Yao, Erlin · moreCommunication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf
Ye, Jun · moreIS-ASGD: Accelerating Asynchronous SGD using Importance Sampling · pdf, pdf
Yi, Qing · moreImproving Resource Utilization through Demand Aware Process Scheduling · pdf, pdf
Yi, Xinbo · moreEfficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf
Yi, Yusheng · moreDisk Failure Prediction in Data Centers via Online Learning · pdf, pdf
You, Yang · moreImageNet Training in Minutes · pdf, pdf
Yu, Ce · moreGLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf
Yu, Hongfeng · moreA Distributed Infomap Algorithm for Scalable and High-Quality Community Detection · pdf, pdf
Yu, Qi · moreDSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf
CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf
Yuan, Jing · moreParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf
Yuan, Xin · moreLoad-Balanced Slim Fly Networks · pdf, pdf

Z
Zambreno, Joseph · moreImproving First Level Cache Efficiency for GPUs Using Dynamic Line Protection · pdf, pdf
Zang, Dawei · moreAccelerating FM-index Search for Genomic Data Processing · pdf, pdf
Zeng, Jianping · moreA Distributed Infomap Algorithm for Scalable and High-Quality Community Detection · pdf, pdf
Zhai, Ennan · moreH2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf
Zhang, Changyou · moreBandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf
Zhang, Chen · moreFFS-VA: A Fast Filtering System for Large-scale Video Analytics · pdf, pdf
Zhang, Haitao · moreLearning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster · pdf, pdf
Zhang, He · moreCommunication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf
Zhang, Hong · moreVectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf
Zhang, Huazhe · morePerformance & Energy Tradeoffs for Dependent Distributed Applications Under System-wide Power Caps · pdf, pdf
Zhang, Jiajia · moreBandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf
Zhang, Jiling · moreClick-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf
Zhang, Lijun · moreCharging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf
Zhang, Qifei · moreDelta-Stepping Synchronous Parallel Model · pdf, pdf, pdf, pdf
Zhang, Rongqi · moreLess Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees · pdf, pdf
Zhang, Sheng · moreran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf
Zhang, Tong · morePower Efficient High Performance Packet I/O · pdf, pdf
Zhang, Weidong · moreDelta-Stepping Synchronous Parallel Model · pdf, pdf, pdf, pdf
Zhang, Xiaoyi · moreA Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf
Zhang, Yongxuan · moreHUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf
Zhang, Yunquan · moreCommunication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf
Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf
Zhang, Zhao · moreImageNet Training in Minutes · pdf, pdf
Zhao, Dongfang · moreToward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf
Zhao, Jin · moreParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf
Zhao, Juan · morePBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf
Zhao, Laiping · moreLess Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees · pdf, pdf
Zhao, Leiyu · moreH2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf
Zhao, Minghao · moreH2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf
Zhao, Xinghui · moreWebNN: A Distributed Framework for Deep Learning · pdf, pdf, pdf, pdf
A Comprehensive Study on Bugs in Actor Systems · pdf, pdf
Zheng, Jiaqi · moreCharging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf
Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf
Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf
Zheng, Weimin · moreA Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf
Zhou, Amelie Chi · moreEnergy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters · pdf, pdf
Zhou, Ke · moreEfficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf
Zhou, Li · moreC-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf
Zhou, Xiaobo · moreLeveraging Resource Bottleneck Awareness and Optimizations for Data Analytics Performance · pdf, pdf, pdf, pdf
Reference-distance Eviction and Prefetching for Cache Management in Spark · pdf, pdf
Zhu, Min · morePBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf
Zhu, Xian · moreImproving First Level Cache Efficiency for GPUs Using Dynamic Line Protection · pdf, pdf
Zhu, Yuanyang · moreParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf
Zhu, Zhichun · moreCAMPS: Conflict-Aware Memory-Side Prefetching Scheme for Hybrid Memory Cube · pdf, pdf

Created 2018-8-10 9:31