A. Rodrigues, Luiz · more Luiz A. Rodrigues (Western Paraná State University) | A Communication-Efficient Causal Broadcast Protocol · pdf, pdf |
Adhinarayanan, Vignesh · more Vignesh Adhinarayanan (Virginia Tech) | Models and Techniques for Green High-Performance Computing · pdf, pdf, pdf, pdf |
Afsahi, Ahmad · more Ahmad Afsahi (Queen’s University) | The Case for Semi-Permanent Cache Occupancy · pdf, pdf |
Ahn, Dong H. · more Dong H. Ahn (Lawrence Livermore National Laboratory) | PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf |
Ajwani, Deepak · more Deepak Ajwani (Nokia Bell Laboratories, Dublin) | An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf |
Aktulga, H. M. · more H. M. Aktulga (Michigan State University) | Optimization of the Spherical Harmonics Transform based Tree Traversals in the Helmholtz FMM Algorithm · pdf, pdf |
Al-Mamun, Abdullah · more Abdullah Al-Mamun (University of Nevada, Reno) | Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf |
Anandakrishnan, Ramu · more Ramu Anandakrishnan (Edward Via College of Osteopathic Medicine, Blacksburg) | Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf |
Antoškin, Vjatsešlav · more Vjatsešlav Antoškin (University of Tartu) | A Computational Investigation of Redistricting Using Simulated Annealing · pdf, pdf, pdf, pdf |
Ao, Yulong · more Yulong Ao (School of Mathematical Sciences, Peking University) | A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf |
Arantes, Luciana · more Luciana Arantes (Sorbonne Université, CNRS, INRIA, LIP6) | A Communication-Efficient Causal Broadcast Protocol · pdf, pdf |
Arima, Eishi · more Eishi Arima (The University of Tokyo) | Toward Footprint-Aware Power Shifting for Hybrid Memory Based Systems · pdf, pdf, pdf, pdf |
Arnold, Dorian · more Dorian Arnold (Emory University, University of New Mexico) | Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf |
Balaji, Pavan · more Pavan Balaji (Argonne National Laboratory) | I/O Bottleneck Investigation in Deep Learning Systems · pdf, pdf, pdf, pdf |
Ballard, Grey · more Grey Ballard (Wake Forest University) | Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization · pdf, pdf |
Barreto Goes Perez, Tiago · more Tiago Barreto Goes Perez (University of Colorado Colorado Springs) | Leveraging Resource Bottleneck Awareness and Optimizations for Data Analytics Performance · pdf, pdf, pdf, pdf Reference-distance Eviction and Prefetching for Cache Management in Spark · pdf, pdf |
Baskiyar, Sanjeev · more Sanjeev Baskiyar (Auburn University) | Resource and Service Management in Fog Computing · pdf, pdf, pdf, pdf |
Beckstein, Oliver · more Oliver Beckstein (Arizona State University) | Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf |
Benoit, Anne · more Anne Benoit (ENS Lyon & Inria; Georgia Institute of Technology, Atlanta) | A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures · pdf, pdf |
Benson, Jeremy · more Jeremy Benson (University of New Mexico) | KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf |
Berry, Jonathan W. · more Jonathan W. Berry (Sandia National Labs) | Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM · pdf, pdf |
Bhatele, Abhinav · more Abhinav Bhatele (Lawrence Livermore National Laboratory) | Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf |
Bhattacharjee, Mrinal · more Mrinal Bhattacharjee (NetApp) | Efficient Search for Free Blocks in the WAFL File System · pdf, pdf |
Blanco, Zachary · more Zachary Blanco (Rutgers University) | CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms · pdf, pdf Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf |
Brandt, Jim · more Jim Brandt (Sandia National Laboratories) | Integrating Low-latency Analysis into HPC System Monitoring · pdf, pdf |
Brew, Justin A. · more Justin A. Brew (University of Mount Union, Department of Computer Science) | Toward a Multi-GPU Implementation of the Modular Integer GCD Algorithm: Extended Abstract · pdf, pdf, pdf, pdf |
Bridges, Patrick · more Patrick Bridges (University of New Mexico) | Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf |
Bridges, Patrick G. · more Patrick G. Bridges (University of New Mexico) | The Case for Semi-Permanent Cache Occupancy · pdf, pdf |
Brown, Kevin A. · more Kevin A. Brown (Tokyo Institute of Technology, Tokyo Tech/AIST RWBC-Open Innovation Laboratory) | Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf |
Brown, Laura E. · more Laura E. Brown (Michigan Technological University) | Utilization of Random Profiling for System Modeling and Dynamic Configuration · pdf, pdf, pdf, pdf Constructing Dynamic Policies for Paging Mode Selection · pdf, pdf |
Buluc, Aydin · more Aydin Buluc (Lawrence Berkeley National Laboratory) | Push-Pull on Graphs is Column- and Row-based SpMV Plus Masks · pdf, pdf, pdf, pdf Implementing Push-Pull Efficiently in GraphBLAS · pdf, pdf |
Butcher, Neil A. · more Neil A. Butcher (Notre Dame) | Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM · pdf, pdf |
Cai, Binlei · more Binlei Cai (School of Computer Science and Technology, Tianjin University; Tianjin Key Laboratory of Advanced Networking) | Less Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees · pdf, pdf |
Cai, Wentong · more Wentong Cai (Nanyang Technological University) | Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf |
Canon, Louis-Claude · more Louis-Claude Canon (Univ Lyon, CNRS, ENS de Lyon, Inria, Université Claude-Bernard Lyon 1, LIP UMR5668 LYON Cedex 07 France; FEMTO-ST, Université de Bourgogne Franche-Comté, France) | A Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf |
Cao, Liangliang · more Liangliang Cao (HelloVera.AI) | Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf |
Cao, Qiang · more Qiang Cao (Huazhong University of Science and Technology) | FFS-VA: A Fast Filtering System for Large-scale Video Analytics · pdf, pdf |
Cao, Xuan · more Xuan Cao (Baidu Inc.) | Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf |
Chan, Yuandong · more Yuandong Chan (School of Software, Shandong University) | SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf |
Chang, Chun-Kai · more Chun-Kai Chang (The University of Texas at Austin) | Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection · pdf, pdf |
Chang, Shiyu · more Shiyu Chang (IBM Research) | Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf |
Chantzialexiou, George · more George Chantzialexiou (Rutgers University) | Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf |
Che, Yulin · more Yulin Che (Hong Kong University of Science and Technology) | Parallelizing Pruning-based Graph Structural Clustering · pdf, pdf |
Cheatham, Thomas E. · more Thomas E. Cheatham (University of Utah) | Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf |
Chen, Fei · more Fei Chen (Huazhong University of Science & Technology) | Dual-Paradigm Stream Processing · pdf, pdf |
Chen, Guihai · more Guihai Chen (Nanjing University) | Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling · pdf, pdf Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf |
Chen, Hong · more Hong Chen (Institute of Microelectronics, Tsinghua University, Beijing National Research Center for Information Science and Technology) | Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf |
Chen, Jianxi · more Jianxi Chen (Huazhong University of Science and Technology) | A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf |
Chen, Ren · more Ren Chen (Huawei Research America) | C-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf |
Chen, Xinyu · more Xinyu Chen (University of New Mexico) | KeyBin2: Distributed Clustering for Scalable and In-situ Analysis · pdf, pdf, pdf, pdf KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf |
Chen, Yang · more Yang Chen (Temple University) | NFV Middlebox Placement with Balanced Set-up Cost and Bandwidth Consumption · pdf, pdf |
Chen, Yifeng · more Yifeng Chen (Peking University) | Delta-Stepping Synchronous Parallel Model · pdf, pdf, pdf, pdf |
Chen, Yong · more Yong Chen (Texas Tech University) | Exploring Memory Coalescing for 3D-Stacked Hybrid Memory Cube · pdf, pdf, pdf, pdf Memory Coalescing for Hybrid Memory Cube · pdf, pdf |
Cheng, Bin · more Bin Cheng (Shenzhen Tencent Computer System Co., Ltd.) | Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf |
Cheng, Dazhao · more Dazhao Cheng (University of North Carolina, Charlotte) | Reference-distance Eviction and Prefetching for Cache Management in Spark · pdf, pdf Joint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Clouds · pdf, pdf |
Cheng, Wenxue · more Wenxue Cheng (Tsinghua University) | Power Efficient High Performance Packet I/O · pdf, pdf |
Cheng, Yongli · more Yongli Cheng (Fuzhou University) | HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf |
Childers, Bruce · more Bruce Childers (University of Pittsburgh) | CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf |
Colella, Phillip · more Phillip Colella (Lawrence Berkeley National Laboratory) | A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids · pdf, pdf, pdf, pdf |
Colombet, Laurent · more Laurent Colombet (CEA) | Combining Task-based Parallelism and Adaptive Mesh Refinement Techniques in Molecular Dynamics Simulations · pdf, pdf |
Cotton, Ronald · more Ronald Cotton (Washington State University) | WebNN: A Distributed Framework for Deep Learning · pdf, pdf, pdf, pdf |
Cui, Chang · more Chang Cui (Peking University) | Delta-Stepping Synchronous Parallel Model · pdf, pdf, pdf, pdf |
Curtis-Maury, Matthew · more Matthew Curtis-Maury (NetApp) | Efficient Search for Free Blocks in the WAFL File System · pdf, pdf |
Dai, Haipeng · more Haipeng Dai (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu) | Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf |
Dang, Hoang-Vu · more Hoang-Vu Dang (University of Illinois) | Fast and generic concurrent message-passing · pdf, pdf, pdf, pdf FULT: Fast User-Level Thread Scheduling Using Bit-Vectors · pdf, pdf |
Dash, Sajal · more Sajal Dash (Virginia Tech) | Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf |
Davis, Eddie C. · more Eddie C. Davis (Boise State University) | Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf |
Davis, Timothy A. · more Timothy A. Davis (Texas A&M University) | A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf |
de Araujo, João Paulo · more João Paulo de Araujo (Sorbonne Université, CNRS, INRIA, LIP6) | A Communication-Efficient Causal Broadcast Protocol · pdf, pdf |
DeBardeleben, Nathan · more Nathan DeBardeleben (Los Alamos National Laboratory) | Modeling Application Resilience in Large-scale Parallel Execution · pdf, pdf |
Dechev, Damian · more Damian Dechev (University of Central Florida) | Integrating Low-latency Analysis into HPC System Monitoring · pdf, pdf |
Dehnavi, Maryam Mehri · more Maryam Mehri Dehnavi (Rutgers University) | CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms · pdf, pdf |
Demmel, James · more James Demmel (University of California Berkeley) | ImageNet Training in Minutes · pdf, pdf Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf |
Devarakonda, Aditya · more Aditya Devarakonda (University of California Berkeley) | Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf |
Devine, Thomas R. · more Thomas R. Devine (West Virginia University, Fairmont State University) | Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy · pdf, pdf |
Dinan, James · more James Dinan (Intel Corporation) | Efficient Runtime Support for a Partitioned Global Logical Address Space · pdf, pdf |
Dong, Wenqian · more Wenqian Dong (University of California Merced) | Modeling Application Resilience in Large-scale Parallel Execution · pdf, pdf |
Dosanjh, Matthew · more Matthew Dosanjh (Sandia National Laboratories) | Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf |
Dosanjh, Matthew G. F. · more Matthew G. F. Dosanjh (Sandia National Laboratories) | The Case for Semi-Permanent Cache Occupancy · pdf, pdf |
Dou, Wanchun · more Wanchun Dou (Nanjing University) | Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf |
Duriakova, Erika · more Erika Duriakova (Insight Centre for Data Analytics; School of Computer Science and Informatics, University College Dublin) | An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf |
Erez, Mattan · more Mattan Erez (The University of Texas at Austin) | Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection · pdf, pdf |
Estrada, Trilce · more Trilce Estrada (University of New Mexico) | KeyBin2: Distributed Clustering for Scalable and In-situ Analysis · pdf, pdf, pdf, pdf KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf |
Eyraud-Dubois, Lionel · more Lionel Eyraud-Dubois (Inria) | Using Static Allocation Algorithms for Matrix Matrix Multiplication on Multicores and GPUs · pdf, pdf |
Faizian, Peyman · more Peyman Faizian (Florida State University) | Load-Balanced Slim Fly Networks · pdf, pdf |
Feng, Dan · more Dan Feng (Huazhong University of Science and Technology) | HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf |
Feng, Guangbo · more Guangbo Feng (School of Information Science and Engineering, Lanzhou University) | Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf |
Feng, Wu-chun · more Wu-chun Feng (Virginia Tech) | Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf I/O Bottleneck Investigation in Deep Learning Systems · pdf, pdf, pdf, pdf A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes · pdf, pdf |
Feng, Yangde · more Yangde Feng (Computer Network Information Center, Chinese Academy of Sciences) | Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf |
Fernandez, Alvaro · more Alvaro Fernandez (Instituto de Física Corpuscular (IFIC), Universidad de Valencia and CSIC) | Performance Improvements of an Event Index Distributed System · pdf, pdf, pdf, pdf |
Figiela, Kamil · more Kamil Figiela (AGH University of Science and Technology) | Performance evaluation of parallel cloud functions · pdf, pdf, pdf, pdf |
Fong, Liana · more Liana Fong (IBM Research) | Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf |
Fox, Geoffrey C. · more Geoffrey C. Fox (Indiana University) | Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf |
Franchetti, Franz · more Franz Franchetti (Carnegie Mellon University) | Algorithm Design for Large Scale FFT-Based Simulations on CPU-GPU Platforms · pdf, pdf, pdf, pdf |
Fu, Hao · more Hao Fu (Tianjin University) | GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf |
Fu, Haohuan · more Haohuan Fu (Tsinghua University, National Supercomputing Center in Wuxi) | A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf |
Fu, Mandi · more Mandi Fu (Huazhong University of Science and Technology) | A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf |
Fu, Song · more Song Fu (University of North Texas) | In-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems · pdf, pdf, pdf, pdf |
Gadou, Mohamed · more Mohamed Gadou (University of Florida) | A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf |
Gamblin, Todd · more Todd Gamblin (Lawrence Livermore National Laboratory) | PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf |
Gan, Lin · more Lin Gan (Tsinghua University, National Supercomputing Center in Wuxi) | A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf |
Gao, Chuansong · more Chuansong Gao (Baidu Inc.) | Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf |
Gao, Ping · more Ping Gao (School of Software, Shandong University) | SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf |
Gao, Xiaofeng · more Xiaofeng Gao (Shanghai Jiao Tong University) | IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling · pdf, pdf |
Garner, Harold · more Harold Garner (Edward Via College of Osteopathic Medicine, Blacksburg) | Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf |
Geng, Guanhui · more Guanhui Geng (Baidu Inc.) | Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf |
Geng, Xin · more Xin Geng (Beijing University of Posts and Telecomm. (BUPT)) | Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster · pdf, pdf |
Gentile, Ann · more Ann Gentile (Sandia National Laboratories) | Integrating Low-latency Analysis into HPC System Monitoring · pdf, pdf |
Gerndt, Michael · more Michael Gerndt (Technical University of Munich) | Exploiting Inter-Phase Application Dynamism to Auto-Tune HPC Applications for Energy-Efficiency · pdf, pdf, pdf, pdf |
Ghazimirsaeed, S. Mahdieh · more S. Mahdieh Ghazimirsaeed (Queen’s University) | The Case for Semi-Permanent Cache Occupancy · pdf, pdf |
Glantz, Roland · more Roland Glantz (Karlsruhe Institute of Technology) | Topology-induced Enhancement of Mappings · pdf, pdf |
Glick, Ben · more Ben Glick (Lewis & Clark College) | An Extensible Ecosystem of Tools Providing User Friendly HPC Access and Supporting Jupyter Notebooks · pdf, pdf, pdf, pdf |
Goin, Aaron · more Aaron Goin (Washington State University) | WebNN: A Distributed Framework for Deep Learning · pdf, pdf, pdf, pdf |
Gonzalez, Santiago · more Santiago Gonzalez (Instituto de Física Corpuscular (IFIC), Universidad de Valencia and CSIC) | Performance Improvements of an Event Index Distributed System · pdf, pdf, pdf, pdf |
Goseva-Popstojanova, Katerina · more Katerina Goseva-Popstojanova (West Virginia University) | Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy · pdf, pdf |
Grant, Ryan · more Ryan Grant (Sandia National Laboratories) | Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf |
Grant, Ryan E. · more Ryan E. Grant (Sandia National Laboratories, University of New Mexico) | The Case for Semi-Permanent Cache Occupancy · pdf, pdf |
Groves, Taylor · more Taylor Groves (Lawrence Berkeley National Laboratory) | Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf |
Gu, Lin · more Lin Gu (Huazhong University of Science & Technology) | Dual-Paradigm Stream Processing · pdf, pdf |
Guan, Qiang · more Qiang Guan (Kent State University) | Modeling Application Resilience in Large-scale Parallel Execution · pdf, pdf |
Guo, Deke · more Deke Guo (National University of Defense Technology) | DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf |
Guo, Hui · more Hui Guo (National University of Defense Technology) | DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf |
Guo, Song · more Song Guo (Hong Kong Polytechnic University) | ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf |
Gurbuzbalaban, Mert · more Mert Gurbuzbalaban (Rutgers University) | Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf |
Hall, Mary · more Mary Hall (University of Utah) Mary Hall is a professor in the School of Computing at University of Utah. She received a PhD in Computer Science from Rice University. Her research focus brings together compiler optimizations targeting current and future high-performance architectures on real-world applications. Hall's prior work has developed compiler techniques for exploiting parallelism and locality on a diversity of architectures: automatic parallelization for SMPs, superword-level parallelism for multimedia extensions, processing-in-memory architectures, FPGAs and more recently many-core CPUs and GPUs. Professor Hall is an ACM Distinguished Scientist and ACM’s representative on the Computing Research Association Board of Directors. She is deeply interested in computing history, having served on the ACM History Committee for a decade and as chair from 2009-2014. She also actively participates in outreach programs to encourage the participation of women and underrepresented minorities in computer science. | Bringing Sparse Computations into the Optimization Light · view Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf |
Hammond, Simon D. · more Simon D. Hammond (Sandia National Labs) | Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM · pdf, pdf |
Han, Li · more Li Han (East China Normal University, China; Univ Lyon, CNRS, ENS de Lyon, Inria, Université Claude-Bernard Lyon 1, LIP UMR5668 LYON Cedex 07 France) | A Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf |
Hanawa, Toshihiro · more Toshihiro Hanawa (The University of Tokyo) | Toward Footprint-Aware Power Shifting for Hybrid Memory Based Systems · pdf, pdf, pdf, pdf |
Hassan, Ahmed · more Ahmed Hassan (Alexandria University) | Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory · pdf, pdf |
He, Anping · more Anping He (School of Information Science and Engineering, Lanzhou University) | Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf |
He, Bingsheng · more Bingsheng He (National University of Singapore) | Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters · pdf, pdf GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf |
Hedden, Brandon · more Brandon Hedden (Washington State University) | A Comprehensive Study on Bugs in Actor Systems · pdf, pdf |
Hei, Yong · more Yong Hei (Institute of Microelectronics, Chinese Academy of Sciences) | Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf |
Herbein, Stephen · more Stephen Herbein (University of Delaware) | PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf |
Hiebel, Jason · more Jason Hiebel (Michigan Technological University) | Utilization of Random Profiling for System Modeling and Dynamic Configuration · pdf, pdf, pdf, pdf Constructing Dynamic Policies for Paging Mode Selection · pdf, pdf |
Hjelm, Nathan · more Nathan Hjelm (Los Alamos National Lab, University of New Mexico) | Improving MPI Multi-threaded RMA Communication Performance · pdf, pdf |
Hoffmann, Henry · more Henry Hoffmann (University of Chicago) | Energy-efficient Application Resource Scheduling using Machine Learning Classifiers · pdf, pdf Performance & Energy Tradeoffs for Dependent Distributed Applications Under System-wide Power Caps · pdf, pdf |
Hofmeyr, Steven · more Steven Hofmeyr (Lawrence Berkeley National Laboratory) | Energy-efficient Application Resource Scheduling using Machine Learning Classifiers · pdf, pdf |
Hovland, Paul · more Paul Hovland (Argonne National Laboratory) | Vectorised Computation of Diverging Ensembles · pdf, pdf |
Hsieh, Cho-Jui · more Cho-Jui Hsieh (UC Davis) | ImageNet Training in Minutes · pdf, pdf |
Hu, Changjun · more Changjun Hu (University of Science and Technology Beijing) | Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf |
Hu, Kan · more Kan Hu (Huazhong University of Science & Technology) | Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf |
Hua, Yu · more Yu Hua (Huazhong University of Science and Technology) | A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf |
Huang, Libo · more Libo Huang (National University of Defense Technology) | DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf |
Huang, Ping · more Ping Huang (Department of Computer and Information Sciences, Temple University) | Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf |
Huang, Zhenyu · more Zhenyu Huang (State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences) | PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf |
Hughey, Stephen M. · more Stephen M. Hughey (Michigan State University) | Optimization of the Spherical Harmonics Transform based Tree Traversals in the Helmholtz FMM Algorithm · pdf, pdf |
Hurley, Neil · more Neil Hurley (Insight Centre for Data Analytics; School of Computer Science, University College Dublin) | An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf |
Hückelheim, Jan · more Jan Hückelheim (Imperial College London) | Vectorised Computation of Diverging Ensembles · pdf, pdf |
Ibrahim, Shadi · more Shadi Ibrahim (Inria) | Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters · pdf, pdf Dual-Paradigm Stream Processing · pdf, pdf |
Imes, Connor · more Connor Imes (University of Chicago) | Energy-efficient Application Resource Scheduling using Machine Learning Classifiers · pdf, pdf |
Izadpanah, Ramin · more Ramin Izadpanah (University of Central Florida) | Integrating Low-latency Analysis into HPC System Monitoring · pdf, pdf |
Jain, Nikhil · more Nikhil Jain (Lawrence Livermore National Laboratory) | Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf |
Jannesari, Ali · more Ali Jannesari (Iowa State University) | Unveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics · pdf, pdf |
Javidi Kishi, Masoomeh · more Masoomeh Javidi Kishi (Lehigh University) | Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory · pdf, pdf |
Jessup, Elizabeth · more Elizabeth Jessup (University of Colorado Boulder) | Iterative Solver Selection Techniques for Sparse Linear Systems · pdf, pdf, pdf, pdf |
Jha, Shantenu · more Shantenu Jha (Rutgers University, Brookhaven National Laboratory) | Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf |
Jia, Xiaoying · more Xiaoying Jia (Nvidia Corporation) | Revisiting Multi-pass Scatter and Gather on GPUs · pdf, pdf |
Jiang, Hong · more Hong Jiang (University of Texas at Arlington) | HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf |
Jiang, Linhua · more Linhua Jiang (University of Shanghai for Science and Technology) | Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf |
Jin, Hai · more Hai Jin (Huazhong University of Science & Technology) | Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf Dual-Paradigm Stream Processing · pdf, pdf |
Jin, Yibo · more Yibo Jin (Nanjing University) | ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf |
Kalikar, Saurabh · more Saurabh Kalikar (Indian Institute of Technology Madras, India) | Interval based Framework for Locking in Hierarchies · pdf, pdf, pdf, pdf NumLock: Towards Optimal Multi-Granularity Locking in Hierarchies · pdf, pdf |
Kannan, Ramakrishnan · more Ramakrishnan Kannan (Oak Ridge National Laboratory) | Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization · pdf, pdf |
Karavanic, Karen L. · more Karen L. Karavanic (Portland State University) | Performance Analysis of DroughtHPC and Holistic HPC Workflows · pdf, pdf, pdf, pdf |
Kavouklis, Christos · more Christos Kavouklis (Lawrence Livermore National Laboratory) | A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids · pdf, pdf, pdf, pdf |
Kaya, Oguz · more Oguz Kaya (Inria Bordeaux) | Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization · pdf, pdf |
Kerola, Teemu · more Teemu Kerola (University of Helsinki) | Linear Time Sorting for Large Data Sets with Specialized Processor · pdf, pdf, pdf, pdf |
Kesavan, Ram · more Ram Kesavan (NetApp) | Efficient Search for Free Blocks in the WAFL File System · pdf, pdf |
Keutzer, Kurt · more Kurt Keutzer (UC Berkeley) | ImageNet Training in Minutes · pdf, pdf |
Khoshlessan, Mahzad · more Mahzad Khoshlessan (Arizona State University) | Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf |
Kinney, Nick · more Nick Kinney (Edward Via College of Osteopathic Medicine, Blacksburg) | Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf |
Kobus, Robin · more Robin Kobus (Institute for Computer Science, Johannes Gutenberg University) | SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf |
Kocoloski, Brian · more Brian Kocoloski (Washington University in St. Louis) | Varbench: an Experimental Framework to Measure and Characterize Performance Variability · pdf, pdf |
Kogge, Peter M. · more Peter M. Kogge (Notre Dame) | Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM · pdf, pdf |
Kovacevic, Jelena · more Jelena Kovacevic (Carnegie Mellon University) | Algorithm Design for Large Scale FFT-Based Simulations on CPU-GPU Platforms · pdf, pdf, pdf, pdf |
Krishnamoorthy, Sriram · more Sriram Krishnamoorthy (Pacific Northwest National Laboratory) | Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection · pdf, pdf |
Krommydas, Konstantinos · more Konstantinos Krommydas (Intel Corporation) | A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes · pdf, pdf |
Kulkarni, Anuva · more Anuva Kulkarni (Carnegie Mellon University) | Algorithm Design for Large Scale FFT-Based Simulations on CPU-GPU Platforms · pdf, pdf, pdf, pdf |
Kumar, Nalini · more Nalini Kumar (University of Florida) | Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf |
Kumaraswamy, Madhura · more Madhura Kumaraswamy (Technical University of Munich) | Exploiting Inter-Phase Application Dynamism to Auto-Tune HPC Applications for Energy-Efficiency · pdf, pdf, pdf, pdf |
Lai, Zhuohang · more Zhuohang Lai (Hong Kong University of Science and Technology) | Revisiting Multi-pass Scatter and Gather on GPUs · pdf, pdf |
Lam, Herman · more Herman Lam (University of Florida) | Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf |
Lambert, Thomas · more Thomas Lambert (University of Manchester) | Using Static Allocation Algorithms for Matrix Matrix Multiplication on Multicores and GPUs · pdf, pdf |
Lange, John · more John Lange (University of Pittsburgh) | Varbench: an Experimental Framework to Measure and Characterize Performance Variability · pdf, pdf |
Larkins, D. Brian · more D. Brian Larkins (Rhodes College) | Efficient Runtime Support for a Partitioned Global Logical Address Space · pdf, pdf |
Le Fèvre, Valentin · more Valentin Le Fèvre (Univ Lyon, CNRS, ENS de Lyon, Inria, Université Claude-Bernard Lyon 1, LIP UMR5668 LYON Cedex 07 France) | A Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf |
Lee, Patrick P. C. · more Patrick P. C. Lee (The Chinese University of Hong Kong) | Cross-Rack-Aware Updates in Erasure-Coded Data Centers · pdf, pdf |
Leidel, John D. · more John D. Leidel (Texas Tech University) | Exploring Memory Coalescing for 3D-Stacked Hybrid Memory Cube · pdf, pdf, pdf, pdf Memory Coalescing for Hybrid Memory Cube · pdf, pdf |
Levenhagen, Michael J. · more Michael J. Levenhagen (Sandia National Laboratories) | The Case for Semi-Permanent Cache Occupancy · pdf, pdf |
Li, Cheng · more Cheng Li (UIUC) | Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf |
Li, Dong · more Dong Li (University of California Merced) | Modeling Application Resilience in Large-scale Parallel Execution · pdf, pdf |
Li, Jianjiang · more Jianjiang Li (University of Science and Technology Beijing) | Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf |
Li, Jin cai · more Jin cai Li (College of Meteorology and Oceanology, National University of Defense Technology) | PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf |
Li, Kenli · more Kenli Li (Hunan University, National Supercomputing Center in Changsha) | UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf |
Li, Keqin · more Keqin Li (State University of New York, National Supercomputing Center in Changsha) | UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf |
Li, Keqiu · more Keqiu Li (School of Computer Science and Technology, Tianjin University; Tianjin Key Laboratory of Advanced Networking) | Less Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees · pdf, pdf |
Li, Kun · more Kun Li (Institute of Computing Technology, Chinese Academy of Sciences) | Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf |
Li, Leisheng · more Leisheng Li (Institute of Software, Chinese Academy of Sciences) | Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf |
Li, Minghui · more Minghui Li (Baidu Inc.) | Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf |
Li, Pengfei · more Pengfei Li (School of Information Science and Engineering, Lanzhou University) | Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf |
Li, Qi · more Qi Li (Air Force Engineering University) | Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf |
Li, Qiong · more Qiong Li (National University of Defense Technology) | Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf |
Li, Shigang · more Shigang Li (Institute of Computing Technology, Chinese Academy of Sciences) | Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf |
Li, Tonglin · more Tonglin Li (Lawrence Berkeley National Laboratory) | Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf |
Li, Xiaoyong · more Xiaoyong Li (College of Meteorology and Oceanology, National University of Defense Technology) | PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf |
Li, Xueqi · more Xueqi Li (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences) | Accelerating FM-index Search for Genomic Data Processing · pdf, pdf |
Li, Xuesong · more Xuesong Li (Tsinghua University, High-Tech Institute of Xi'an) | Power Efficient High Performance Packet I/O · pdf, pdf |
Li, Yusen · more Yusen Li (Nankai University) | Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf |
Li, Zhenhua · more Zhenhua Li (Tsinghua University) | H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf |
Li, Zhenyu · more Zhenyu Li (Institute of Computing Technology, Chinese Academy of Sciences) | H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf |
Liang, Shuwen · more Shuwen Liang (University of North Texas) | In-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems · pdf, pdf, pdf, pdf |
Liao, Longlong · more Longlong Liao (National University of Defense Technology, State Key Laboratory of High Performance Computing) | UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf |
Lim, Robert · more Robert Lim (University of Oregon) | Efficient Matching of GPU Kernel Subgraphs · pdf, pdf, pdf, pdf |
Lin, Xu · more Xu Lin (Xidian University) | DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf |
Lingg, Michael P. · more Michael P. Lingg (Michigan State University, Belcan Engineering) | Optimization of the Spherical Harmonics Transform based Tree Traversals in the Helmholtz FMM Algorithm · pdf, pdf |
Liu, Alex X. · more Alex X. Liu (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu; Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA.) | Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf |
Liu, Bangtian · more Bangtian Liu (Rutgers University) | CSTF: Large-Scale Sparse Tensor Factorizations on Distributed Platforms · pdf, pdf |
Liu, Qian · more Qian Liu (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu) | Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf |
Liu, Weiguo · more Weiguo Liu (School of Software, Shandong University) | SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf |
Liu, Xiaoguang · more Xiaoguang Liu (Nankai University) | Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf |
Liu, Zhiyi · more Zhiyi Liu (Huazhong University of Science & Technology) | Dual-Paradigm Stream Processing · pdf, pdf |
Lu, Sanglu · more Sanglu Lu (Nanjing University) | ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf |
Luckow, Andre · more Andre Luckow (Ludwig-Maximilians University) | Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf |
Luo, Qiong · more Qiong Luo (Hong Kong University of Science and Technology) | Parallelizing Pruning-based Graph Structural Clustering · pdf, pdf Revisiting Multi-pass Scatter and Gather on GPUs · pdf, pdf |
Lv, Yashuai · more Yashuai Lv (National University of Defense Technology) | DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf |
Ma, Huadong · more Huadong Ma (Beijing University of Posts and Telecomm. (BUPT)) | Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster · pdf, pdf |
Ma, Sheng · more Sheng Ma (National University of Defense Technology) | DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf |
Mache, Jens · more Jens Mache (Lewis & Clark College) | An Extensible Ecosystem of Tools Providing User Friendly HPC Access and Supporting Jupyter Notebooks · pdf, pdf, pdf, pdf |
Malawski, Maciej · more Maciej Malawski (AGH University of Science and Technology) | Performance evaluation of parallel cloud functions · pdf, pdf, pdf, pdf |
Malony, Allen · more Allen Malony (University of Oregon, ICPP General Chair) | Welcome and Introduction · view |
Marquet, Kevin · more Kevin Marquet (Univ Lyon, INSA Lyon, Inria, CITI) | NumaMMA: NUMA MeMory Analyzer · pdf, pdf |
Matsuoka, Satoshi · more Satoshi Matsuoka (Tokyo Institute of Technology, RIKEN Center for Computational Sciences) | Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf |
Mazaheri, Arya · more Arya Mazaheri (Technische Universitaet Darmstadt) | Unveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics · pdf, pdf |
McCorquodale, Peter · more Peter McCorquodale (Lawrence Berkeley National Laboratory) | A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids · pdf, pdf, pdf, pdf |
Mehri Dehnavi, Maryam · more Maryam Mehri Dehnavi (Rutgers University) | Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf |
Meng, Xiangxu · more Xiangxu Meng (School of Software, Shandong University) | SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf |
Meyer, Ulrich · more Ulrich Meyer (Goethe University, Frankfurt) | An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf |
Meyerhenke, Henning · more Henning Meyerhenke (University of Cologne) | Topology-induced Enhancement of Mappings · pdf, pdf Balanced k-means for Parallel Geometric Partitioning · pdf, pdf |
Mills, Richard T. · more Richard T. Mills (Argonne National Laboratory) | Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf |
Mohamedin, Mohamed · more Mohamed Mohamedin (Virginia Tech) | Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory · pdf, pdf |
Mohammadi, Mahdi S. · more Mahdi S. Mohammadi (University of Arizona) | Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf |
Mollah, Md Atiqul · more Md Atiqul Mollah (Florida State University) | Load-Balanced Slim Fly Networks · pdf, pdf |
Monil, Mohammad Alaul Haque · more Mohammad Alaul Haque Monil (University of Oregon) | Adaptive auto-tuning in HPX using APEX · pdf, pdf, pdf, pdf |
Monsalve Diaz, Jose Manuel · more Jose Manuel Monsalve Diaz (University of Delaware) | OpenMP 4.5 Implementations: Evaluation & Verification of Offloading Features · pdf, pdf, pdf, pdf |
Moody, Adam · more Adam Moody (Lawrence Livermore National Laboratory) | PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf |
Moradkhani, Hamid · more Hamid Moradkhani (University of Alabama) | Performance Analysis of DroughtHPC and Holistic HPC Workflows · pdf, pdf, pdf, pdf |
Morel, Lionel · more Lionel Morel (Univ Grenoble Alpes, CEA, List) | NumaMMA: NUMA MeMory Analyzer · pdf, pdf |
Muite, Benson · more Benson Muite (University of Tartu) | A Computational Investigation of Redistricting Using Simulated Annealing · pdf, pdf, pdf, pdf |
Naksinehaboon, Nichamon · more Nichamon Naksinehaboon (Open Grid Computing) | Integrating Low-latency Analysis into HPC System Monitoring · pdf, pdf |
NAMYST, Raymond · more Raymond NAMYST (University of Bordeaux, INRIA) | Combining Task-based Parallelism and Adaptive Mesh Refinement Techniques in Molecular Dynamics Simulations · pdf, pdf |
Nandy, Payal · more Payal Nandy (University of Utah) | Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf |
Narayanan, Sri Hari Krishna · more Sri Hari Krishna Narayanan (Argonne National Laboratory) | Vectorised Computation of Diverging Ensembles · pdf, pdf |
Nasre, Rupesh · more Rupesh Nasre (Indian Institute of Technology Madras) | Interval based Framework for Locking in Hierarchies · pdf, pdf, pdf, pdf NumLock: Towards Optimal Multi-Granularity Locking in Hierarchies · pdf, pdf |
Neelakantan, Aravind · more Aravind Neelakantan (University of Florida) | Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf |
Nesterenko, Brandon · more Brandon Nesterenko (UCCS) | Improving Resource Utilization through Demand Aware Process Scheduling · pdf, pdf |
Nie, Ningming · more Ningming Nie (Computer Network Information Center, Chinese Academy of Sciences) | Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf |
Norris, Boyana · more Boyana Norris (University of Oregon) | Iterative Solver Selection Techniques for Sparse Linear Systems · pdf, pdf, pdf, pdf |
Olivier, Stephen L. · more Stephen L. Olivier (Sandia National Labs) | Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM · pdf, pdf |
Olschanowsky, Catherine · more Catherine Olschanowsky (Boise State University) | Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf |
Orduña, Juan · more Juan Orduña (Departamento de Informática, Universidad de Valencia, SPAIN) | Performance Improvements of an Event Index Distributed System · pdf, pdf, pdf, pdf |
Owens, John D. · more John D. Owens (University of California, Davis) | Push-Pull on Graphs is Column- and Row-based SpMV Plus Masks · pdf, pdf, pdf, pdf Implementing Push-Pull Efficiently in GraphBLAS · pdf, pdf |
P. Duarte Júnior, Elias · more Elias P. Duarte Júnior (Federal University of Paraná) | A Communication-Efficient Causal Broadcast Protocol · pdf, pdf |
Palmieri, Roberto · more Roberto Palmieri (Lehigh University) | Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory · pdf, pdf |
Pang, Di · more Di Pang (West Virginia University) | Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy · pdf, pdf |
Panja, Rintu · more Rintu Panja (IISC Bangalore, IISC) | MND-MST: A Multi-Node Multi-Device Parallel Boruvka's MST Algorithm · pdf, pdf |
Parashar, Manish · more Manish Parashar (University of Rutgers and NSF OAC Office Director) Manish is Office Director of the Office of Advanced Cyberinfrastructure at NSF. He joins NSF from Rutgers, The State University of New Jersey, where he is currently a Distinguished Professor and the founding Director of the Rutgers Discovery Informatics Institute. His research interests are in the broad areas of Parallel and Distributed Computing and Computational and Data-Enabled Science and Engineering. Manish is Fellow of AAAS, Fellow of IEEE/IEEE Computer Society and ACM Distinguished Scientist. | Transforming Science through Cyberinfrastructure · view |
Paraskevakos, Ioannis · more Ioannis Paraskevakos (Rutgers University) | Middleware for Data Intensive Analytics on HPC · pdf, pdf, pdf, pdf Task-parallel Analysis of Molecular Dynamics Trajectories · pdf, pdf |
Paudel, Anmol · more Anmol Paudel (Marquette University) | A HPC Framework for Big Spatial Data Processing and Analytics · pdf, pdf, pdf, pdf MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data · pdf, pdf |
Pawlik, Maciej · more Maciej Pawlik (AGH University of Science and Technology) | Performance evaluation of parallel cloud functions · pdf, pdf, pdf, pdf |
Peluso, Sebastiano · more Sebastiano Peluso (Virginia Tech) | Nemo: NUMA-aware Concurrency Control for Scalable Transactional Memory · pdf, pdf |
Perarnau, Swann · more Swann Perarnau (Argonne National Laboratory) | A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures · pdf, pdf |
Peterson, Matt · more Matt Peterson (University of New Mexico) | KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf |
Phan, Tien-Dat · more Tien-Dat Phan (Dassault Systemes, ENS Rennes / IRISA) | Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters · pdf, pdf |
Pottier, Loïc · more Loïc Pottier (ENS Lyon & Inria) | A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures · pdf, pdf |
PRAT, Raphaël · more Raphaël PRAT (CEA) | Combining Task-based Parallelism and Adaptive Mesh Refinement Techniques in Molecular Dynamics Simulations · pdf, pdf |
Predari, Maria · more Maria Predari (University of Cologne) | Topology-induced Enhancement of Mappings · pdf, pdf |
Pumma, Sarunya · more Sarunya Pumma (Virginia Tech) | I/O Bottleneck Investigation in Deep Learning Systems · pdf, pdf, pdf, pdf |
Puri, Satish · more Satish Puri (Marquette University) | A HPC Framework for Big Spatial Data Processing and Analytics · pdf, pdf, pdf, pdf MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data · pdf, pdf |
Qian, Chen · more Chen Qian (University of California, Santa Cruz) | H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf |
Qian, Cheng · more Cheng Qian (National University of Defense Technology, University of Pittsburgh) | CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf |
Qian, Zhuzhong · more Zhuzhong Qian (Nanjing University) | ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf |
Qiao, Zhi · more Zhi Qiao (University of North Texas) | In-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems · pdf, pdf, pdf, pdf |
Qiu, Kun · more Kun Qiu (Fudan University) | ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf |
Rafique, Muhammad · more Muhammad Rafique (University of Illinois at Chicago) | CAMPS: Conflict-Aware Memory-Side Prefetching Scheme for Hybrid Memory Cube · pdf, pdf |
Rahman, Md Shafayat · more Md Shafayat Rahman (Florida State University) | Topologies and Adaptive Routing on Large-Scale Interconnects · pdf, pdf, pdf, pdf Load-Balanced Slim Fly Networks · pdf, pdf |
Ramaswamy, Ajay · more Ajay Ramaswamy (University of Florida) | Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf |
Rang, Wei · more Wei Rang (University of North Carolina at Charlotte) | Joint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Clouds · pdf, pdf |
Ranka, Sanjay · more Sanjay Ranka (University of Florida) | A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf |
Rao, Jia · more Jia Rao (UTA) | Improving Resource Utilization through Demand Aware Process Scheduling · pdf, pdf |
Rathnayake, Sunimal · more Sunimal Rathnayake (National University of Singapore) | Cost-Time Performance of Scaling Applications on the Cloud · pdf, pdf, pdf, pdf |
Ren, Bangbang · more Bangbang Ren (National University of Defense Technology) | DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf |
Ren, Fengyuan · more Fengyuan Ren (Tsinghua University) | Power Efficient High Performance Packet I/O · pdf, pdf |
Ren, Xiao li · more Xiao li Ren (College of Meteorology and Oceanology, National University of Defense Technology) | PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf |
Rennich, Steven · more Steven Rennich (Nvidia) | A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf |
Robert, Yves · more Yves Robert (Univ Lyon, CNRS, ENS de Lyon, Inria, Université Claude-Bernard Lyon 1, LIP UMR5668 LYON Cedex 07 France; University of Tennessee Knoxville, USA) | A Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures · pdf, pdf |
Robins, Mark · more Mark Robins (Intel, Sr. Director / Head of the AI Strategy Office at Intel) Mark Robins was recently named Sr. Director / Head of the AI Strategy Office at Intel, responsible for establishing and driving the AI strategy for the company. Previously, Mark served as Sr. Director / Head of AI Products at Intel, responsible for product management and planning for Intel’s hardware and software AI products.
Mark served as VP Products for Nervana prior to the acquisition by Intel in July 2016. Prior to Nervana, Mark served as VP Products for Influitive and, before that, as VP Products for Chegg through their IPO in 2013. Before Chegg, Mark was co-founder/CEO of Grouply, a social networking startup funded by Reid Hoffman and O’Reilly Alphatech Ventures that was acquired in 2010. Before Grouply, Mark was Sr. Director of Product Management at Siebel Systems through its acquisition by Oracle in 2006. Mark started his career as a satellite communications systems engineer for Hughes Aircraft Company (now Boeing). Mark earned a BS and MS in electrical engineering from Cornell and Caltech, respectively, where he also studied neural networks. Mark holds an MBA from Harvard Business School. | AI and HPC: Challenges and Opportunities · view |
Rupp, Karl · more Karl Rupp (TU Wien) | Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf |
Sasanka, Ruchira · more Ruchira Sasanka (Intel Corporation) | A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes · pdf, pdf |
Sathre, Paul · more Paul Sathre (Virginia Tech) | A Framework for Auto-Parallelization and Code Generation: An Integrative Case Study with Legacy FORTRAN Codes · pdf, pdf |
Savas, Suleyman · more Suleyman Savas (Halmstad University) | Designing Domain-Specific Heterogenous Manycores from Dataflow Programs · pdf, pdf, pdf, pdf |
Schickedanz, Alexander · more Alexander Schickedanz (Goethe University, Frankfurt) | An Empirical Comparison of k-Shortest Simple Path Algorithms on Multicores · pdf, pdf |
Schmidt, Bertil · more Bertil Schmidt (Johannes Gutenberg University of Mainz) | SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf Massively Parallel Huffman Decoding on GPUs · pdf, pdf |
Schonbein, Whit · more Whit Schonbein (Sandia National Laboratories, University of New Mexico) | The Case for Semi-Permanent Cache Occupancy · pdf, pdf |
Schulz, Martin · more Martin Schulz (Technical University of Munich) | Toward Footprint-Aware Power Shifting for Hybrid Memory Based Systems · pdf, pdf, pdf, pdf Interference between I/O and MPI Traffic on Fat-tree Networks · pdf, pdf |
Selva, Manuel · more Manuel Selva (Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG) | NumaMMA: NUMA MeMory Analyzer · pdf, pdf |
Sens, Pierre · more Pierre Sens (Sorbonne Université, CNRS, INRIA, LIP6) | A Communication-Efficient Causal Broadcast Protocol · pdf, pdf |
Seth, Sharad · more Sharad Seth (University of Nebraska-Lincoln) | Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf |
Shaik, Shehenaz · more Shehenaz Shaik (Auburn University) | Resource and Service Management in Fog Computing · pdf, pdf, pdf, pdf |
Shen, Yulong · more Yulong Shen (Xidian University) | DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf |
Shen, Zhirong · more Zhirong Shen (The Chinese University of Hong Kong) | Cross-Rack-Aware Updates in Erasure-Coded Data Centers · pdf, pdf |
Shi, Weisong · more Weisong Shi (Wayne State University) | In-Depth Reliability Characterization of NAND Flash based Solid State Drives in High Performance Computing Systems · pdf, pdf, pdf, pdf |
Si, Min · more Min Si (Argonne National Laboratory) | I/O Bottleneck Investigation in Deep Learning Systems · pdf, pdf, pdf, pdf |
Smith, Barry F. · more Barry F. Smith (Argonne National Laboratory) | Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf |
Snir, Marc · more Marc Snir (University of Illinois) | Fast and generic concurrent message-passing · pdf, pdf, pdf, pdf FULT: Fast User-Level Thread Scheduling Using Bit-Vectors · pdf, pdf |
Snyder, John · more John Snyder (Rhodes College) | Efficient Runtime Support for a Partitioned Global Logical Address Space · pdf, pdf |
Song, Jun qiang · more Jun qiang Song (College of Meteorology and Oceanology, National University of Defense Technology) | PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf |
Sood, Kanika · more Kanika Sood (University of Oregon) | Iterative Solver Selection Techniques for Sparse Linear Systems · pdf, pdf, pdf, pdf |
Soori, Saeed · more Saeed Soori (Rutgers University) | Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems · pdf, pdf |
Srisa-an, Witawas · more Witawas Srisa-an (University of Nebraska at Lincoln) | Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf |
Stitt, Greg · more Greg Stitt (University of Florida) | Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit · pdf, pdf |
Strout, Michelle · more Michelle Strout (University of Arizona) | Abstractions for Specifying Sparse Matrix Data Transformations · pdf, pdf, pdf, pdf |
Subasi, Omer · more Omer Subasi (Pacific Northwest National Laboratory) | Characterizing the Impact of Soft Errors Affecting Floating-point ALUs using RTL-level Fault Injection · pdf, pdf |
Sun, Jizhou · more Jizhou Sun (Tianjin University) | GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf |
Sun, Ke · more Ke Sun (Nanjing University) | Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf |
Sun, Ninghui · more Ninghui Sun (Institute of Computing Technology, Chinese Academy of Sciences) | Accelerating FM-index Search for Genomic Data Processing · pdf, pdf |
Sun, Qiao · more Qiao Sun (Institute of Software, Chinese Academy of Sciences) | Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf |
Sun, Shixuan · more Shixuan Sun (Hong Kong University of Science and Technology) | Parallelizing Pruning-based Graph Structural Clustering · pdf, pdf |
Suriyakumar, Yasodha · more Yasodha Suriyakumar (Portland State University) | Performance Analysis of DroughtHPC and Holistic HPC Workflows · pdf, pdf, pdf, pdf |
Tan, Guangming · more Guangming Tan (Institute of Computing Technology, Chinese Academy of Sciences) | Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf Accelerating FM-index Search for Genomic Data Processing · pdf, pdf |
Tan, Wei · more Wei Tan (Citadel) | Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf |
Tan, Yujuan · more Yujuan Tan (Chongqing University) | Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf |
Tang, Bingchang · more Bingchang Tang (Beijing University of Posts and Telecomm. (BUPT)) | Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster · pdf, pdf |
Tang, Guoming · more Guoming Tang (National University of Defense Technology) | DAG-SFC: Minimize the Embedding Cost of SFC with Parallel VNFs · pdf, pdf |
Tang, Meng · more Meng Tang (University of Florida) | A Multilevel Subtree Method for Single and Batched Sparse Cholesky Factorization · pdf, pdf |
Tang, Shanjiang · more Shanjiang Tang (Tianjin University) | GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf |
Tang, Xueyan · more Xueyan Tang (Nanyang Technological University) | Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf |
Taufer, Michela · more Michela Taufer (University of Tennessee) | KeyBin2: Distributed Clustering for Scalable and In-Situ Analysis · pdf, pdf PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf |
Teo, Yong Meng · more Yong Meng Teo (National University of Singapore) | Cost-Time Performance of Scaling Applications on the Cloud · pdf, pdf, pdf, pdf |
Teodorescu, Radu · more Radu Teodorescu (The Ohio State University) | C-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf |
Tian, Qi · more Qi Tian (University of Texas at San Antonio) | UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf |
Tong, Jiancong · more Jiancong Tong (Baidu Inc.) | Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf |
Trahay, François · more François Trahay (SAMOVAR, CNRS, Télécom SudParis, Université Paris-Saclay) | NumaMMA: NUMA MeMory Analyzer · pdf, pdf |
Tyson, Gareth · more Gareth Tyson (Queen Mary University of London) | H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf |
Tzovas, Charilaos · more Charilaos Tzovas (University of Cologne) | Balanced k-means for Parallel Geometric Partitioning · pdf, pdf |
Vadhiyar, Sathish · more Sathish Vadhiyar (IISC Bangalore, IISC) | MND-MST: A Multi-Node Multi-Device Parallel Boruvka's MST Algorithm · pdf, pdf |
Van Straalen, Brian · more Brian Van Straalen (Lawrence Berkeley National Laboratory) | A Low-Communication Method to Solve Poisson's Equation on Locally-Structured Grids · pdf, pdf, pdf, pdf |
Varghese, Robin · more Robin Varghese (Edward Via College of Osteopathic Medicine, Blacksburg) | Identifying Carcinogenic Multi-hit Combinations using Weighted Set Cover Algorithm · pdf, pdf, pdf, pdf |
Velesko, Paulius · more Paulius Velesko (Intel Corporation) | Vectorised Computation of Diverging Ensembles · pdf, pdf |
Vivien, Frédéric · more Frédéric Vivien (Univ Lyon, CNRS, ENS de Lyon, Inria, Université Claude-Bernard Lyon 1, LIP UMR5668 LYON Cedex 07 France) | A Generic Approach to Scheduling and Checkpointing Workflows · pdf, pdf |
von Looz, Moritz · more Moritz von Looz (University of Cologne) | Balanced k-means for Parallel Geometric Partitioning · pdf, pdf |
Wang, Fang · more Fang Wang (Huazhong University of Science and Technology) | HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf |
Wang, Fei · more Fei Wang (Shanghai Jiao Tong University) | IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling · pdf, pdf |
Wang, Gang · more Gang Wang (Nankai University) | Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines · pdf, pdf |
Wang, Hua · more Hua Wang (Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology) | Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf |
Wang, Jiayao · more Jiayao Wang (University of Shanghai for Science and Technology) | Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf |
Wang, Jue · more Jue Wang (Computer Network Information Center, Chinese Academy of Sciences) | Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf |
Wang, Weijun · more Weijun Wang (Nanjing University) | Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf |
Wang, Xi · more Xi Wang (Texas Tech University) | Exploring Memory Coalescing for 3D-Stacked Hybrid Memory Cube · pdf, pdf, pdf, pdf Memory Coalescing for Hybrid Memory Cube · pdf, pdf |
Wang, Xiangmeng · more Xiangmeng Wang (University of Science and Technology Beijing) | Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf |
Wang, Xiaoliang · more Xiaoliang Wang (Nanjing University) | ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf |
Wang, Xiaoyu · more Xiaoyu Wang (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu) | Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf; Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf |
Wang, Xin · more Xin Wang (Fudan University) | ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf |
Wang, Xinliang · more Xinliang Wang (Tsinghua University, National Supercomputer Center in Wuxi) | A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf |
Wang, Yuanrong · more Yuanrong Wang (Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences) | Accelerating FM-index Search for Genomic Data Processing · pdf, pdf |
Wang, Zhenlin · more Zhenlin Wang (Michigan Technological University) | Utilization of Random Profiling for System Modeling and Dynamic Configuration · pdf, pdf, pdf, pdf; Constructing Dynamic Policies for Paging Mode Selection · pdf, pdf |
Wang, Zhiying · more Zhiying Wang (National University of Defense Technology) | DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf; CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf |
Wang, Zijun · more Zijun Wang (IBM Research) | Matrix Factorization on GPUs with Memory Optimization and Approximate Computing · pdf, pdf |
Weber, Kenneth · more Kenneth Weber (University of Mount Union, Department of Computer Science) | Toward a Multi-GPU Implementation of the Modular Integer GCD Algorithm: Extended Abstract · pdf, pdf, pdf, pdf |
Wei, Dengping · more Dengping Wei (National University of Defense Technology) | Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf |
Wei, Yanjie · more Yanjie Wei (Shenzhen Institutes of Advanced Technology, CAS) | SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf |
Weissenberger, Andre · more Andre Weissenberger (Johann Wolfgang Goethe University) | Massively Parallel Huffman Decoding on GPUs · pdf, pdf |
Wernsman, Robert · more Robert Wernsman (Iowa State University) | Improving First Level Cache Efficiency for GPUs Using Dynamic Line Protection · pdf, pdf |
Wolf, Felix · more Felix Wolf (Technische Universitaet Darmstadt) | Unveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics · pdf, pdf |
Wolf, Tilman · more Tilman Wolf (University of Massachusetts Amherst) | ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf |
Wood, Chad · more Chad Wood (University of Oregon) | SOSflow: A Scalable Observation System for Introspection and In Situ Analytics · pdf, pdf, pdf, pdf |
Wu, Baodong · more Baodong Wu (Institute of Computing Technology, Chinese Academy of Sciences) | Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf; Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf |
Wu, Changmao · more Changmao Wu (Institute of Software, Chinese Academy of Sciences) | Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf |
Wu, Jie · more Jie Wu (Temple University) | NFV Middlebox Placement with Balanced Set-up Cost and Bandwidth Consumption · pdf, pdf |
Wu, Kai · more Kai Wu (University of California Merced) | Modeling Application Resilience in Large-scale Parallel Execution · pdf, pdf |
Wu, Song · more Song Wu (Huazhong University of Science & Technology) | Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf; Dual-Paradigm Stream Processing · pdf, pdf |
Wu, Xiaobing · more Xiaobing Wu (University of Canterbury) | Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf |
Wyatt, Michael R. · more Michael R. Wyatt (University of Delaware) | PRIONN: Predicting Runtime and IO using Neural Networks · pdf, pdf |
Xia, Yinglong · more Yinglong Xia (Huawei Research America) | C-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf |
Xiao, Jiang · more Jiang Xiao (Huazhong University of Science & Technology) | Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf |
Xiao, Junmin · more Junmin Xiao (Institute of Computing Technology, Chinese Academy of Sciences) | Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf |
Xiao, Liquan · more Liquan Xiao (National University of Defense Technology) | Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf |
Xie, Jing · more Jing Xie (Tsinghua University) | Power Efficient High Performance Packet I/O · pdf, pdf |
Xie, Xuchao · more Xuchao Xie (National University of Defense Technology) | Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf |
Xiong, Zhuang · more Zhuang Xiong (Huazhong University of Science & Technology) | Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf |
Xu, Kai · more Kai Xu (School of Software, Shandong University) | SPECTR: Scalable Parallel Short Read Error Correction on Multi-core and Many-core Architectures · pdf, pdf |
Xu, Ping · more Ping Xu (Tsinghua University, National Supercomputer Center in Wuxi) | A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf |
Xu, Xianghao · more Xianghao Xu (Huazhong University of Science and Technology) | HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf |
Xue, Wei · more Wei Xue (Tsinghua University, National Supercomputer Center in Wuxi) | A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf |
Yan, Zhichao · more Zhichao Yan (University of Texas at Arlington) | Leverage Redundancy in Hardware Transactional Memory to Improve Cache Reliability · pdf, pdf |
Yang, Bailong · more Bailong Yang (High-Tech Institute of Xi'an) | Power Efficient High Performance Packet I/O · pdf, pdf |
Yang, Canqun · more Canqun Yang (National University of Defense Technology, State Key Laboratory of High Performance Computing) | UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters · pdf, pdf |
Yang, Carl · more Carl Yang (University of California, Davis; Lawrence Berkeley National Laboratory) | Push-Pull on Graphs is Column- and Row-based SpMV Plus Masks · pdf, pdf, pdf, pdf; Implementing Push-Pull Efficiently in GraphBLAS · pdf, pdf |
Yang, Chao · more Chao Yang (School of Mathematical Sciences & National Engineering Laboratory for Video Technology, Peking University) | A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf |
Yang, Donglin · more Donglin Yang (University of North Carolina at Charlotte) | Joint Optimization of MapReduce Scheduling and Network Policy in Hierarchical Clouds · pdf, pdf |
Yang, Guangwen · more Guangwen Yang (Tsinghua University, National Supercomputing Center in Wuxi) | A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf |
Yang, Tianye · more Tianye Yang (National University of Defense Technology) | Duchy: Achieving Both SSD Durability and Controllable SMR Cleaning Overhead in Hybrid Storage Systems · pdf, pdf |
Yao, Erlin · more Erlin Yao (Institute of Computing Technology, Chinese Academy of Sciences) | Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf |
Ye, Jun · more Jun Ye (Intel Asia-Pacific R&D Ltd.) | IS-ASGD: Accelerating Asynchronous SGD using Importance Sampling · pdf, pdf |
Yi, Qing · more Qing Yi (UCCS) | Improving Resource Utilization through Demand Aware Process Scheduling · pdf, pdf |
Yi, Xinbo · more Xinbo Yi (Wuhan National Laboratory for Optoelectronics, HuaZhong University of Science and Technology) | Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf |
Yi, Yusheng · more Yusheng Yi (Huazhong University of Science & Technology) | Disk Failure Prediction in Data Centers via Online Learning · pdf, pdf |
You, Yang · more Yang You (UC Berkeley) | ImageNet Training in Minutes · pdf, pdf |
Yu, Ce · more Ce Yu (Tianjin University) | GLP4NN: A Convergence-invariant and Network-agnostic Light-weight Parallelization Framework for Deep Neural Networks on Modern GPUs · pdf, pdf |
Yu, Hongfeng · more Hongfeng Yu (University of Nebraska-Lincoln) | A Distributed Infomap Algorithm for Scalable and High-Quality Community Detection · pdf, pdf |
Yu, Qi · more Qi Yu (National University of Defense Technology) | DSAP: Data Structure-Aware Prefetching for Breadth First Search on GPU · pdf, pdf, pdf, pdf; CGAcc: CSR-based Graph Traversal Accelerator on HMC · pdf, pdf, pdf, pdf |
Yuan, Jing · more Jing Yuan (Fudan University) | ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf |
Yuan, Xin · more Xin Yuan (Florida State University) | Load-Balanced Slim Fly Networks · pdf, pdf |
Zambreno, Joseph · more Joseph Zambreno (Iowa State University) | Improving First Level Cache Efficiency for GPUs Using Dynamic Line Protection · pdf, pdf |
Zang, Dawei · more Dawei Zang (Institute of Computing Technology, Chinese Academy of Sciences) | Accelerating FM-index Search for Genomic Data Processing · pdf, pdf |
Zeng, Jianping · more Jianping Zeng (University of Nebraska-Lincoln) | A Distributed Infomap Algorithm for Scalable and High-Quality Community Detection · pdf, pdf |
Zhai, Ennan · more Ennan Zhai (Yale University) | H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf |
Zhang, Changyou · more Changyou Zhang (Institute of Software, Chinese Academy of Sciences) | Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf |
Zhang, Chen · more Chen Zhang (Huazhong University of Science and Technology) | FFS-VA: A Fast Filtering System for Large-scale Video Analytics · pdf, pdf |
Zhang, Haitao · more Haitao Zhang (Beijing University of Posts and Telecommunications (BUPT)) | Learning Driven Parallelization for Large-Scale Video Workload in Hybrid CPU-GPU Cluster · pdf, pdf |
Zhang, He · more He Zhang (Institute of Atmospheric Physics, Chinese Academy of Sciences) | Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf |
Zhang, Hong · more Hong Zhang (Argonne National Laboratory) | Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512 · pdf, pdf |
Zhang, Huazhe · more Huazhe Zhang (University of Chicago) | Performance & Energy Tradeoffs for Dependent Distributed Applications Under System-wide Power Caps · pdf, pdf |
Zhang, Jiajia · more Jiajia Zhang (Institute of Software, Chinese Academy of Sciences) | Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform · pdf, pdf |
Zhang, Jiling · more Jiling Zhang (School of Physical Science and Engineering, Lanzhou University) | Click-Based Asynchronous Mesh Network with Bounded Bundled Data · pdf, pdf |
Zhang, Lijun · more Lijun Zhang (Nanjing University) | Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf |
Zhang, Qifei · more Qifei Zhang (Zhejiang University) | Delta-Stepping Synchronous Parallel Model · pdf, pdf, pdf, pdf |
Zhang, Rongqi · more Rongqi Zhang (School of Computer Science and Technology, Tianjin University; Tianjin Key Laboratory of Advanced Networking) | Less Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees · pdf, pdf |
Zhang, Sheng · more Sheng Zhang (Nanjing University) | ran-GJS: Orchestrating Data Analytics for Heterogeneous Geo-distributed Edges · pdf, pdf |
Zhang, Tong · more Tong Zhang (Tsinghua University) | Power Efficient High Performance Packet I/O · pdf, pdf |
Zhang, Weidong · more Weidong Zhang (Peking University) | Delta-Stepping Synchronous Parallel Model · pdf, pdf, pdf, pdf |
Zhang, Xiaoyi · more Xiaoyi Zhang (Huazhong University of Science and Technology) | A Write-efficient and Consistent Hashing Scheme for Non-Volatile Memory · pdf, pdf |
Zhang, Yongxuan · more Yongxuan Zhang (Huazhong University of Science and Technology) | HUS-Graph: I/O-Efficient Out-of-Core Graph Processing with Hybrid Update Strategy · pdf, pdf |
Zhang, Yunquan · more Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences) | Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model · pdf, pdf; Massively Scaling the Metal Microscopic Damage Simulation on Sunway TaihuLight Supercomputer · pdf, pdf |
Zhang, Zhao · more Zhao Zhang (TACC) | ImageNet Training in Minutes · pdf, pdf |
Zhao, Dongfang · more Dongfang Zhao (University of California, Davis; University of Nevada, Reno) | Toward Performant and Energy-efficient Queries in Three-tier Wireless Sensor Networks · pdf, pdf |
Zhao, Jin · more Jin Zhao (Fudan University) | ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf |
Zhao, Juan · more Juan Zhao (College of Meteorology and Oceanology, National University of Defense Technology; College of Computer, National University of Defense Technology) | PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf |
Zhao, Laiping · more Laiping Zhao (School of Computer Software, Tianjin University; Tianjin Key Laboratory of Advanced Networking) | Less Provisioning: A Fine-Grained Resource Scaling Engine for Long-Running Services with Tail Latency Guarantees · pdf, pdf |
Zhao, Leiyu · more Leiyu Zhao (Tsinghua University) | H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf |
Zhao, Minghao · more Minghao Zhao (Tsinghua University) | H2Cloud: Maintaining the Whole Filesystem in an Object Storage Cloud · pdf, pdf |
Zhao, Xinghui · more Xinghui Zhao (Washington State University) | WebNN: A Distributed Framework for Deep Learning · pdf, pdf, pdf, pdf; A Comprehensive Study on Bugs in Actor Systems · pdf, pdf |
Zheng, Jiaqi · more Jiaqi Zheng (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu) | Charging Task Scheduling for Directional Wireless Charger Networks · pdf, pdf; Cache Assisted Randomized Sharing Counters in Network Measurement · pdf, pdf; Heterogeneous Wireless Charger Placement with Obstacles · pdf, pdf |
Zheng, Weimin · more Weimin Zheng (Tsinghua University) | A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010 · pdf, pdf |
Zhou, Amelie Chi · more Amelie Chi Zhou (Shenzhen University) | Energy-Efficient Speculative Execution using Advanced Reservation for Heterogeneous Clusters · pdf, pdf |
Zhou, Ke · more Ke Zhou (Wuhan National Laboratory for Optoelectronics, HuaZhong University of Science and Technology) | Efficient SSD Caching by Avoiding Unnecessary Writes using Machine Learning · pdf, pdf |
Zhou, Li · more Li Zhou (The Ohio State University) | C-Graph: A Highly Efficient Concurrent Graph Reachability Query Framework · pdf, pdf |
Zhou, Xiaobo · more Xiaobo Zhou (University of Colorado Colorado Springs) | Leveraging Resource Bottleneck Awareness and Optimizations for Data Analytics Performance · pdf, pdf, pdf, pdf; Reference-distance Eviction and Prefetching for Cache Management in Spark · pdf, pdf |
Zhu, Min · more Min Zhu (College of Meteorology and Oceanology, National University of Defense Technology) | PBCS: An Efficient Parallel Characteristic Set Method for Solving Boolean Polynomial Systems · pdf, pdf |
Zhu, Xian · more Xian Zhu (Iowa State University) | Improving First Level Cache Efficiency for GPUs Using Dynamic Line Protection · pdf, pdf |
Zhu, Yuanyang · more Yuanyang Zhu (Fudan University) | ParaPLL: Fast Parallel Shortest-path Distance Query on Large-scale Weighted Graphs · pdf, pdf |
Zhu, Zhichun · more Zhichun Zhu (University of Illinois at Chicago) | CAMPS: Conflict-Aware Memory-Side Prefetching Scheme for Hybrid Memory Cube · pdf, pdf |