| Select | A Parallel Implementation of Viterbi Training for Acoustic Models using Graphics Processing Units |
| Select | A Study of Persistent Threads Style GPU Programming for GPGPU Workloads |
| Select | An Algorithm for Fast Edit Distance Computation on GPUs |
| Select | Auto-tuning a High-Level Language Targeted to GPU Codes |
| Select | DL: A Data Layout Transformation System for Heterogeneous Computing |
| Select | Efficient Parallel Merge Sort for Fixed and Variable Length Keys |
| Select | Efficient sparse matrix-vector multiplication on Fermi GPUs |
| Select | GPU Accelerated Nonlinear Optimization in Radio Interferometric Calibration |
| Select | GPU acceleration for the pricing of the CMS spread option |
| Select | GPU-Accelerated Monte Carlo Simulations of Dense Stellar Systems |
| Select | Heterogeneous Tasks and Conduits Framework for Rapid Application Portability and Deployment |
| Select | High-efficiency Lattice QCD computations on the Fermi architecture |
| Select | Implementation and Optimization of a Thermal Lattice Boltzmann Algorithm on a multi-GPU cluster |
| Select | ispc: A SPMD Compiler for High-Performance CPU Programming |
| Select | Machine Learning for Predictive Auto-Tuning with Boosted Regression Trees |
| Select | Modestly Faster Histogram Computation on GPUs |
| Select | OP2: An Active Library Framework for Solving Unstructured Mesh-based Applications on Multi-Core and Many-Core Architectures |
| Select | Optimization and Architecture Effects on GPU Computing Workload Performance |
| Select | Optimization of the parallel black-box fast multipole method on CUDA |
| Select | Optimized Strategies for Mapping Three-dimensional FFTs onto CUDA GPUs |
| Select | Parallel Lossless Data Compression on the GPU |
| Select | Parallel Speculative Encryption of Multiple AES Contexts on GPUs |
| Select | Policy-based Tuning for Performance Portability and Library Co-optimization |
| Select | ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU |
| Select | VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units |