Accepted Papers


A Parallel Implementation of Viterbi Training for Acoustic Models using Graphics Processing Units
A Study of Persistent Threads Style GPU Programming for GPGPU Workloads
An Algorithm for Fast Edit Distance Computation on GPUs
Auto-tuning a High-Level Language Targeted to GPU Codes
DL: A Data Layout Transformation System for Heterogeneous Computing
Efficient Parallel Merge Sort for Fixed and Variable Length Keys
Efficient sparse matrix-vector multiplication on Fermi GPUs
GPU Accelerated Nonlinear Optimization in Radio Interferometric Calibration
GPU acceleration for the pricing of the CMS spread option
GPU-Accelerated Monte Carlo Simulations of Dense Stellar Systems
Heterogeneous Tasks and Conduits Framework for Rapid Application Portability and Deployment
High-efficiency Lattice QCD computations on the Fermi architecture
Implementation and Optimization of a Thermal Lattice Boltzmann Algorithm on a multi-GPU cluster
ispc: A SPMD Compiler for High-Performance CPU Programming
Machine Learning for Predictive Auto-Tuning with Boosted Regression Trees
Modestly Faster Histogram Computation on GPUs
OP2: An Active Library Framework for Solving Unstructured Mesh-based Applications on Multi-Core and Many-Core Architectures
Optimization and Architecture Effects on GPU Computing Workload Performance
Optimization of the parallel black-box fast multipole method on CUDA
Optimized Strategies for Mapping Three-dimensional FFTs onto CUDA GPUs
Parallel Lossless Data Compression on the GPU
Parallel Speculative Encryption of Multiple AES Contexts on GPUs
Policy-based Tuning for Performance Portability and Library Co-optimization
ScatterAlloc: Massively Parallel Dynamic Memory Allocation for the GPU
VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units