UNDER CONSTRUCTION: MGPU Search - Fastest search of sorted arrays.
UNDER CONSTRUCTION: MGPU Select - Fastest k'th-smallest/median selection library.
FEATURED: MGPU Sort - Fastest GPU sort library.
Overview of APIs. CPU and GPU architectures compared. Scan introduced. Strided order to thread order and bank conflicts explained. GPU Idioms presented.
UNDER CONSTRUCTION. Benchmark and usage for scan and segmented scan library. Global scan developed as a model for upsweep-reduce-downsweep functions. Segmented scan developed.
Benchmark and usage for high-performance CUDA radix sort library. Develop radix sort algorithm and analyze scatter efficiency on GPU memory architecture. Study three phases of MGPU Sort implementation: count, histogram, sort.
UNDER CONSTRUCTION. Benchmark and usage for high-performance CUDA search library. Provides std::lower_bound-like functionality for vectorized queries on sorted arrays. Demonstrates cooperative thread programming for complete utilization of retrieved memory segments.
UNDER CONSTRUCTION. High-performance GPU k'th-smallest/median selection. Select as complement of sort. Radix counting.
UNDER CONSTRUCTION. Benchmark and usage for high-performance CUDA sparse matrix library. Demonstrates segmented scan for balancing jagged work loads across parallel processors.
COMING SOON. Model a differentiable field with moving particles. Establish a hash using support domain for derivative operator and use MGPU Sort to sort particles by hash index each frame, providing fast access to particle neighborhoods.
Modern GPU is a set of GPU computing programs and companion articles. The articles form a tutorial and cover the programs as case studies. GPGPU literature has avoided coverage of functions requiring complex inter-thread communication, favoring algorithms that have obvious parallelizations. But those easily parallelized algorithms are not the most interesting problems in this emerging field.
MGPU takes a scan-centric approach. You'll learn to use blocks as cooperative thread arrays to tackle more intricate problems. GPU idioms are identified to help develop instincts for making good programming decisions.
The MGPU Sort library is the first case study. This flexible radix sort implementation has excellent performance, sorting over 1.3 billion keys per second on a GeForce GTX 570. The code was written to help illustrate fast hierarchical scan strategies.
MGPU Select is the complement of MGPU Sort. Selection algorithms find the k'th-smallest value or the median of a sequence. We reuse the sort library's radix counting code but process digits from the most significant to the least significant (radix sort goes the opposite way) to find the k'th-smallest value in an array ten times faster than direct sorting.
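To make the digit-by-digit idea concrete, here is a minimal serial C++ sketch of most-significant-digit radix selection on 32-bit unsigned keys. It is not the MGPU Select implementation or API: the function name `radix_select`, the 8-bit digit width, and the single-threaded loops are assumptions chosen for clarity. Each pass histograms one digit among the keys that still match the prefix resolved so far, then walks the histogram to find the bucket that holds rank k.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical serial sketch of MSD radix selection (not the MGPU Select API).
// Returns the k'th-smallest value, with k counted from 0.
uint32_t radix_select(const std::vector<uint32_t>& data, std::size_t k) {
    assert(k < data.size());
    uint32_t prefix = 0;      // digits of the answer resolved so far
    uint32_t prefixMask = 0;  // bits covered by those resolved digits
    // Process 8-bit digits from most significant (bits 24..31) to least (bits 0..7).
    for (int shift = 24; shift >= 0; shift -= 8) {
        // Radix counting: histogram the current digit of every key that
        // still matches the prefix found in earlier passes.
        std::array<std::size_t, 256> counts{};
        for (uint32_t x : data)
            if ((x & prefixMask) == prefix)
                ++counts[(x >> shift) & 0xffu];
        // Walk the histogram; the bucket where rank k lands is the next digit.
        // k becomes the rank within that bucket.
        uint32_t digit = 0;
        while (k >= counts[digit]) {
            k -= counts[digit];
            ++digit;
        }
        prefix |= digit << shift;
        prefixMask |= 0xffu << shift;
    }
    return prefix;  // all four digits resolved: this is the selected key
}
```

For example, `radix_select({7, 3, 9, 1, 5}, 2)` returns 5, the third-smallest value. Four counting passes pin down a 32-bit answer without ever ordering the keys; the histogram pass is the part that maps naturally onto parallel counting on a GPU.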
Segmented scan, an extension of the function studied in the first two sections, is a method for balancing jagged work loads over many parallel processors. It is at the heart of the third library, MGPU Sparse Matrix.
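As a reference for what segmented scan computes, the serial C++ sketch below performs an inclusive segmented sum scan. It is not MGPU's parallel implementation; the head-flag encoding (a nonzero flag marks the first element of a segment) and the name `segmented_inclusive_scan` are assumptions for illustration. The running sum restarts at each flagged element, so segments of any length are handled by the same loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference for an inclusive segmented sum scan.
// flags[i] != 0 marks values[i] as the first element of a new segment.
std::vector<int> segmented_inclusive_scan(const std::vector<int>& values,
                                          const std::vector<int>& flags) {
    assert(values.size() == flags.size());
    std::vector<int> out(values.size());
    int running = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (flags[i]) running = 0;  // segment boundary: restart the sum
        running += values[i];
        out[i] = running;           // inclusive: includes values[i]
    }
    return out;
}
```

With values {1, 2, 3, 4, 5, 6} and flags {1, 0, 0, 1, 0, 1}, the result is {1, 3, 6, 4, 9, 6}. The last element of each segment holds that segment's total, which is how a sparse matrix-vector product can sum the products of each row even when rows have very different lengths.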
Another program, MGPU Particles, is written but not yet documented. It builds on the lessons and libraries of the preceding sections and adds physics and 3D graphics. You'll see how to develop a system allowing real-time exploration of physical parameter spaces.
The MGPU library is open sourced under the permissive BSD license. It is hosted on GitHub.
My name is Sean Baxter. I am currently a research scientist at NVIDIA. I graduated from Central Washington University with degrees in physics and mathematics.
I have been programming in C++ since 1996, specializing in Win32, COM, and Direct3D. Recently I have focused on parser design with boost.spirit.
My previous large project was Earth RSE, a real-time 3D exploration and analysis tool for multi-spectral earth science datasets.
In my early days I wrote a number of COM programming tutorials under the alias Captain COM. My first-ever tutorial demonstrated the use of VBE2 extensions in Watcom C++ using DPMI translation services. It is a gem of mid-90s teenage web design.
My latest infatuation is GPGPU. I come at it from the computer graphics side, not the computer science side, so readers will get a different perspective compared to academic texts.
You can email me at moderngpu at gmail. I'm also often available in #cuda on the Freenode IRC network.
BackForty Computing - Duane Merrill's CUDA library
Perils of Parallel - Greg Pfister's posts on the state of the industry
Real World Technologies - David Kanter's hardware architecture overviews
SemiAccurate - Tech gossip from Charlie Demerjian