MODERN GPU

UNDER CONSTRUCTION: MGPU Search - Fastest search of sorted arrays.

UNDER CONSTRUCTION: MGPU Select - Fastest k'th-smallest/median selection library.

FEATURED: MGPU Sort - Fastest GPU sort library.

MGPU Scan

UNDER CONSTRUCTION. Benchmark and usage for the scan and segmented scan library. Global scan is developed as a model for upsweep-reduce-downsweep functions. Segmented scan is developed as an extension.
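To give a feel for the upsweep-reduce-downsweep pattern, here is a minimal sequential C++ sketch of an exclusive prefix sum split into those three phases. It is not MGPU code: the function name tiledExclusiveScan and the tileSize parameter are made up for illustration, and the loops over tiles only stand in for what independent thread blocks would do on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive prefix sum in three phases. Each tile plays the role of one
// GPU thread block. tileSize is an illustrative parameter, not an MGPU setting.
std::vector<int> tiledExclusiveScan(const std::vector<int>& in, std::size_t tileSize) {
    assert(tileSize > 0);
    const std::size_t n = in.size();
    const std::size_t numTiles = (n + tileSize - 1) / tileSize;

    // Phase 1 (upsweep): every tile reduces its elements to one total.
    std::vector<int> tileOffsets(numTiles, 0);
    for (std::size_t t = 0; t < numTiles; ++t)
        for (std::size_t i = t * tileSize; i < n && i < (t + 1) * tileSize; ++i)
            tileOffsets[t] += in[i];

    // Phase 2 (reduce): an exclusive scan over the tile totals turns each
    // total into the starting offset of its tile.
    int running = 0;
    for (std::size_t t = 0; t < numTiles; ++t) {
        const int total = tileOffsets[t];
        tileOffsets[t] = running;
        running += total;
    }

    // Phase 3 (downsweep): every tile scans its own elements again, this
    // time seeded with the offset computed in phase 2.
    std::vector<int> out(n);
    for (std::size_t t = 0; t < numTiles; ++t) {
        int sum = tileOffsets[t];
        for (std::size_t i = t * tileSize; i < n && i < (t + 1) * tileSize; ++i) {
            out[i] = sum;
            sum += in[i];
        }
    }
    return out;
}
```

Phases 1 and 3 touch every element but need no communication between tiles. Only phase 2 is serial across tiles, and it works on one value per tile.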

MGPU Sort

Benchmark and usage for high-performance CUDA radix sort library. Develops the radix sort algorithm and analyzes scatter efficiency on the GPU memory architecture. Studies the three phases of the MGPU Sort implementation: count, histogram, sort.
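As a rough CPU-side illustration of the count, histogram, and sort phases, the C++ sketch below performs one least-significant-digit radix pass and chains four 8-bit passes into a full 32-bit sort. This is an assumption about how the three phases map onto a textbook radix sort, not the MGPU Sort kernels. The names radixPass and radixSort and the 8-bit digit width are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One stable radix pass over the digit made of `bits` bits starting at `shift`.
std::vector<std::uint32_t> radixPass(const std::vector<std::uint32_t>& keys,
                                     unsigned shift, unsigned bits) {
    assert(bits > 0 && bits <= 16 && shift + bits <= 32);
    const std::size_t numBuckets = std::size_t(1) << bits;
    const std::uint32_t mask = static_cast<std::uint32_t>(numBuckets - 1);

    // Count: tally how many keys fall into each digit bucket.
    std::vector<std::size_t> offsets(numBuckets, 0);
    for (std::uint32_t k : keys) ++offsets[(k >> shift) & mask];

    // Histogram: an exclusive scan of the counts gives each bucket's
    // first output position.
    std::size_t running = 0;
    for (std::size_t& o : offsets) {
        const std::size_t count = o;
        o = running;
        running += count;
    }

    // Sort: scatter every key to the next free slot of its bucket.
    // Visiting keys in input order keeps the pass stable.
    std::vector<std::uint32_t> out(keys.size());
    for (std::uint32_t k : keys) out[offsets[(k >> shift) & mask]++] = k;
    return out;
}

// Full sort of 32-bit keys: four 8-bit passes, least significant digit first.
std::vector<std::uint32_t> radixSort(std::vector<std::uint32_t> keys) {
    for (unsigned shift = 0; shift < 32; shift += 8)
        keys = radixPass(keys, shift, 8);
    return keys;
}
```

The scatter in the last phase is where memory behavior matters on a GPU, since neighboring input keys can land far apart in the output.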

MGPU Search

UNDER CONSTRUCTION. Benchmark and usage for high-performance CUDA search library. Provides std::lower_bound-like functionality for vectorized queries on sorted arrays. Demonstrates cooperative thread programming for complete utilization of retrieved memory segments.
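For reference, the semantics of a vectorized lower-bound search can be written in a few lines of standard C++. The helper name batchedLowerBound is hypothetical, and this sketch shows only the expected results, one index per query. It says nothing about the cooperative thread techniques the library uses to get there.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// For each query, return the index of the first element in `sorted`
// that is not less than the query (sorted.size() if there is none).
std::vector<std::size_t> batchedLowerBound(const std::vector<int>& sorted,
                                           const std::vector<int>& queries) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    std::vector<std::size_t> result;
    result.reserve(queries.size());
    for (int q : queries) {
        auto it = std::lower_bound(sorted.begin(), sorted.end(), q);
        result.push_back(static_cast<std::size_t>(it - sorted.begin()));
    }
    return result;
}
```

Each query is independent of the others, which is what makes the batch a natural fit for parallel hardware.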

MGPU Select

UNDER CONSTRUCTION. High-performance GPU k'th-smallest/median selection. Presents select as the complement of sort, built on radix counting.

MGPU Sparse

UNDER CONSTRUCTION. Benchmark and usage for high-performance CUDA sparse matrix library. Demonstrates segmented scan for balancing jagged workloads across parallel processors.

Particle Systems

COMING SOON. Model a differentiable field with moving particles. Establish a hash using support domain for derivative operator and use MGPU Sort to sort particles by hash index each frame, providing fast access to particle neighborhoods.
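The hash-then-sort idea can be sketched on the CPU as follows. Everything specific here is an assumption made for illustration: a 2D uniform grid whose cell edge equals the support radius, a row-major cell index as the hash, and std::stable_sort standing in for MGPU Sort. The names Particle, cellHash, and sortByCell are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Particle { float x, y; };

// Hypothetical hash: row-major index of the grid cell containing the particle.
// cellSize would typically be the support radius of the derivative operator.
std::uint32_t cellHash(const Particle& p, float cellSize, std::uint32_t gridWidth) {
    assert(cellSize > 0.0f && p.x >= 0.0f && p.y >= 0.0f);
    const auto cx = static_cast<std::uint32_t>(std::floor(p.x / cellSize));
    const auto cy = static_cast<std::uint32_t>(std::floor(p.y / cellSize));
    assert(cx < gridWidth);
    return cy * gridWidth + cx;
}

// Return particle indices ordered by cell hash. After this, all particles
// sharing a cell are contiguous in the returned order.
std::vector<std::size_t> sortByCell(const std::vector<Particle>& particles,
                                    float cellSize, std::uint32_t gridWidth) {
    std::vector<std::size_t> order(particles.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(),
        [&](std::size_t a, std::size_t b) {
            return cellHash(particles[a], cellSize, gridWidth) <
                   cellHash(particles[b], cellSize, gridWidth);
        });
    return order;
}
```

With the particles grouped this way, a neighborhood query only has to visit the particle's own cell and the cells adjacent to it.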

Mailbag

I want to know what people are using their GPUs for, so send in your programming problems. If it's interesting, I may write the kernel and post the solution here.

The Library and Tutorial

Modern GPU is a set of GPU computing programs and companion articles. The articles form a tutorial and cover the programs as case studies. GPGPU literature has avoided coverage of functions requiring complex inter-thread communication, favoring algorithms that have obvious parallelizations. But these are not the most interesting problems in this emerging field.

MGPU takes a scan-centric approach. You'll learn to use blocks as cooperative thread arrays to tackle more intricate problems. GPU idioms are identified to help develop instincts for making good programming decisions.

The MGPU Sort library is the first case study. This flexible radix sort implementation has excellent performance, sorting over 1.3 billion keys per second on a GeForce GTX 570. The code was written to help illustrate fast hierarchical scan strategies.

MGPU Select is the complement of MGPU Sort. Selection algorithms find the k'th-smallest value or median in a sequence. We reuse the sort library's radix counting code but process digits from the most significant to the least significant (radix sort goes the opposite way). This finds the k'th-smallest value in an array ten times faster than sorting it directly.
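A minimal sequential sketch of most-significant-digit radix selection, assuming 32-bit unsigned keys, 8-bit digits, and a zero-based k, might look like this. The function name radixSelect is illustrative, and the code is a textbook formulation rather than the MGPU Select implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the k'th-smallest key (k is zero-based) without fully sorting.
std::uint32_t radixSelect(std::vector<std::uint32_t> keys, std::size_t k) {
    assert(k < keys.size());
    const int bits = 8;
    const std::size_t numBuckets = std::size_t(1) << bits;
    const std::uint32_t mask = static_cast<std::uint32_t>(numBuckets - 1);

    // Walk the digits from most significant to least significant.
    for (int shift = 32 - bits; shift >= 0; shift -= bits) {
        // Radix counting: tally the keys per digit value.
        std::vector<std::size_t> counts(numBuckets, 0);
        for (std::uint32_t key : keys) ++counts[(key >> shift) & mask];

        // Find the bucket that holds rank k, and make k relative to it.
        std::size_t bucket = 0;
        while (k >= counts[bucket]) {
            k -= counts[bucket];
            ++bucket;
        }

        // Discard every key outside that bucket.
        std::vector<std::uint32_t> survivors;
        survivors.reserve(counts[bucket]);
        for (std::uint32_t key : keys)
            if (((key >> shift) & mask) == bucket) survivors.push_back(key);
        keys.swap(survivors);
    }
    // All remaining keys agree on every digit, so they are equal.
    return keys.front();
}
```

Each pass counts all surviving keys but moves only those in one bucket, which is why selection can do far less work than a full sort.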

Segmented scan, an extension of the function studied in the first two sections, is a method for balancing jagged workloads over many parallel processors. It is at the heart of the third library, MGPU Sparse Matrix.
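To make the operation concrete, here is a sequential C++ sketch of an inclusive segmented sum. It assumes segments are marked with head flags (true at the first element of each segment), which is one common encoding and not necessarily the one the library uses. The name segmentedInclusiveScan is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum that restarts at every segment head.
// headFlags[i] is true when values[i] begins a new segment.
std::vector<int> segmentedInclusiveScan(const std::vector<int>& values,
                                        const std::vector<bool>& headFlags) {
    assert(values.size() == headFlags.size());
    std::vector<int> out(values.size());
    int running = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (headFlags[i]) running = 0;  // a new segment discards the carried sum
        running += values[i];
        out[i] = running;
    }
    return out;
}
```

The last output of each segment is that segment's total. Because the work is indexed by element rather than by segment, a parallel version can hand every processor the same number of elements no matter how uneven the segment lengths are.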

Another program, MGPU Particles, is written but not yet documented. It builds on the lessons and libraries of the preceding sections and adds physics and 3D graphics. You'll see how to develop a system allowing real-time exploration of physical parameter spaces.

The MGPU library is open sourced under the permissive BSD license. It is hosted on GitHub.

About Me

My name is Sean Baxter. I am currently a research scientist at NVIDIA. I graduated from Central Washington University with degrees in physics and mathematics.

I have been programming in C++ since 1996, specializing in Win32, COM, and Direct3D. Recently I have focused on parser design with Boost.Spirit.

My previous large project was Earth RSE, a real-time 3D exploration and analysis tool for multi-spectral earth science datasets.

In my early days I wrote a number of COM programming tutorials under the alias Captain COM. My first-ever tutorial demonstrated the use of VBE2 extensions in Watcom C++ using DPMI translation services. It is a gem of mid-90s teenage web design.

My latest infatuation is GPGPU. I come at it from the computer graphics side, not the computer science side, so readers will get a different perspective compared to academic texts.

Contact Me

You can email me at moderngpu at gmail. I'm also often available in #cuda on Freenode IRC.

Links

BackForty Computing - Duane Merrill's CUDA library

Perils of Parallel - Greg Pfister's posts on the state of the industry

Real World Technologies - David Kanter's hardware architecture overviews

SemiAccurate - Tech gossip from Charlie Demerjian
