algorithm - Guidance for DIY Implementation of Parallel Merge Sort (parallel merge) on CUDA 8 -


for past few days i've been working on scratch parallel merge sort algorithm, have been stewing on lot , while easy run sequential sort in parallel. see no reason why can't merge in parallel ground up. following link:

http://www.mcs.anl.gov/~itf/dbpp/text/node127.html

i doing parallel merge. i've been working on how recursion , loops , i'd rather recursion think have working model loop. i've scratched out work , i'll describe how want set algorithm first i'll describe merge kernel

__global__ void merge(int* data, int depth) {     sequential merge (data, left = threadidx.x, right = left + 2 ^ depth - 1); } 

right left offset 2 ^ depth - 1 because that's relationship noticed in working algorithm out hand

sort 3  1  7  4  9  0  2  6  8  5        \ /   \ /   \ /   \ /   \ /      merge(left, left + 2 - 1)      1 3   4 7   0 9   2 6   5 8          \ /         \ /       /       merge(left, left + 4 - 1)      1 3 4 7     0 2 6 9    5 8             \   /          /          merge(left, left + 8 - 1)      0 1 2 3 4 6 7 9     5 8                     \  /              merge(left, size)      0 1 2 3 4 5 6 7 8 9 

i thinking of doing in loop this

mergesort(int* data, int size) {     int depth = 1;      int maxdepth = log base 2 (size)      // # blocks = size / 512 threads per block     int blocks = size / 512;      // # threads = size / blocks / 2 because 1 merge per 2 elements     int threads = (size / blocks) / 2;      int index = 0;      while (depth < maxdepth)     {         merge<<< blocks / depth, threads / depth>>>(data, depth++);     } } 

do seem on right track or thinking off?


Comments

Popular posts from this blog

Is there a better way to structure post methods in Class Based Views -

performance - Why is XCHG reg, reg a 3 micro-op instruction on modern Intel architectures? -

jquery - Responsive Navbar with Sub Navbar -