algorithm - Guidance for DIY Implementation of Parallel Merge Sort (parallel merge) on CUDA 8 -
for past few days i've been working on scratch parallel merge sort algorithm, have been stewing on lot , while easy run sequential sort in parallel. see no reason why can't merge in parallel ground up. following link:
http://www.mcs.anl.gov/~itf/dbpp/text/node127.html
i doing parallel merge. i've been working on how recursion , loops , i'd rather recursion think have working model loop. i've scratched out work , i'll describe how want set algorithm first i'll describe merge kernel
__global__ void merge(int* data, int depth) {     sequential merge (data, left = threadidx.x, right = left + 2 ^ depth - 1); } right left offset 2 ^ depth - 1 because that's relationship noticed in working algorithm out hand
sort 3  1  7  4  9  0  2  6  8  5        \ /   \ /   \ /   \ /   \ /      merge(left, left + 2 - 1)      1 3   4 7   0 9   2 6   5 8          \ /         \ /       /       merge(left, left + 4 - 1)      1 3 4 7     0 2 6 9    5 8             \   /          /          merge(left, left + 8 - 1)      0 1 2 3 4 6 7 9     5 8                     \  /              merge(left, size)      0 1 2 3 4 5 6 7 8 9 i thinking of doing in loop this
mergesort(int* data, int size) {     int depth = 1;      int maxdepth = log base 2 (size)      // # blocks = size / 512 threads per block     int blocks = size / 512;      // # threads = size / blocks / 2 because 1 merge per 2 elements     int threads = (size / blocks) / 2;      int index = 0;      while (depth < maxdepth)     {         merge<<< blocks / depth, threads / depth>>>(data, depth++);     } } do seem on right track or thinking off?
 
 
Comments
Post a Comment