algorithm - Guidance for DIY Implementation of Parallel Merge Sort (parallel merge) on CUDA 8 -
for past few days i've been working on scratch parallel merge sort algorithm, have been stewing on lot , while easy run sequential sort in parallel. see no reason why can't merge in parallel ground up. following link:
http://www.mcs.anl.gov/~itf/dbpp/text/node127.html
i doing parallel merge. i've been working on how recursion , loops , i'd rather recursion think have working model loop. i've scratched out work , i'll describe how want set algorithm first i'll describe merge kernel
__global__ void merge(int* data, int depth) { sequential merge (data, left = threadidx.x, right = left + 2 ^ depth - 1); }
right left offset 2 ^ depth - 1 because that's relationship noticed in working algorithm out hand
sort 3 1 7 4 9 0 2 6 8 5 \ / \ / \ / \ / \ / merge(left, left + 2 - 1) 1 3 4 7 0 9 2 6 5 8 \ / \ / / merge(left, left + 4 - 1) 1 3 4 7 0 2 6 9 5 8 \ / / merge(left, left + 8 - 1) 0 1 2 3 4 6 7 9 5 8 \ / merge(left, size) 0 1 2 3 4 5 6 7 8 9
i thinking of doing in loop this
mergesort(int* data, int size) { int depth = 1; int maxdepth = log base 2 (size) // # blocks = size / 512 threads per block int blocks = size / 512; // # threads = size / blocks / 2 because 1 merge per 2 elements int threads = (size / blocks) / 2; int index = 0; while (depth < maxdepth) { merge<<< blocks / depth, threads / depth>>>(data, depth++); } }
do seem on right track or thinking off?
Comments
Post a Comment