c++ - Can a thread-local copy of select elements be created from a shared 2D array in a parallel region? (Shared, private, barrier: OpenMP)


I have a 2-D grid of n x n elements. In one iteration, I calculate the value of each element by averaging the values of its neighbors:

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            grid[i][j] = (grid[i-1][j] + grid[i][j-1] + grid[i+1][j] + grid[i][j+1]) / 4.0;

I need to run the above nested loop for iter iterations. Specifically, I need the following:

  1. The threads should calculate the averages, wait until all threads have finished calculating, and then update the grid in one go.
  2. The loop over the iter iterations should run sequentially, but during every iteration the value of grid[i][j] for every i and j should be calculated in parallel.

In order to do this, I have the following ideas and questions:

  1. Maybe I could make grid shared and give each thread a private copy of the 4 elements of grid needed to calculate grid[i][j]. (Basically, grid would be shared between the threads, but every thread would also hold a local copy of the 4 iteration-specific elements.) Is this possible?
  2. Would a barrier in fact be needed for the threads to finish before starting the next iteration?

I'm new to the OpenMP way of thinking and I'm utterly lost in this simple problem. I'd be grateful if someone could resolve my confusion.

  1. In practice, you'd want to have (much) fewer threads than grid points, with each thread calculating a whole bunch of points (for example, one row). There is overhead associated with starting OpenMP (or any other kind of) threads, and the program will be memory-bound rather than CPU-bound anyway. Starting a thread per grid point would defeat the whole purpose of parallelizing the computation. Hence, idea #1 is not recommended (though I am not quite sure I understood it correctly; maybe that is not what you were proposing).

  2. I recommend (as also pointed out by others in the comments on the OP) allocating twice the memory needed to store the grid values and using two pointers that are swapped between iterations: one points to the memory holding the previous iteration's values, which is read-only, and the other to the new iteration's values, which is write-only. Note that you swap the pointers, not copy the memory. After the last iteration is done, you can copy the final result to the desired location. (A combined sketch of this is shown after the code snippets below.)

  3. Yes, you need to synchronize the threads between iterations. In OpenMP this is done implicitly by opening a parallel region inside the iteration loop (there is an implicit barrier at the end of a parallel region):

    for (int iter = 0; iter < niter; ++iter) {
        #pragma omp parallel
        {
            // determine the range of points handled by the current thread
            // loop over this thread's points and apply the stencil
        }
    }

    Or, using the parallel for construct:

    const int np = n * n;
    for (int iter = 0; iter < niter; ++iter) {
        #pragma omp parallel for
        for (int ip = 0; ip < np; ++ip) {
            const int i = ip / n;
            const int j = ip % n;
            // apply the stencil at [i,j]
        }
    }

    The second version will automatically distribute the work evenly between the available threads, which is what you want. In the first one you would have to do that manually.
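
Putting points 2 and 3 together, here is a minimal sketch of how the double-buffered, parallel stencil could look. It assumes the grid is stored as a flat std::vector<double> of n*n elements and that only interior points are updated; the function name smooth and the scratch buffer are illustrative choices, not something from the original answer:

    #include <algorithm>
    #include <utility>
    #include <vector>

    void smooth(std::vector<double>& grid, int n, int niter)
    {
        std::vector<double> scratch(grid);   // second buffer (boundary values included)
        double* old_grid = grid.data();      // read-only within an iteration
        double* new_grid = scratch.data();   // write-only within an iteration

        for (int iter = 0; iter < niter; ++iter) {
            // The parallel for distributes the interior points across threads;
            // the implicit barrier at the end of the construct guarantees that
            // every point has been written before the pointers are swapped.
            #pragma omp parallel for
            for (int ip = 0; ip < (n - 2) * (n - 2); ++ip) {
                const int i = ip / (n - 2) + 1;   // interior points only
                const int j = ip % (n - 2) + 1;
                new_grid[i * n + j] = (old_grid[(i - 1) * n + j] +
                                       old_grid[i * n + j - 1] +
                                       old_grid[(i + 1) * n + j] +
                                       old_grid[i * n + j + 1]) / 4.0;
            }
            std::swap(old_grid, new_grid);   // swap the pointers, not the memory
        }

        // After an odd number of iterations the latest values live in the
        // scratch buffer, so copy the final result back into the caller's grid.
        if (niter % 2 != 0)
            std::copy(scratch.begin(), scratch.end(), grid.begin());
    }

Compile with -fopenmp (GCC/Clang). Because every thread only reads from old_grid and only writes its own elements of new_grid, this also satisfies requirement 1 of the question: no thread ever reads a value that another thread is updating in the same iteration.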

