CUDA - latency of GMEM writes
Straight question:
If a kernel does not read GMEM (apart from some reads at the very beginning) and uses GMEM for storage only, does writing to GMEM stall the warp the way reading from GMEM does?
Background:
The kernel sweeps across a set of partitioning patterns that can be predicted and managed. A single pattern may use a lot of SMEM and registers to be calculated and advanced. If the statistical value of a specific pattern is higher than the previous highest value among those already processed in the warp, the highest value is replaced and the pattern has to be saved. The kernel then carries on evaluating alternative patterns until the end. The number of alternative patterns is high, and it is necessary to recover the pattern with the highest statistical value at kernel end. Every warp should elect one winning pattern (every warp can have its own space in GMEM) and can keep the highest value in a register.
Using more SMEM to store the best pattern would reduce the pattern length or the number of threads per block, though it might work too... In fact, the initial problem is that there is too little computation per warp for each pattern (poor performance)...
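To make the scheme described above concrete, here is a minimal sketch of what I have in mind. It is not my real code: score_pattern, pattern_element, PATTERN_LEN, NUM_PATTERNS and the launch configuration are placeholders made up for illustration, and I assume one pattern element per lane and a warp size of 32. Each warp keeps its best score in a register and overwrites its own GMEM slot whenever a better pattern shows up; the kernel never reads GMEM.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int PATTERN_LEN  = 32;    // assumption: one pattern element per lane
constexpr int NUM_PATTERNS = 1024;  // assumption: toy problem size

// Placeholder statistic; the real one uses SMEM and many registers.
__host__ __device__ inline float score_pattern(int p) {
    return static_cast<float>((p * 37) % 1009);
}

// Placeholder pattern content for a given lane.
__host__ __device__ inline int pattern_element(int p, int lane) {
    return p * PATTERN_LEN + lane;
}

__global__ void search_best(int *best_patterns, float *best_scores, int num_patterns)
{
    const int lane            = threadIdx.x % warpSize;
    const int warps_per_block = blockDim.x / warpSize;
    const int warp_id         = blockIdx.x * warps_per_block + threadIdx.x / warpSize;
    const int num_warps       = gridDim.x * warps_per_block;

    float best = -1.0f;  // highest value so far, kept in a register
    for (int p = warp_id; p < num_patterns; p += num_warps) {
        const float s = score_pattern(p);  // same value in every lane of the warp
        if (s > best) {
            best = s;
            // Write-only GMEM access: each warp overwrites its own slot,
            // one element per lane, so the store is contiguous per warp.
            best_patterns[warp_id * PATTERN_LEN + lane] = pattern_element(p, lane);
        }
    }
    if (lane == 0) best_scores[warp_id] = best;
}

// Runs the kernel and compares against a host reference.
// Returns the number of mismatches, or -1 on a CUDA error.
int run_and_verify()
{
    const int blocks = 2, threads = 128;
    const int num_warps = blocks * threads / 32;  // assumes warp size 32

    int *d_pat = nullptr;
    float *d_score = nullptr;
    if (cudaMalloc(&d_pat, num_warps * PATTERN_LEN * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_score, num_warps * sizeof(float)) != cudaSuccess) {
        cudaFree(d_pat);
        return -1;
    }

    search_best<<<blocks, threads>>>(d_pat, d_score, NUM_PATTERNS);
    cudaError_t err = cudaGetLastError();

    int h_pat[num_warps * PATTERN_LEN];
    float h_score[num_warps];
    // cudaMemcpy blocks until the kernel on the default stream has finished.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_pat, d_pat, sizeof(h_pat), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_score, d_score, sizeof(h_score), cudaMemcpyDeviceToHost);
    cudaFree(d_pat);
    cudaFree(d_score);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int w = 0; w < num_warps; ++w) {
        float best = -1.0f;
        int best_p = -1;
        for (int p = w; p < NUM_PATTERNS; p += num_warps) {
            if (score_pattern(p) > best) { best = score_pattern(p); best_p = p; }
        }
        if (h_score[w] != best) ++mismatches;
        for (int lane = 0; lane < PATTERN_LEN; ++lane)
            if (h_pat[w * PATTERN_LEN + lane] != pattern_element(best_p, lane)) ++mismatches;
    }
    return mismatches;
}

int main()
{
    const int mismatches = run_and_verify();
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The line I am asking about is the store to best_patterns inside the loop: does the warp have to wait for it before moving on to the next pattern?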
Thanks in advance.