algorithm - Uniform sampling of k integers from [0:n) -

February 15, 2015

my goal sample k integers 0, ... n-1 without duplication. order of sampled integers doesn't matter. @ every each call (which occurs often), n , k vary not (n 250,000 , k 2,000). i've come following amortized o(k) algorithm:

prepare array items 0, 1, 2, ... , n-1. takes o(n) since n relatively stable, cost can made amortized constant.
sample random number r [0:i] = n - 1. here cost in fact related n, n not big, dependency not critical.
swap rth item , ith item in array a.
decrease 1.
repeat k times steps 2~4; have random permutation of length k @ tail of a. copy this.
we should roll initial state (0, ... , n-1) keep cost of step 1 constant. can done push r stack of length k @ each pass of step 2. preparation of stack requires amortized constant cost.

i think uniform sampling of permutation/combination should exhaustively studied problem, either (1) there better solution, or @ least (2) solution (minor modification of) well-known solution. thus,

in case (1), want know better solution.
in case (2), want find reference.

please me. thanks.

if k less n -- say, less half of n -- efficient solution keep numbers generated in hash table (actually, hash set, since there no value associated key). if random number happens in hash table, reject , generate 1 in place. actual values of k , n suggested (k ∼ 2000; n ∼ 250,000) expected number of rejections generate k unique samples less 10, hardly noticeable. size of hash table o(k), , can deleted @ end of sample generation.
it possible simulate fyk shuffle algorithm using hash table instead of vector of n values, thereby avoiding having reject generated random numbers. if using vector a, start initializing a[i] i, every 0 ≤ < k. hash table h, start empty hash table, , use convention h[i] considered i if key i not in hash table. step 3 in algorithm -- "swap a[r] a[i]" -- becomes "add h[r] next element of sample , set h[r] h[i]". note unnecessary set h[i] because element never referred again: subsequent random numbers r generate range not include i.

because hash table in case contains both keys , values, larger hash set used in alternative 1, above, , increased size (and consequent increase in memory cache misses) cause more overhead saved eliminating rejections. however, has advantage of working if k close n.
finally, in proposed algorithm, quite easy restore a in o(k) time. value a[j] have been modified algorithm if:

a. n − k ≤ j < n, or

b. there i such n − k ≤ < n , a[i] = j.

consequently, can restore vector a looking @ each a[i] n − k ≤ < n: first, if a[i] < n−k, set a[a[i]] a[i]; then, unconditionally set a[i] i.

Search This Blog

How Y

algorithm - Uniform sampling of k integers from [0:n) -

Comments

Post a Comment

Popular posts from this blog

meteor - inserting data to database gives error "insert failed: Method '/texts/insert' not found" -

angular - DownloadURL return null in below code -

html - unterminated string literal “onclick” event in anchor -