machine learning - Optimizing word2vec model comparisons

July 15, 2013

I have a word2vec model for every user, so I can look at the same word in two different models. Is there a more optimized way to compare the trained models than this?

    import numpy as np
    from gensim.models import Word2Vec  # assumption: the models were saved with gensim

    useravec = Word2Vec.load('useravec.w2v')
    userbvec = Word2Vec.load('userbvec.w2v')

    # for each word in the vocab, compute the cosine similarity:
    cosine_similarity = np.dot(useravec['president'], userbvec['president']) / (
        np.linalg.norm(useravec['president']) * np.linalg.norm(userbvec['president']))

Is this the best way to compare two models? Is there a stronger way to see how two models compare, rather than word by word? Picture 1000 users/models, each with a similar number of words in its vocab.

There's a faulty assumption at the heart of the question.

If the models useravec and userbvec were trained in separate sessions, on separate data, the calculated angle between useravec['president'] and userbvec['president'] is, alone, meaningless. There's randomness in the algorithm's initialization, and in most modes of training (via things like negative sampling, frequent-word downsampling, and arbitrary reordering of training examples due to thread-scheduling variability). As a result, even repeated model training with the exact same corpus and parameters can result in different coordinates for the same words.

It's only the relative distances/directions, among words that were co-trained in the same iterative process, that have significance.
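A toy illustration of that point, using only NumPy and made-up vectors rather than real word2vec output: "model B" here is just "model A" under a random rotation, which is an idealized stand-in for the arbitrary coordinate differences between separate training runs (real runs differ in messier ways). The word list, dimensionality and seed are all assumptions chosen for the sketch.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)

# Hypothetical "model A": three words with random 50-dimensional vectors.
words = ["president", "senator", "banana"]
model_a = {w: rng.normal(size=50) for w in words}

# Hypothetical "model B": the same geometry under a random orthogonal
# rotation (Q from the QR decomposition of a random matrix is orthogonal).
q, _ = np.linalg.qr(rng.normal(size=(50, 50)))
model_b = {w: q @ v for w, v in model_a.items()}

# Within each model, the relation between two words is identical...
within_a = cosine(model_a["president"], model_a["senator"])
within_b = cosine(model_b["president"], model_b["senator"])

# ...but the same word compared across the two models gives an arbitrary value.
across = cosine(model_a["president"], model_b["president"])

print(within_a, within_b, across)
```

The two within-model similarities agree up to floating-point error, while the cross-model value for 'president' depends entirely on the rotation and says nothing about the word.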

So it might be interesting to compare whether the two models' lists of top-N similar words, for a particular word, are similar. But the raw value of the angle between the coordinates of the same word in alternate models isn't a meaningful measure.
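One way to sketch the top-N comparison is shown below. It treats each model as a plain dict of word to NumPy vector, and scores agreement as the Jaccard overlap of the two neighbor sets; the function names, the Jaccard choice and the toy 2-D vectors are all assumptions for illustration, not something the answer prescribes.

```python
import numpy as np

def top_n_neighbors(vectors, word, n):
    """Return the set of the n words most cosine-similar to `word`."""
    others = [w for w in vectors if w != word]
    target = vectors[word] / np.linalg.norm(vectors[word])
    sims = [float(np.dot(target, vectors[w] / np.linalg.norm(vectors[w])))
            for w in others]
    best = np.argsort(sims)[::-1][:n]  # indices of the n highest similarities
    return {others[i] for i in best}

def neighbor_overlap(model_a, model_b, word, n=10):
    """Jaccard overlap of the top-n neighbor sets, over the shared vocabulary."""
    shared = set(model_a) & set(model_b)
    if word not in shared:
        raise KeyError(word)
    a = {w: model_a[w] for w in shared}
    b = {w: model_b[w] for w in shared}
    na = top_n_neighbors(a, word, n)
    nb = top_n_neighbors(b, word, n)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

# Hypothetical toy models with 2-D vectors.
model_a = {
    "president": np.array([1.0, 0.0]),
    "senator": np.array([0.9, 0.1]),
    "governor": np.array([0.8, 0.3]),
    "banana": np.array([0.0, 1.0]),
    "apple": np.array([-0.1, 1.0]),
}
# model_b: model_a rotated by 90 degrees (different coordinates, same neighbors).
rot = np.array([[0.0, -1.0], [1.0, 0.0]])
model_b = {w: rot @ v for w, v in model_a.items()}
# model_c: 'president' sits next to the fruit words instead.
model_c = {
    "president": np.array([0.0, 1.0]),
    "senator": np.array([1.0, 0.0]),
    "governor": np.array([0.9, 0.2]),
    "banana": np.array([0.1, 1.0]),
    "apple": np.array([-0.1, 1.0]),
}

same = neighbor_overlap(model_a, model_b, "president", n=2)
different = neighbor_overlap(model_a, model_c, "president", n=2)
print(same, different)
```

With real gensim models, the neighbor lists could presumably come from the model's own most_similar(word, topn=n) lookup instead of the hand-rolled helper; averaging the overlap over a sample of shared words would then give a single model-to-model score.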
