machine learning - Optimizing word2vec model comparisons


I have a word2vec model for every user, so I understand what two words look like on different models. Is there a more optimized way to compare the trained models than this?

    import numpy as np
    from gensim.models import Word2Vec

    useravec = Word2Vec.load('useravec.w2v')
    userbvec = Word2Vec.load('userbvec.w2v')

    # for each word in the vocab, compute the cosine similarity
    # (word vectors are read through .wv in recent gensim versions):
    a = useravec.wv['president']
    b = userbvec.wv['president']
    cosine_similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

Is this the best way to compare two models? Is there a stronger way to see how two models compare, rather than word by word? Picture 1000 users/models, each with a similar number of words in the vocab.

There's a faulty assumption at the heart of your question.

If the models useravec and userbvec were trained in separate sessions, on separate data, the calculated angle between useravec['president'] and userbvec['president'] is, alone, essentially meaningless. There's randomness in the algorithm initialization, and then in most modes of training (via things like negative sampling, frequent-word downsampling, and arbitrary reordering of training examples due to thread-scheduling variability). As a result, even repeated model training with the exact same corpus and parameters can result in different coordinates for the same words.

It's only the relative distances/directions, among words that were co-trained in the same iterative process, that have significance.
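To make this concrete, here is a minimal NumPy sketch under an idealized assumption: "model B" is just "model A" with every vector passed through a random rotation. Real, independently trained models are not exact rotations of each other, and the word list, dimensionality, and seed below are made up for illustration. The point is only that identical within-model geometry can coexist with arbitrary cross-model angles.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "model A": 5 words with 50-dimensional vectors.
words = ["president", "senator", "election", "apple", "banana"]
model_a = rng.normal(size=(len(words), 50))

# A random orthogonal matrix (rotation/reflection) from a QR decomposition.
q, _ = np.linalg.qr(rng.normal(size=(50, 50)))

# "Model B": the same geometry expressed in rotated coordinates.
model_b = model_a @ q


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def pairwise_cosines(m):
    """Matrix of cosine similarities between all rows of m."""
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    return unit @ unit.T


# Within each model, every word-to-word similarity is the same.
within_match = bool(np.allclose(pairwise_cosines(model_a), pairwise_cosines(model_b)))

# Across models, the cosine for the *same* word depends only on the rotation.
cross = [cosine(model_a[i], model_b[i]) for i in range(len(words))]

print("within-model similarities identical:", within_match)
for word, value in zip(words, cross):
    print(f"cross-model cosine for {word!r}: {value:.3f}")
```

Both toy models encode exactly the same relationships among the words, yet the cross-model cosine for 'president' says nothing about that.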

So it might be interesting to compare whether two models' lists of top-N similar words, for a particular word, are similar. But the raw value of the angle, between the coordinates of the same word in alternate models, isn't a meaningful measure.
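One possible way to sketch the top-N comparison is a Jaccard overlap between the two neighbor sets. The helper names (`top_n_neighbors`, `neighbor_overlap`), the choice of Jaccard as the overlap score, and the toy vectors are all assumptions for illustration, not something the answer prescribes. Each model is represented here as a plain dict from word to vector; with gensim models the neighbor list could instead come from `model.wv.most_similar(word, topn=n)`.

```python
import numpy as np


def top_n_neighbors(vectors, word, n):
    """Set of the n words most cosine-similar to `word` within one model."""
    target = vectors[word] / np.linalg.norm(vectors[word])
    scores = {
        other: float(np.dot(target, vec / np.linalg.norm(vec)))
        for other, vec in vectors.items()
        if other != word
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def neighbor_overlap(model_x, model_y, word, n=10):
    """Jaccard overlap of the top-n neighbor sets of `word` in two models."""
    x = top_n_neighbors(model_x, word, n)
    y = top_n_neighbors(model_y, word, n)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


# Hypothetical 2-D toy vectors, purely for illustration.
model_a = {
    "president": np.array([1.0, 0.0]),
    "senator": np.array([0.9, 0.1]),
    "election": np.array([0.8, 0.3]),
    "banana": np.array([0.0, 1.0]),
}
# Same geometry as model_a, but with the two axes swapped.
model_b = {w: v[::-1].copy() for w, v in model_a.items()}
# A model in which 'president' sits next to 'banana' instead.
model_c = {
    "president": np.array([0.0, 1.0]),
    "senator": np.array([1.0, 0.0]),
    "election": np.array([0.9, 0.2]),
    "banana": np.array([0.1, 1.0]),
}

same_geometry = neighbor_overlap(model_a, model_b, "president", n=2)
different_usage = neighbor_overlap(model_a, model_c, "president", n=2)
print(same_geometry, different_usage)
```

For 1000 users, a score like this could be averaged over a shared set of probe words to get one number per pair of models; which probe words and which N to use would need to be chosen for the data at hand.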

