machine learning - Optimizing word2vec model comparisons
I have a word2vec model for every user, so I understand what two words look like on different models. Is there a more optimized way to compare the trained models than this?
    import numpy as np

    # word2vec is whatever loader the models were saved with (its import is not shown here)
    useravec = word2vec.load('useravec.w2v')
    userbvec = word2vec.load('userbvec.w2v')

    # for each word in the vocab, perform the dot product:
    cosine_similarity = np.dot(useravec['president'], userbvec['president']) / (
        np.linalg.norm(useravec['president']) * np.linalg.norm(userbvec['president']))

Is this the best way to compare two models? Is there a stronger way to see how two models compare, rather than word by word? Picture 1000 users/models, each with a similar number of words in its vocab.
There's a faulty assumption at the heart of the question.
If the models useravec and userbvec were trained in separate sessions, on separate data, then the calculated angle between useravec['president'] and userbvec['president'] is, alone, meaningless. There's randomness in the algorithm's initialization, and in most modes of training (via things like negative sampling, frequent-word downsampling, and arbitrary reordering of training examples due to thread-scheduling variability). As a result, even repeated model training with the exact same corpus and parameters can result in different coordinates for the same words.
It's only the relative distances/directions, among words that were co-trained in the same iterative process, that have significance.
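A toy illustration of this point, not the word2vec algorithm itself: the 2-D vectors below are made up, and the second "model" is simply the first one rotated by 90 degrees. Real independently trained models differ by more than a rotation, so treat this only as a sketch of why within-model similarities can agree while the cross-model angle for the same word is arbitrary.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rotate(v, theta):
    """Rotate a 2-D vector by theta radians."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

# Hypothetical embeddings, invented for this example.
model_a = {
    "president": (1.0, 0.2),
    "senator": (0.9, 0.4),
    "banana": (-0.3, 1.0),
}

# Same geometry in a different coordinate frame (assumption: a pure rotation).
model_b = {word: rotate(vec, math.pi / 2) for word, vec in model_a.items()}

# Relative similarity inside each model is preserved by the rotation...
within_a = cosine(model_a["president"], model_a["senator"])
within_b = cosine(model_b["president"], model_b["senator"])

# ...but the same word compared across models depends only on the rotation.
across = cosine(model_a["president"], model_b["president"])

print(within_a, within_b, across)
```

Here the two models encode identical relationships between words, yet the cross-model cosine for 'president' says nothing about how similar the models are.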
So it might be interesting to compare whether the two models' lists of top-N similar words, for a particular word, are similar. But the raw value of the angle, between the coordinates of the same word in alternate models, isn't a meaningful measure.
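One way the top-N comparison could be sketched is below. The helper names (top_n_neighbors, neighbor_overlap), the use of Jaccard overlap as the score, and the choice to restrict both models to their shared vocabulary are all assumptions made for illustration; models are represented as plain dicts mapping words to vectors so the example stays self-contained.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_n_neighbors(vectors, word, n):
    """Return the n words closest to `word` by cosine similarity."""
    target = vectors[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:n]]

def neighbor_overlap(vectors_a, vectors_b, word, n=10):
    """Jaccard overlap of the two models' top-n neighbor lists for `word`.

    Only words present in both models are considered (a design choice
    of this sketch, so that differing vocabularies do not skew the score).
    """
    shared = set(vectors_a) & set(vectors_b)
    if word not in shared:
        raise KeyError(f"{word!r} is not in both vocabularies")
    a = {w: vectors_a[w] for w in shared}
    b = {w: vectors_b[w] for w in shared}
    top_a = set(top_n_neighbors(a, word, n))
    top_b = set(top_n_neighbors(b, word, n))
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0

# Hypothetical toy models: user_b has the same geometry as user_a with the
# axes swapped; user_c places 'president' next to 'banana' instead.
user_a = {"president": (1.0, 0.1), "senator": (0.9, 0.2),
          "governor": (0.8, 0.3), "banana": (-0.2, 1.0)}
user_b = {"president": (0.1, 1.0), "senator": (0.2, 0.9),
          "governor": (0.3, 0.8), "banana": (1.0, -0.2)}
user_c = {"president": (-0.2, 0.9), "senator": (0.9, 0.2),
          "governor": (0.8, 0.3), "banana": (-0.2, 1.0)}

same_neighbors = neighbor_overlap(user_a, user_b, "president", n=2)
different_neighbors = neighbor_overlap(user_a, user_c, "president", n=1)
print(same_neighbors, different_neighbors)
```

If the models are gensim Word2Vec objects (an assumption; the question does not name the library), the neighbor lists could instead come from model.wv.most_similar(word, topn=n), which returns (word, similarity) pairs. Averaging the overlap over a sample of shared words would give a single score per pair of models.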