machine learning - Word embedding training


I have one corpus for word embeddings. Using this corpus, I trained my word embeddings. However, each time I train, the results are quite different (as measured by k-nearest neighbors (KNN)). For example, in the first training, the nearest-neighbor words of 'computer' are 'laptops', 'computerized', 'hardware'. But in the second training, the KNN words are 'software', 'machine', ... ('laptops' is ranked low!). Each training run was performed independently for 20 epochs, with identical hyper-parameters.

I want the trained word embeddings to be similar across runs (e.g., 'laptops' always ranked high). How should I do this? Should I adjust the hyper-parameters (learning rate, initialization, etc.)?

You didn't say which word2vec software you're using, which might change the relevant factors.

The word2vec algorithm inherently uses randomness, both in initialization and in several aspects of training (like the selection of negative examples, if using negative sampling, or the random downsampling of very-frequent words). Additionally, if you're doing multithreaded training, the essentially-random jitter in OS thread scheduling will change the order of training examples, introducing another source of randomness. So you shouldn't expect subsequent runs, even with the exact same parameters and corpus, to give identical results.
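If strict repeatability matters more to you than training speed, the usual knobs are a fixed seed and single-threaded training. A minimal sketch, assuming the Gensim implementation (the question doesn't name the software), with an illustrative toy corpus:

```python
# Assumes Gensim. For a fully reproducible run you also need PYTHONHASHSEED
# fixed in the shell *before* launching Python, since Gensim's default word
# hashing uses Python's randomized str hashing.
from gensim.models import Word2Vec

sentences = [["computer", "laptops", "hardware"],
             ["computer", "software", "machine"]]  # tiny toy corpus

model = Word2Vec(
    sentences,
    vector_size=100,  # called "size" in Gensim versions before 4.0
    window=5,
    min_count=1,
    seed=42,          # fixes initialization and negative-sampling draws
    workers=1,        # a single thread removes OS thread-scheduling jitter
    epochs=20,        # called "iter" in Gensim versions before 4.0
)
print(model.wv.most_similar("computer", topn=3))
```

Note that forcing `workers=1` sacrifices most of the training speedup on multi-core machines, so it's usually a debugging tool rather than a production setting.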

Still, with enough data, suitable parameters, and a proper training loop, the relative-neighbors results should be fairly similar from run to run. If they're not, more data or more training iterations might help.
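One way to make "similar from run to run" concrete is to train twice on the same corpus and measure how much a probe word's top-k neighbor sets agree. A rough sketch, again assuming Gensim, with a stand-in corpus:

```python
from gensim.models import Word2Vec

corpus = [["computer", "laptops", "hardware", "software", "machine"]] * 200

def topk_overlap(model_a, model_b, word, k=3):
    """Jaccard overlap of a word's top-k neighbor sets in two models."""
    neighbors_a = {w for w, _ in model_a.wv.most_similar(word, topn=k)}
    neighbors_b = {w for w, _ in model_b.wv.most_similar(word, topn=k)}
    return len(neighbors_a & neighbors_b) / len(neighbors_a | neighbors_b)

run1 = Word2Vec(corpus, vector_size=50, min_count=5, epochs=20)
run2 = Word2Vec(corpus, vector_size=50, min_count=5, epochs=20)
print(topk_overlap(run1, run2, "computer"))  # near 1.0 means stable neighbors
```

On a real corpus you'd average this overlap over many probe words, not just one, to get a meaningful stability score.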

You may see wildly different results if the model is overlarge (too many dimensions/words) for your corpus, and thus prone to overfitting. That is, it finds a great configuration for the data through essentially memorizing its idiosyncrasies, without achieving any generalization power. (For instance, 300-dimensional vectors over a 100,000-word vocabulary mean tens of millions of free weights, which a corpus of only a few million words can't meaningfully constrain.) And if such overfitting is possible, there are typically many equally good such memorizations, so results can be very different run-to-run. Meanwhile, a right-sized model with lots of data will instead be capturing true generalities, and those will be more consistent from run to run, despite the randomization.

Getting more data, using smaller vectors, using more training passes, or upping the minimum count of word occurrences needed to retain/train a word might all help. (Very infrequent words don't get high-quality vectors anyway, but can wind up interfering with the quality of other words' vectors, and then randomly intrude in most-similar lists.)
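In Gensim terms (assumed, since the software isn't specified), those adjustments map onto a few constructor parameters; the file path below is hypothetical:

```python
from gensim.models import Word2Vec

model = Word2Vec(
    corpus_file="corpus.txt",  # hypothetical path: one tokenized sentence per line
    vector_size=50,   # fewer dimensions: less capacity to memorize noise
    epochs=40,        # more training passes over the same data
    min_count=10,     # drop words appearing fewer than 10 times
    workers=4,
)
```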

To know what else might be awry, you should clarify in your question things like:

  • the software used
  • the modes/metaparameters used
  • the corpus size, in number of examples, average example size in words, and unique-words count (both in the raw corpus, and after any minimum-count is applied)
  • your methods of preprocessing
  • any code you're using to manage training (if you're managing multiple training passes yourself)

