Python - Finding closest spellings by the same first letter via different distance measures
I am trying to write a function that finds the closest spelling of a word (which may have been misspelled) among words with the same first letter, using different n-grams and distance measures.

So far I have:
    from nltk.corpus import words
    from nltk import ngrams
    from nltk.metrics.distance import edit_distance, jaccard_distance

    first_letters = ['a', 'b', 'c']
    spellings = words.words()

    def recommendation(word):
        n = 3  # n means 'n'-grams, here I use 3 as an example
        spellings_new = [w for w in spellings if (w[0] in first_letters)]
        dists = [________(set(ngrams(word, n)), set(ngrams(w, n))) for w in spellings_new]  # ______ is the distance measure
        return spellings_new[dists.index(min(dists))]

The rest seems straightforward, but I don't know how to specify the 'same initial letter' condition. In particular, if the misspelled word starts with the letter 'a', the corrected word recommended from '.words' (the one with the minimum distance to the misspelled word) should also start with 'a', and so on. As you can see in the function above, I use '(w[0] in first_letters)' as the 'initial letter condition', but this doesn't do the trick: it returns words that start with different initials.

I have yet to find similar threads on this board addressing my question, so I would appreciate it if anyone could enlighten me on how to specify the 'initial letter condition'. If this question has somehow been asked before and is deemed inappropriate, I will remove it.
Thank you.
You're quite close. w[0] == word[0] can be used to check whether the first letter is the same. After that, set(w) and set(word) can be used to turn the words into sets of letters. I then passed those sets to jaccard_distance, only because that's what you had already imported. It's possible there's a better solution.
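    from nltk.corpus import words
    from nltk.metrics.distance import jaccard_distance

    spellings = words.words()

    def recommendation(word):
        n = 3  # n means 'n'-grams; not used here, since this version compares sets of letters
        # Keep only candidates that share the first letter of the input word
        spellings_new = [w for w in spellings if (w[0] == word[0])]
        dists = [jaccard_distance(set(w), set(word)) for w in spellings_new]
        return spellings_new[dists.index(min(dists))]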
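The answer above compares sets of letters, while the question asked about n-grams. As a rough sketch of how the same first-letter filter might be combined with character n-grams, the version below uses no NLTK at all: `char_ngrams` and `jaccard` are hand-written stand-ins (assumptions, not NLTK functions) for `set(ngrams(word, n))` and `jaccard_distance`, and `vocab` is a hypothetical toy word list used in place of `words.words()` so the example runs without downloading a corpus.

```python
def char_ngrams(word, n):
    """Set of character n-grams of `word` (stand-in for set(nltk.ngrams(word, n)))."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def recommend(word, vocabulary, n=3):
    """Closest vocabulary entry that starts with the same letter as `word`."""
    # The 'initial letter condition': compare against the input word itself
    candidates = [w for w in vocabulary if w and w[0] == word[0]]
    if not candidates:
        return None  # no word in the vocabulary shares the first letter
    target = char_ngrams(word, n)
    return min(candidates, key=lambda w: jaccard(target, char_ngrams(w, n)))


# Hypothetical toy vocabulary, used here instead of nltk.corpus.words
vocab = ["correspond", "corpulent", "incorrect", "indecent", "validate", "value"]
print(recommend("cormulent", vocab))  # corpulent
```

Using `min` with a `key` function avoids building a separate list of distances and looking up its index. Swapping in another measure, such as an edit distance on the raw strings, would only require changing the `key` function.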