python - Find an element in a list by comparing each item against a string as a percentage match, using SequenceMatcher
I am trying to compare the elements of a list against a string element. So far I have used the nltk library, but I am forced to use difflib's SequenceMatcher instead. Here is the script using the nltk library:
    import nltk
    import nltk.corpus
    import nltk.tokenize
    import nltk.stem.snowball
    import string
    import re

    newinputlist = 'cars (2006)'
    newlistmovies = ['adult world (2013)', 'trolls (2016)', 'cars (2006)',
                     'harry potter , prisoner of azkaban (2004)',
                     'the sex monster (1999)', 'pitch perfect 2 (2015)',
                     'avengers: age of ultron (2015)', 'jurrasic world (2015)']

    stopwords = nltk.corpus.stopwords.words('english')
    stopwords.extend(string.punctuation)
    stopwords.append('')
    stemmer = nltk.stem.snowball.SnowballStemmer('english')

    def get_match_ratio(s1, s2):
        # Tokenize both strings, stem each token, then compute the
        # Jaccard ratio (intersection over union) of the two stem sets.
        tokens_s1 = [token for token in nltk.word_tokenize(s1.lower())]
        tokens_s2 = [token for token in nltk.word_tokenize(s2.lower())]
        stems_s1 = [stemmer.stem(token) for token in tokens_s1]
        stems_s2 = [stemmer.stem(token) for token in tokens_s2]
        ratio = len(set(stems_s1).intersection(stems_s2)) / float(len(set(stems_s1).union(stems_s2)))
        return ratio

    similarity = [[item, get_match_ratio(newinputlist, item)] for item in newlistmovies]
    itemmatch = [x[0] for x in similarity if x[1] > 0.5]
    print(itemmatch)

Output:

    'cars (2006)' # percent value : 1.0

Any ideas on how to write this code with difflib only?
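One possible difflib-only version is sketched below. It is a rough sketch and not a drop-in equivalent: `SequenceMatcher.ratio()` compares the strings character by character, while the nltk script compares sets of stemmed tokens, so the scores differ. For example, 'trolls (2016)' shares the characters "s (20" and "6)" with 'cars (2006)' and scores above 0.5, which is why the threshold is raised to 0.8 here. That value is an assumption chosen for this sample list, and the names `new_input`, `movies` and `item_match` are just illustrative.

```python
from difflib import SequenceMatcher

new_input = 'cars (2006)'
movies = ['adult world (2013)', 'trolls (2016)', 'cars (2006)',
          'harry potter , prisoner of azkaban (2004)',
          'the sex monster (1999)', 'pitch perfect 2 (2015)',
          'avengers: age of ultron (2015)', 'jurrasic world (2015)']

def get_match_ratio(s1, s2):
    # Character-level similarity in [0, 1]; 1.0 means identical strings.
    return SequenceMatcher(None, s1.lower(), s2.lower()).ratio()

# Score every title against the input string.
similarity = [(item, get_match_ratio(new_input, item)) for item in movies]

# Assumed threshold: 0.8 instead of 0.5, because character-level ratios
# run higher than token-set ratios for titles that share a "(20xx)" suffix.
item_match = [item for item, ratio in similarity if ratio > 0.8]
print(item_match)
```

If only the best candidates are needed, `difflib.get_close_matches(word, possibilities, n, cutoff)` applies the same ratio with a cutoff and returns the closest titles directly.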