python - Modify tf-idf vectorizer for some keywords
I am creating a tf-idf matrix to find cosine similarity. I want the frequent words from a particular set to have more weightage (i.e., a higher tf-idf value).
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
How can I modify the above tfidf_matrix for the words in a particular set?
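For reference, a minimal runnable version of the setup described above, including the cosine-similarity step. The three-sentence corpus is a hypothetical stand-in for `documents`; `cosine_similarity` is scikit-learn's pairwise helper and works directly on the sparse matrix.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus; substitute your own documents.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

tfidf_vectorizer = TfidfVectorizer()
# Sparse CSR matrix of shape (n_documents, n_vocabulary_terms).
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)

# Pairwise cosine similarity between every pair of documents.
similarity = cosine_similarity(tfidf_matrix)
print(tfidf_matrix.shape)
print(similarity.round(3))
```

The third document shares no terms with the first, so their similarity is 0; boosting keyword columns changes how strongly shared keywords pull documents together.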
I converted the tfidf_matrix from CSR type to a 2-D array using:
my_matrix = tfidf_matrix.toarray()
Then, I found the index of each keyword using:
tfidf_vectorizer.vocabulary_.get(keyword)
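A short sketch of how `keyword_list` might be built from a set of words. The corpus and the `keywords` list are hypothetical. `vocabulary_` maps each term to its column index, and `dict.get()` returns `None` for words the vectorizer never kept, which is why the loop below has to skip `None` entries.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)

# Hypothetical set of keywords to boost; "zebra" does not occur in the corpus.
keywords = ["cat", "dog", "zebra"]

# Column index of each keyword, or None if it is not in the vocabulary.
keyword_list = [tfidf_vectorizer.vocabulary_.get(k) for k in keywords]

# Keep only the keywords that actually have a column in the matrix.
keyword_cols = [idx for idx in keyword_list if idx is not None]
print(keyword_list)
print(keyword_cols)
```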
After that, I iterated over the 2-D matrix and changed the tf-idf values according to my requirements. Here, keyword_list contains the indices of the keywords whose tf-idf values I want to modify.
for i in range(len(my_matrix)):
    for key in keyword_list:
        if key is not None:
            key = int(key)
            # Only overwrite words that actually occur in document i
            if my_matrix[i][key] > 0.0:
                my_matrix[i][key] = new_value
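The same update can also be written without Python-level loops. A minimal NumPy sketch, assuming `my_matrix` is the dense array returned by `toarray()`; the small matrix, `keyword_list` and `new_value` below are hypothetical placeholders.

```python
import numpy as np

# Hypothetical dense tf-idf values: 3 documents x 4 terms.
my_matrix = np.array([
    [0.5, 0.0, 0.7, 0.2],
    [0.0, 0.3, 0.0, 0.9],
    [0.4, 0.6, 0.1, 0.0],
])
keyword_list = [0, None, 2]  # column indices; None = keyword not in vocabulary
new_value = 1.0              # hypothetical replacement weight

cols = [int(k) for k in keyword_list if k is not None]

# Select all keyword columns at once and replace only the non-zero entries,
# mirroring the `> 0.0` check in the loop (absent words stay 0).
sub = my_matrix[:, cols]
my_matrix[:, cols] = np.where(sub > 0.0, new_value, sub)
print(my_matrix)
```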
Then, I converted my_matrix back to CSR type using:
from scipy import sparse

tfidf_matrix = sparse.csr_matrix(my_matrix)
Hence, tfidf_matrix is modified for the given list of keywords.
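Converting to a dense array can use a lot of memory when the vocabulary is large. As an alternative to the round trip above (a different technique, not the one used here), the stored values can be edited directly in the CSR matrix: its `.indices` array holds the column index of every value in `.data`. The matrix, `keyword_cols` and `new_value` below are hypothetical.

```python
import numpy as np
from scipy import sparse

# Hypothetical CSR tf-idf matrix: 3 documents x 4 terms.
tfidf_matrix = sparse.csr_matrix(np.array([
    [0.5, 0.0, 0.7, 0.2],
    [0.0, 0.3, 0.0, 0.9],
    [0.4, 0.6, 0.1, 0.0],
]))
keyword_cols = [0, 2]  # column indices of the keywords
new_value = 1.0        # hypothetical replacement weight

# Overwrite the stored (non-zero) entries that fall in a keyword column,
# without ever building a dense array.
mask = np.isin(tfidf_matrix.indices, keyword_cols) & (tfidf_matrix.data > 0.0)
tfidf_matrix.data[mask] = new_value
print(tfidf_matrix.toarray())
```

Note that `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`), so after overwriting values the rows are no longer unit length. `cosine_similarity` normalizes internally, but a plain dot product would no longer equal the cosine unless the rows are re-normalized (e.g. with `sklearn.preprocessing.normalize`).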