python - Modify tf-idf vectorizer for some keywords


I am creating a tf-idf matrix to find cosine similarity. I want frequent words from a particular set to have more weight (i.e., a higher tf-idf value).

    from sklearn.feature_extraction.text import TfidfVectorizer

    tfidf_vectorizer = TfidfVectorizer()
    tfidf_matrix = tfidf_vectorizer.fit_transform(documents)

How can I modify the above tfidf_matrix for the words in a particular set?
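For context, the pipeline the question describes (a tf-idf matrix followed by cosine similarity) can be sketched as follows. This assumes scikit-learn's `cosine_similarity`; the three-sentence corpus is a hypothetical stand-in for `documents`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for the question's `documents`.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

tfidf_vectorizer = TfidfVectorizer()
# Sparse CSR matrix with shape (n_documents, n_terms).
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)

# Pairwise cosine similarity between all documents, shape (n_documents, n_documents).
similarity = cosine_similarity(tfidf_matrix)
print(similarity.shape)  # (3, 3)
```

Any change to the keyword columns of `tfidf_matrix` feeds straight into `similarity`, which is why the weights are adjusted before this step.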

I converted the tf-idf matrix from CSR type to a 2-D array using:

    my_matrix = tfidf_matrix.toarray()

Then, I found the column index of each keyword using:

    tfidf_vectorizer.vocabulary_.get(keyword)
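A minimal sketch of building the index list for a whole set of keywords with the same `vocabulary_.get` lookup. The corpus and the `keywords` set are hypothetical; `vocabulary_` maps each term to its column and `.get` returns `None` for terms the vectorizer never saw, so those are filtered out here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

documents = ["the cat sat on the mat", "the dog sat on the log"]  # hypothetical corpus
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)

# Hypothetical set of keywords whose weight should change.
keywords = {"cat", "log", "unicorn"}

# vocabulary_ maps term -> column index; .get returns None for unseen
# terms (here "unicorn").
keyword_list = [tfidf_vectorizer.vocabulary_.get(k) for k in keywords]

# Drop the None entries up front so later indexing never sees them.
keyword_cols = sorted(int(k) for k in keyword_list if k is not None)
print(keyword_cols)  # [0, 2]
```

With the default tokenizer the vocabulary is `cat, dog, log, mat, on, sat, the` in sorted order, so "cat" is column 0 and "log" is column 2.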

After that, I iterated over the 2-D matrix and changed the tf-idf values according to my requirements. Here, keyword_list contains the indices of the keywords whose tf-idf values I want to modify.

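    for i in range(len(my_matrix)):
        for key in keyword_list:
            if key is None:
                continue  # keyword not in the vocabulary; skip it
            key = int(key)
            # only change keywords that actually occur in document i
            if my_matrix[i][key] > 0.0:
                my_matrix[i][key] = new_value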
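The same update can be sketched with NumPy indexing instead of a nested Python loop. The small matrix, `keyword_cols` and `new_value` below are hypothetical; the rule (overwrite only entries greater than 0 in keyword columns) matches the loop above.

```python
import numpy as np

# Hypothetical dense tf-idf rows (documents x terms).
my_matrix = np.array([
    [0.5, 0.0, 0.7],
    [0.0, 0.3, 0.0],
    [0.4, 0.0, 0.0],
])
keyword_cols = [0, 1]  # column indices, None entries already removed
new_value = 0.9        # hypothetical replacement weight

# Fancy indexing returns a copy of the keyword columns...
sub = my_matrix[:, keyword_cols]
# ...where only the non-zero cells are overwritten...
sub[sub > 0.0] = new_value
# ...and the copy is written back into the original matrix.
my_matrix[:, keyword_cols] = sub
print(my_matrix)
```

Writing `sub` back explicitly matters: assigning into `my_matrix[:, keyword_cols][mask]` would only modify a temporary copy.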

Finally, I converted my_matrix back to CSR type using:

    from scipy import sparse

    tfidf_matrix = sparse.csr_matrix(my_matrix)
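The round trip through `toarray()` can be expensive for a large vocabulary. A possible alternative, sketched here on a hypothetical matrix, edits the CSR arrays directly: in CSR format `.data` holds the stored values and `.indices` holds the column of each one, so a boolean mask over the stored entries selects the keyword cells without densifying.

```python
import numpy as np
from scipy import sparse

# Hypothetical CSR tf-idf matrix (TfidfVectorizer also returns CSR).
tfidf_matrix = sparse.csr_matrix(np.array([
    [0.5, 0.0, 0.7],
    [0.0, 0.3, 0.0],
    [0.4, 0.0, 0.0],
]))
keyword_cols = [0, 1]
new_value = 0.9  # hypothetical replacement weight

# True for every stored entry that lies in a keyword column and is > 0,
# which mirrors the condition used in the dense loop.
mask = np.isin(tfidf_matrix.indices, keyword_cols) & (tfidf_matrix.data > 0.0)
tfidf_matrix.data[mask] = new_value  # modifies the sparse matrix in place
print(tfidf_matrix.toarray())
```

Only stored entries are touched, so zero cells stay zero, just as the `> 0.0` check in the loop guarantees.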

Hence, tfidf_matrix is modified for the list of keywords.
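If the goal is "more weight" rather than a fixed value, a different technique is to multiply the keyword columns by a factor. The sketch below is an alternative, not the method above; `boost` and the matrix are hypothetical. Right-multiplying by a sparse diagonal matrix scales each column and keeps the result sparse.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

tfidf_matrix = sparse.csr_matrix(np.array([
    [0.5, 0.0, 0.7],
    [0.0, 0.3, 0.0],
    [0.4, 0.0, 0.0],
]))
keyword_cols = [0, 1]
boost = 2.0  # hypothetical multiplier for keyword columns

# One weight per column: `boost` for keywords, 1.0 for every other term.
weights = np.ones(tfidf_matrix.shape[1])
weights[keyword_cols] = boost

# Column j of the product equals column j of tfidf_matrix times weights[j].
boosted = tfidf_matrix @ sparse.diags(weights)

# cosine_similarity L2-normalizes each row itself, so the boosted rows
# do not need to be re-normalized first.
similarity = cosine_similarity(boosted)
print(similarity.shape)  # (3, 3)
```

Scaling preserves the relative tf-idf values within each keyword column, whereas overwriting with `new_value` makes every occurrence of a keyword count the same.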

