TF-IDF

Yash Jain
Feb 18, 2021

TF : Term Frequency

IDF : Inverse Document Frequency (idf) : The formula is log(N / number of documents in which the word appears)

Here N is the total number of documents, and log is the base-10 logarithm.

The intuition behind IDF is this: suppose we have 100 documents. The word “insurance” occurs 100 times in the corpus, and the word “try” also occurs 100 times. The difference is that “try” appears once in every document, whereas “insurance” appears in only some of the documents (50 of them in this example), where it occurs multiple times.

IDF for each term:

insurance = log(100/50) = log(2) = 0.30

try = log(100/100) = log(1) = 0

Thus, although both words have the same frequency in the corpus, “insurance” carries more weight than “try”.
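The IDF calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function name idf and the toy corpus are assumptions made up for this example, and the corpus is built so that “insurance” appears twice in 50 documents while “try” appears once in all 100.

```python
import math

def idf(term, documents):
    """Inverse document frequency using a base-10 logarithm.

    `documents` is a list of token lists. The term must occur in at
    least one document, otherwise the division below fails.
    """
    n_docs = len(documents)
    # Count the documents that contain the term at least once.
    docs_with_term = sum(1 for doc in documents if term in doc)
    return math.log10(n_docs / docs_with_term)

# Hypothetical corpus of 100 tokenized documents:
# 50 contain "try" once and "insurance" twice, 50 contain only "try".
corpus = [["try", "insurance", "insurance"] for _ in range(50)]
corpus += [["try"] for _ in range(50)]

print(round(idf("insurance", corpus), 2))  # 0.3
print(round(idf("try", corpus), 2))        # 0.0
```

Both words occur 100 times in this corpus, yet only “insurance” gets a non-zero IDF.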

TF: Term Frequency : The term frequency is the weight of a term within a single document. It is calculated as follows:

number of times the word appears in the document / total number of words in the document

Consider the document: “How are you today, yes today” (6 words)

term frequency of “today” = 2/6 = 0.33

term frequency of “yes” = 1/6 = 0.17

Hence the weight of the term “today” is higher than the weight of the term “yes”, because it occurs more often in the document.
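The same term-frequency arithmetic can be reproduced with a small Python sketch. The function name term_frequency and the simple regex tokenizer are assumptions for illustration; real pipelines usually tokenize more carefully.

```python
import re

def term_frequency(term, document):
    """Occurrences of `term` divided by the total number of words."""
    # Lowercase the text and keep only runs of word characters,
    # so punctuation such as "," is not counted as a word.
    tokens = re.findall(r"\w+", document.lower())
    return tokens.count(term.lower()) / len(tokens)

doc = "How are you today , yes today"

print(round(term_frequency("today", doc), 2))  # 0.33
print(round(term_frequency("yes", doc), 2))    # 0.17
```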

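TF-IDF is the product of the two: tf-idf = tf * idf. A term scores high when it occurs often in a document but appears in few documents of the corpus.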
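Putting the two parts together, here is a minimal sketch of the full score. The helper names (tokenize, tf, idf, tf_idf) and the three-document corpus are made up for this example. It uses the plain formulas from this post; library implementations often apply smoothing or normalization, so their numbers may differ.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def tf(term, tokens):
    """Term frequency: count of the term / total words in the document."""
    return tokens.count(term) / len(tokens)

def idf(term, corpus):
    """Inverse document frequency with a base-10 logarithm.

    Assumes the term occurs in at least one document of the corpus.
    """
    docs_with_term = sum(1 for tokens in corpus if term in tokens)
    return math.log10(len(corpus) / docs_with_term)

def tf_idf(term, tokens, corpus):
    """TF-IDF is simply tf * idf."""
    return tf(term, tokens) * idf(term, corpus)

# Hypothetical three-document corpus.
corpus = [tokenize(text) for text in [
    "How are you today, yes today",
    "Are you free tomorrow",
    "Yes, you are right",
]]
first = corpus[0]

print(round(tf_idf("today", first, corpus), 3))  # 0.159
print(round(tf_idf("you", first, corpus), 3))    # 0.0
```

“today” occurs twice in the first document and nowhere else, so it gets a positive score, while “you” appears in every document and is weighted down to zero.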
