IDF becomes negative when implementing the TF-IDF algorithm

I'm working on some simple code for information retrieval.

I'm using TF-IDF for it, and a question came up while I was computing the IDF.

I read in an article that, when computing the IDF, it is common to add 1 to the denominator, because the denominator would otherwise be 0 when a word does not appear in any document:

idf(t) = log(total number of documents / (1 + number of documents containing word t))

I wrote my code according to the formula above.

What I'm wondering about is the case where word t appears in every document.

In that case the denominator is larger than the numerator, so the ratio is below 1 and its log, the IDF, is negative. Is there any problem with the algorithm if the IDF is negative?
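To make the case concrete, here is a minimal sketch of the formula above. The function name `idf` and the toy corpus are hypothetical, chosen only for illustration, and the natural log is assumed since the formula does not name a base.

```python
import math

def idf(term, documents):
    """IDF with 1 added to the denominator, as in the formula above."""
    n_docs = len(documents)
    # Number of documents that contain the term at least once.
    doc_freq = sum(1 for doc in documents if term in doc)
    return math.log(n_docs / (1 + doc_freq))

# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"the", "cat", "sat"},
    {"the", "dog", "ran"},
    {"the", "bird", "flew"},
    {"the", "fish", "swam"},
]

print(idf("the", docs))  # in all 4 documents: log(4 / 5), negative
print(idf("cat", docs))  # in 1 document: log(4 / 2), positive
print(idf("cow", docs))  # in no document: log(4 / 1), no division by zero
```

With this formula the IDF is negative only when the word appears in every document. A word that appears in all documents but one gets exactly 0.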

First of all, the results come out the way I want them to, but... I can't find the contents of this part even if I search for tf-idf related articles, so I'm posting a question. Please reply. Thank you.

algorithm

2022-09-21 21:14

1 Answer

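If the IDF comes out negative, you can simply clamp it to zero. A word that appears in every document does not help distinguish one document from another, so a weight of zero is reasonable for it.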
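A minimal sketch of the clamping, assuming you already have the document count and the document frequency of the word. The function names are hypothetical. The second function is not part of the suggestion above; it is an alternative, the smoothed form that scikit-learn documents for its TF-IDF transformer, which adds 1 to both numerator and denominator and then adds 1 to the log, so it never drops below 1.

```python
import math

def idf_clamped(n_docs, doc_freq):
    """IDF from the question's formula, with negative values replaced by 0."""
    raw = math.log(n_docs / (1 + doc_freq))
    return max(0.0, raw)

def idf_smoothed(n_docs, doc_freq):
    """Alternative: ln((1 + N) / (1 + df)) + 1, which is always >= 1."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0

# Word present in all 4 of 4 documents.
print(idf_clamped(4, 4))   # raw value log(4 / 5) is negative, so clamped to 0.0
print(idf_smoothed(4, 4))  # log(5 / 5) + 1

# Word present in 1 of 4 documents.
print(idf_clamped(4, 1))   # log(4 / 2), unchanged by the clamp
```

Clamping keeps the rest of your code unchanged. The smoothed form changes every weight slightly, so it only makes sense if you are free to change the formula.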

2022-09-21 21:14
