Tweeted By @deliprao
Why? The word “the”, for example, might appear in every document. Similarly “a”, “an”, and so on. As a consequence, the posting lists for these words cover nearly the whole collection and the inverted index blows up in size. And not just the construction cost, but also the retrieval cost goes up. Simple solution from the 70s: just drop the high-frequency words.
— Delip Rao (@deliprao) November 30, 2018
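
To make the size argument concrete, here is a minimal sketch of an inverted index with a document-frequency cutoff. It is an illustration only, not the method from the tweet: the names `build_inverted_index`, `index_size`, and `max_df_ratio`, the whitespace tokenizer, and the toy documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs, max_df_ratio=1.0):
    """Map each term to the sorted list of ids of documents containing it.

    Terms that occur in more than `max_df_ratio` of the documents are
    dropped (hypothetical parameter; 1.0 keeps every term).
    """
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)

    limit = max_df_ratio * len(docs)
    return {
        term: sorted(ids)
        for term, ids in postings.items()
        if len(ids) <= limit
    }


def index_size(index):
    """Total number of (term, document) postings stored in the index."""
    return sum(len(ids) for ids in index.values())


# Toy collection, made up for this example.
docs = [
    "the cat sat on the mat",
    "the dog chased a cat",
    "a bird sang in the tree",
    "the tree fell on a car",
]

full = build_inverted_index(docs)                      # keep everything
pruned = build_inverted_index(docs, max_df_ratio=0.5)  # drop frequent terms

print(index_size(full), index_size(pruned))
```

In this toy collection only “the” and “a” exceed the cutoff, yet they account for 7 of the 22 postings. On a real corpus the threshold (or a fixed stop-word list) would need to be chosen with care, since dropping frequent words also makes queries that depend on them harder to answer.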