Usage

The best model for topic classification is the quadsemble model, which combines the TF-IDF, TF-IDF Bigrams, distilBERT, and Doc2Vec embeddings.

To use it to classify a piece of text, do the following:

>>> from mitnewsclassify import quadsemble

>>> quadsemble.gettags("Republicans proceeded with the third night of their national convention, but many Americans — particularly those in the path of Hurricane Laura — were focused on more immediate concerns.")
['elections', 'presidents and presidency (us)', 'presidential elections (us)', 'hurricanes and tropical storms', 'presidential election of 2004']
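To tag several texts in one pass, the same call can be wrapped in a small helper. The following is a minimal sketch, assuming gettags takes a single string and returns a list of tag strings, as in the example above. The name tag_many is hypothetical and not part of mitnewsclassify; the tagger is passed in as an argument so the helper works with any of the models.

```python
from typing import Callable, Dict, Iterable, List


def tag_many(
    texts: Iterable[str],
    gettags: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Apply a single-text tagger to each text and map each text to its tags.

    tag_many is an illustrative helper, not a mitnewsclassify function.
    """
    return {text: list(gettags(text)) for text in texts}


# Intended use with the package (needs mitnewsclassify and its model files):
#     from mitnewsclassify import quadsemble
#     results = tag_many(["first article ...", "second article ..."],
#                        quadsemble.gettags)
```

Duplicate texts collapse into a single dictionary entry; if the input order or duplicates matter, returning a list of (text, tags) pairs would be a better fit.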

If you want to use these models for your own fine-tuning or modeling, you can access each model's features individually through getfeatures on the TF-IDF, TF-IDF Bigrams, Doc2Vec, GPT-2, and distilBERT models.
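One way to gather features from several models for the same text is sketched below. It rests on assumptions that should be checked against the installed package: that each model is exposed as a submodule (the names tfidf and doc2vec in the comment are guesses based on the model names), and that getfeatures takes a single string. The return type of getfeatures is not stated here, so the helper passes it through unchanged. collect_features is a hypothetical name.

```python
from typing import Any, Callable, Dict


def collect_features(
    text: str,
    extractors: Dict[str, Callable[[str], Any]],
) -> Dict[str, Any]:
    """Run each feature extractor on the same text, keyed by model name.

    collect_features is an illustrative helper, not a mitnewsclassify function.
    The extracted values are returned as-is, whatever type each model produces.
    """
    return {name: extract(text) for name, extract in extractors.items()}


# Intended use (submodule names are assumptions; verify them before relying on this):
#     from mitnewsclassify import tfidf, doc2vec
#     features = collect_features(
#         "Some article text ...",
#         {"tfidf": tfidf.getfeatures, "doc2vec": doc2vec.getfeatures},
#     )
#     for name, value in features.items():
#         print(name, type(value))
```

Keeping the features in a dictionary rather than concatenating them avoids assuming that the models return vectors of compatible shape or type.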