NEWS CATEGORIZATION USING MACHINE LEARNING
DOI:
https://doi.org/10.24867/16BE22Raseta
Keywords:
text classification, logistic regression, naïve Bayes, Support Vector Machine, neural network
Abstract
In this paper, several models are used to classify news articles by category (sports, politics, entertainment…) from their short descriptions, which mostly consist of one or two sentences. Each short description is first transformed into a vector representation and then passed to a model. The models used are logistic regression, naïve Bayes, Support Vector Machine, an artificial neural network, a convolutional neural network, and a recurrent neural network. Text is represented in vector form using tf-idf, Word2Vec, and GloVe. The models were trained on a dataset of Huffington Post articles from 2012 to 2018 and evaluated on articles from that dataset as well as on articles obtained by scraping the Huffington Post website. The models' accuracies and F-measures are reported.
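To illustrate the vectorization step, the following is a minimal sketch of turning short descriptions into tf-idf vectors. It is not the paper's implementation: the tokenizer, the weighting variant (term frequency normalized by description length, idf = log(N / df)), the function names `tokenize` and `tfidf_vectors`, and the three toy descriptions are all assumptions made for this example. The paper's actual preprocessing and tf-idf settings are not stated in the abstract.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    # Naive tokenizer (assumption): lowercase, split on whitespace,
    # strip surrounding punctuation, drop empty tokens.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def tfidf_vectors(docs):
    # Returns (vocab, vectors); vectors[i][j] is the tf-idf weight of
    # vocab[j] in docs[i].
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(docs)
    # Document frequency: number of descriptions containing each word.
    df = Counter(w for doc in tokenized for w in set(doc))
    # One common idf variant (assumption): log(N / df), no smoothing.
    idf = {w: math.log(n_docs / df[w]) for w in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            [(counts[w] / total) * idf[w] if total else 0.0 for w in vocab]
        )
    return vocab, vectors

# Toy short descriptions (made up for illustration, not from the dataset).
descriptions = [
    "The team won the championship game last night.",
    "The senate passed the budget bill after a long debate.",
    "The actor won an award for the new film.",
]
vocab, vectors = tfidf_vectors(descriptions)
```

With this weighting, a word that occurs in every description (here "the") gets weight zero, while a word unique to one description (here "championship") gets the largest idf. Vectors like these could then be passed to any of the classifiers listed above. Word2Vec and GloVe would replace this step with learned dense word embeddings.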