NEWS CATEGORIZATION USING MACHINE LEARNING

Authors

  • Marko Rašeta, Author

DOI:

https://doi.org/10.24867/16BE22Raseta

Keywords:

text classification, logistic regression, naïve Bayes, Support Vector Machine, neural network

Abstract

This paper uses several models to classify news articles by category (sports, politics, entertainment, and so on) from their short descriptions, which mostly consist of one or two sentences. Each short description is first transformed into a vector representation and then passed to a model. The models used are logistic regression, naïve Bayes, Support Vector Machine, artificial neural network, convolutional neural network, and recurrent neural network. Text is represented in vector form using tf-idf, Word2Vec, and GloVe. The models were trained on a dataset of Huffington Post articles from 2012 to 2018, and were evaluated on articles from that dataset as well as on articles obtained by scraping the Huffington Post website. Accuracy and F-measure are reported for each model.
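
The pipeline described in the abstract (vectorize a short description, classify it, report accuracy and F-measure) can be illustrated with a minimal sketch. This is not the author's implementation: it uses only the Python standard library, pairs tf-idf with a multinomial naïve Bayes classifier as one of the combinations named above, and runs on a handful of invented example sentences rather than the Huffington Post data. The smoothed idf formula, the Laplace smoothing constant alpha, the tokenizer, and all function and variable names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately simple assumption).
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    # Term count times idf; terms unseen while fitting are dropped.
    return {t: c * idf[t] for t, c in Counter(tokenize(doc)).items() if t in idf}

class NaiveBayes:
    """Multinomial naive Bayes over tf-idf weights with Laplace smoothing."""

    def fit(self, vectors, labels, vocab_size, alpha=1.0):
        self.alpha, self.vocab_size, self.n = alpha, vocab_size, len(labels)
        self.class_counts = Counter(labels)
        self.weights = defaultdict(Counter)  # summed tf-idf weight per class and term
        for vec, label in zip(vectors, labels):
            self.weights[label].update(vec)
        self.totals = {y: sum(w.values()) for y, w in self.weights.items()}
        return self

    def predict(self, vec):
        def score(label):
            denom = self.totals[label] + self.alpha * self.vocab_size
            log_prior = math.log(self.class_counts[label] / self.n)
            return log_prior + sum(
                w * math.log((self.weights[label][t] + self.alpha) / denom)
                for t, w in vec.items()
            )
        return max(self.class_counts, key=score)

def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    # Unweighted mean of the per-class F-measure 2TP / (2TP + FP + FN).
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(a == c and b == c for a, b in zip(y_true, y_pred))
        fp = sum(a != c and b == c for a, b in zip(y_true, y_pred))
        fn = sum(a == c and b != c for a, b in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

# Invented toy data standing in for the short descriptions (not from the paper).
train_texts = [
    "Team wins the championship game after overtime goal",
    "Star striker scores twice in season opener",
    "Senate passes budget bill after long debate",
    "President signs new election law",
    "New movie breaks box office records on opening weekend",
    "Singer announces world tour and new album",
]
train_labels = ["SPORTS", "SPORTS", "POLITICS", "POLITICS",
                "ENTERTAINMENT", "ENTERTAINMENT"]
test_texts = [
    "Striker scores late goal to win the game",
    "Senate debate on election bill continues",
    "New album tops the charts as movie opens",
]
test_labels = ["SPORTS", "POLITICS", "ENTERTAINMENT"]

idf = fit_idf(train_texts)
model = NaiveBayes().fit([tfidf(d, idf) for d in train_texts],
                         train_labels, vocab_size=len(idf))
predictions = [model.predict(tfidf(d, idf)) for d in test_texts]

print("predictions:", predictions)
print("accuracy:", accuracy(test_labels, predictions))
print("macro F-measure:", macro_f1(test_labels, predictions))
```

On real data, the same three stages would more likely be built from an established library, and the tf-idf step could be swapped for averaged Word2Vec or GloVe word vectors without changing the evaluation code.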

Published

2022-02-03

Issue

Section

Electrotechnical and Computer Engineering