
Electrotechnical and Computer Engineering

Vol. 37 No. 02 (2022): Proceedings of the Faculty of Technical Sciences

NEWS CATEGORIZATION USING MACHINE LEARNING

  • Marko Rašeta
DOI: https://doi.org/10.24867/16BE22Raseta
Submitted: September 30, 2021
Published: February 3, 2022

Abstract

This paper compares several models for classifying news articles into categories (sports, politics, entertainment, and so on) from their short descriptions, which mostly consist of one or two sentences. Each description is first transformed into a vector representation and then passed to a model. The models used are logistic regression, naïve Bayes, a Support Vector Machine, an artificial neural network, a convolutional neural network, and a recurrent neural network. Text is represented in vector form using tf-idf, Word2Vec, and GloVe. The models were trained on a dataset of Huffington Post articles from 2012 to 2018 and evaluated on articles from that dataset as well as on articles obtained by scraping the Huffington Post website. The accuracy and F-measure of each model are reported.
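
As an illustration of the vectorization step, the following minimal sketch turns a few short descriptions into tf-idf vectors. It is not the paper's implementation: the weighting (raw term count multiplied by log(N / document frequency)), the whitespace tokenizer, the function names, and the three sample descriptions are all assumptions made for this example.

```python
import math
from collections import Counter

PUNCT = ".,!?;:'\"()"

def tokenize(text):
    # Hypothetical tokenizer: split on whitespace, strip punctuation, lowercase.
    tokens = (word.strip(PUNCT).lower() for word in text.split())
    return [t for t in tokens if t]

def tfidf_vectors(descriptions):
    """Return (vocabulary, vectors) using tf * log(N / df) weighting (assumed variant)."""
    docs = [tokenize(d) for d in descriptions]
    n_docs = len(docs)

    # Document frequency: number of descriptions that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term counts within one description
        vectors.append([tf[term] * idf[term] for term in vocab])
    return vocab, vectors

# Made-up descriptions, not taken from the dataset used in the paper.
descriptions = [
    "The team won the final match",
    "The senate passed the budget bill",
    "The film won the award",
]
vocab, vectors = tfidf_vectors(descriptions)
```

Under this weighting, a word that occurs in every description (here "the") receives weight zero, while a word unique to one description (such as "senate") receives the largest weight. The resulting vectors could then be fed to any of the classifiers listed in the abstract.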

References

[1] Kavi Narayana Murthy (2003), “Automatic Categorization of Telugu News Articles”
[2] Burak Kerim Akkus, Ruket Cakici (2013), “Categorization of Turkish News Documents with Morphological Analysis”
[3] Adhy Rizaldy, Heru Agus Santoso (2017), “Performance improvement of Support Vector Machine (SVM) With information gain on categorization of Indonesian news documents”
[4] Juan Ramos (2003), “Using TF-IDF to Determine Word Relevance in Document Queries”
[5] Kenneth Ward Church (2017), “Word2Vec”
[6] Jeffrey Pennington, Richard Socher, Christopher D. Manning (2014), “GloVe: Global Vectors for Word Representation”