Electrotechnical and Computer Engineering

Vol. 35 No. 01 (2020): Proceedings of the Faculty of Technical Sciences

AUTOMATIC BOOK TOPIC DETECTION USING NATURAL LANGUAGE PROCESSING

  • Vlada Đurđević
DOI:
https://doi.org/10.24867/06BE45Djurdjevic
Submitted:
December 30, 2019
Published:
December 30, 2019

Abstract

This paper presents a performance analysis of a Latent Dirichlet Allocation (LDA) model built to determine the topics of books in a corpus. It gives a detailed analysis of four crucial steps in implementing the model: data preprocessing, the named-entity recognition (NER) method, determining the optimal number of topics, and choosing the best implementation algorithm. For each step, several methods for overcoming the problems that arise are demonstrated, and the results obtained with each method are presented and discussed in detail. Finally, the optimal method is selected for inclusion in the resulting model.
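The abstract names the pipeline steps but not the tools behind them. As a rough illustration only, the sketch below assumes a plain-Python setup. `preprocess`, with a hypothetical stopword list, stands in for data preprocessing. A collapsed Gibbs sampler stands in for whichever LDA implementation the paper evaluated. The UMass coherence measure is used to compare candidate numbers of topics. All function names, hyperparameters (`alpha`, `beta`, `iters`) and the toy corpus are assumptions, and the NER step is not modelled.

```python
import math
import random
import re

# Hypothetical stopword list; a real pipeline would use a much fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "with", "his", "her"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and words of 2 or fewer letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the vocabulary and topic-word counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    ndk = [[0] * k for _ in docs]       # document-topic counts
    nkw = [[0] * v for _ in range(k)]   # topic-word counts
    nk = [0] * k                        # total tokens assigned to each topic
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial assignment
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][wid[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], wid[w]
                ndk[d][t] -= 1          # take the token out of the counts
                nkw[t][wi] -= 1
                nk[t] -= 1
                # p(z = j | all other assignments), up to a normalising constant
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t             # put the token back under the sampled topic
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    return vocab, nkw


def top_words(vocab, nkw, n=3):
    """The n most frequent words of each topic, in decreasing order of count."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in nkw]


def umass_coherence(topics, docs):
    """Mean UMass coherence over topics; a higher value is usually read as more coherent."""
    doc_sets = [set(doc) for doc in docs]

    def df(*words):  # number of documents containing all the given words
        return sum(all(w in s for w in words) for s in doc_sets)

    total = 0.0
    for words in topics:
        for i in range(1, len(words)):
            for j in range(i):
                total += math.log((df(words[i], words[j]) + 1) / df(words[j]))
    return total / len(topics)


if __name__ == "__main__":
    # Hypothetical toy "book descriptions"; real input would be the book corpus.
    corpus = [
        "The dragon guards the castle and the knight draws his sword",
        "A wizard casts a spell on the dragon near the castle",
        "The detective examines the murder weapon and questions the suspect",
        "A murder in the manor leaves the detective with one suspect",
        "The spaceship reaches orbit and the crew repairs the engine",
        "The crew of the spaceship explores a distant planet",
    ]
    docs = [preprocess(text) for text in corpus]
    for k in (2, 3, 4):
        vocab, nkw = lda_gibbs(docs, k)
        print(k, round(umass_coherence(top_words(vocab, nkw), docs), 3))
```

Running the script prints one coherence score per candidate `k`; under this assumed setup, the `k` with the highest score would be the one kept. A different coherence measure or a library implementation of LDA could be swapped in without changing the overall selection loop.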
