AUTOMATIC RECONSTRUCTION OF DIACRITICAL MARKS IN TEXTS IN THE SERBIAN LANGUAGE USING MACHINE LEARNING

Authors

  • Vuk Stanojev (Author)

DOI:

https://doi.org/10.24867/21BE20Stanojev

Keywords:

Reconstruction of diacritical marks, Machine learning, Classification

Abstract

This paper gives an overview of the problems that arise when reconstructing diacritical marks in Serbian-language text. The task is framed as a classification problem, and three machine learning methods are proposed and evaluated: Feed-Forward Neural Networks, Long Short-Term Memory Neural Networks, and Convolutional Neural Networks. The methods are compared on classification metrics, and diacritic restoration results are shown for the best-performing method.
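
To illustrate what framing diacritic restoration as a classification problem can look like, the following is a minimal sketch and not the paper's implementation. It assumes a one-to-one stripping of Serbian Latin diacritics (so đ becomes d rather than "dj") and a hypothetical three-class label per character: 0 keeps the letter, 1 and 2 select a diacritic variant. The names STRIP, LABEL, RESTORE, make_example and restore are invented for this example; a classifier such as the networks named in the abstract would be trained to predict the labels from the stripped text.

```python
# Hypothetical mappings for illustration only; assumes one-to-one
# stripping (đ -> d), which is a simplification of real-world text.
STRIP = {"č": "c", "ć": "c", "š": "s", "ž": "z", "đ": "d"}
# Class per character: 0 = keep as is, 1 or 2 = diacritic variant.
LABEL = {"č": 1, "ć": 2, "š": 1, "ž": 1, "đ": 1}
RESTORE = {
    ("c", 1): "č", ("c", 2): "ć",
    ("s", 1): "š", ("z", 1): "ž", ("d", 1): "đ",
}


def make_example(text):
    """Turn correctly accented text into (stripped text, labels)."""
    stripped, labels = [], []
    for ch in text:
        low = ch.lower()
        base = STRIP.get(low, low)
        # Preserve the original letter case in the stripped text.
        stripped.append(base.upper() if ch.isupper() else base)
        labels.append(LABEL.get(low, 0))
    return "".join(stripped), labels


def restore(stripped, labels):
    """Apply per-character class labels to stripped text."""
    out = []
    for ch, lab in zip(stripped, labels):
        low = ch.lower()
        new = RESTORE.get((low, lab), low)
        out.append(new.upper() if ch.isupper() else new)
    return "".join(out)


if __name__ == "__main__":
    x, y = make_example("Čaša")
    print(x, y)           # Casa [1, 0, 1, 0]
    print(restore(x, y))  # Čaša
```

In this sketch the pairs produced by make_example would serve as training data, and restore shows how predicted labels map back to accented text. Under these assumptions, a label of 0 for every character leaves the input unchanged.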

References

[1] Nikola Ljubešić, Tomaž Erjavec, Darja Fišer, “Corpus-Based Diacritic Restoration for South Slavic Languages”
[2] Jakub Náplava, Milan Straka, Pavel Straňák, Jan Hajič, “Diacritics Restoration Using Neural Networks”
[3] Ali Fadel, Ibraheem Tuffaha, Bara’ Al-Jawarneh, and Mahmoud Al-Ayyoub, “Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation”, Proceedings of the 6th Workshop on Asian Translation, pages 215–225, Hong Kong, China, November 4, 2019. Association for Computational Linguistics
[4] Stefan Ruseti, Teodor-Mihai Cotet, Mihai Dascalu: Romanian Diacritics Restoration Using Recurrent Neural Networks. September 2020.
[5] https://machinelearningmastery.com/how-to-one-hot-encode-sequence-data-in-python/, 17.08.2022.
[6] Tijana Nosek, Branko Brkljač, Danica Despotović, Milan Sečujski, Tatjana Lončar-Turukalo: Praktikum iz mašinskog učenja, Fakultet Tehničkih Nauka, Univerzitet u Novom Sadu
[7] https://en.wikipedia.org/wiki/Feedforward_neural_network, 17.10.2022.
[8] https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9, 17.10.2022
[9] https://colah.github.io/posts/2015-08-Understanding-LSTMs/, 17.10.2022.
[10] https://colah.github.io/posts/2014-07-Conv-Nets-Modular/, 18.10.2022.
[11] Yann LeCun, Leon Bottou, Yoshua Bengio, Patrick Haffner, “Gradient-Based Learning Applied to Document Recognition”, Proc. of the IEEE, November 1998
[12] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra, Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization, International Journal of Computer Vision, 2019.

Published

2023-01-08

Issue

Section

Electrotechnical and Computer Engineering