AUTOMATSKA REKONSTRUKCIJA DIJAKRITIČKIH ZNAKOVA U TEKSTOVIMA NA SRPSKOM JEZIKU PRIMENOM MAŠINSKOG UČENJA

Vuk Stanojev

doi:10.24867/21BE20Stanojev

Keywords:

Reconstruction of diacritical marks, Machine learning, Classification

Abstract

This paper gives an overview of the problems that arise when reconstructing diacritical marks in Serbian-language text. The task is formulated as a classification problem, and three machine learning methods are applied to it: feedforward neural networks, Long Short-Term Memory (LSTM) neural networks, and convolutional neural networks. The methods are compared using classification metrics, and diacritic restoration results are shown for the best-performing method.
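One possible way to cast diacritic restoration as classification is sketched below. This is an illustration only and not necessarily the formulation used in the paper: it assumes a character-level setup in which each ambiguous Latin letter (c, s, z, d) receives a class label selecting one of its Serbian forms, and it simplifies by treating "đ" as stripping to "d" (the "dj" spelling is ignored). The names `CANDIDATES`, `strip_diacritics`, `make_examples`, `restore`, and the `window` parameter are hypothetical.

```python
# Assumed candidate forms per base letter; index 0 is the unmarked letter.
CANDIDATES = {"c": "cčć", "s": "sš", "z": "zž", "d": "dđ"}
# Reverse map: marked letter -> base letter (simplification: đ -> d).
STRIP = {m: base for base, forms in CANDIDATES.items() for m in forms[1:]}


def strip_diacritics(text):
    """Replace each marked letter with its base letter, preserving case."""
    out = []
    for ch in text:
        base = STRIP.get(ch.lower())
        if base is None:
            out.append(ch)
        else:
            out.append(base.upper() if ch.isupper() else base)
    return "".join(out)


def make_examples(text, window=3):
    """Build (left context, letter, right context, label) tuples.

    Only ambiguous letters produce an example. The label is the index of
    the original letter among the candidate forms of its base letter.
    """
    stripped = strip_diacritics(text)
    examples = []
    for i, (orig, plain) in enumerate(zip(text, stripped)):
        forms = CANDIDATES.get(plain.lower())
        if forms is None:
            continue
        left = stripped[max(0, i - window):i]
        right = stripped[i + 1:i + 1 + window]
        examples.append((left, plain, right, forms.index(orig.lower())))
    return examples


def restore(stripped, labels):
    """Apply one predicted label per ambiguous letter, in reading order."""
    labels = iter(labels)
    out = []
    for ch in stripped:
        forms = CANDIDATES.get(ch.lower())
        if forms is None:
            out.append(ch)
        else:
            form = forms[next(labels)]
            out.append(form.upper() if ch.isupper() else form)
    return "".join(out)


if __name__ == "__main__":
    text = "Čaša šećera"
    labels = [ex[3] for ex in make_examples(text)]
    print(strip_diacritics(text), labels, restore(strip_diacritics(text), labels))
```

In such a setup, a classifier (feedforward, LSTM, or convolutional) would be trained to predict the label from an encoding of the surrounding context, and `restore` would turn its predictions back into text.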

AUTOMATIC RECONSTRUCTION OF DIACRITICAL MARKS IN TEXTS IN THE SERBIAN LANGUAGE USING MACHINE LEARNING
