THE USAGE OF ARTIFICIAL INTELLIGENCE FOR “PHISHING” E-MAIL DETECTION: AN APPROACH BASED ON NATURAL LANGUAGE PROCESSING

Nemanja Šepa

doi:10.24867/32OI04Sepa

Information Systems Engineering

Vol. 40 No. 11 (2025): Proceedings of the Faculty of Technical Sciences

Full text (in Serbian): 32OI04Sepa.pdf

Submitted: 2025-11-12
Published: 2026-03-09

Abstract

This paper explores the application of advanced artificial intelligence and machine learning algorithms to detecting phishing attacks by analyzing email content. It compares the Naive Bayes, XGBoost, recurrent neural network (RNN), and gated recurrent unit (GRU) models in terms of accuracy and efficiency, and considers the key advantages and limitations of each model in modern cybersecurity systems.
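Of the four models compared, Naive Bayes is the simplest to illustrate. The following is a minimal sketch, not the author's implementation, of a multinomial Naive Bayes classifier that scores an email by its word content. It assumes a bag-of-words representation, lowercase alphabetic tokenization, and Laplace smoothing; the names `tokenize` and `NaiveBayesPhishingClassifier` and the four toy training emails are hypothetical and are not taken from the paper or its dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesPhishingClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # additive (Laplace) smoothing constant
        self.class_doc_counts = Counter()         # label -> number of training emails
        self.word_counts = defaultdict(Counter)   # label -> word -> occurrence count
        self.total_words = Counter()              # label -> total number of tokens
        self.vocab = set()                        # all words seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.total_words[label] += len(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for every label."""
        n_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / n_docs)  # log prior
            denom = self.total_words[label] + self.alpha * vocab_size
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # words never seen in training carry no evidence here
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, hypothetical training data (not from the paper's dataset).
train_texts = [
    "verify your account password now or it will be suspended",
    "urgent click this link to confirm your bank login",
    "meeting notes from yesterday attached for review",
    "lunch on friday with the project team",
]
train_labels = ["phishing", "phishing", "safe", "safe"]

clf = NaiveBayesPhishingClassifier().fit(train_texts, train_labels)
print(clf.predict("please confirm your password via this link"))  # → phishing
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long emails. The XGBoost, RNN, and GRU models compared in the paper would replace this scoring step with a learned model over the same kind of text features or token sequences.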

References

[1] Jansson, K., & von Solms, R. (2011). Phishing for phishing awareness. Behaviour & Information Technology, 32(6), 584–593. doi:10.1080/0144929X.2011.632650
[2] Toolan, F., & Carthy, J. (2009). Phishing detection using classifier ensembles. In eCrime Researchers Summit (pp. 1–9). IEEE, Tacoma, WA, USA.
[3] Toolan, F., & Carthy, J. (2010). Feature selection for spam and phishing detection. In eCrime Researchers Summit (pp. 1–12). Dallas.
[4] Larsson, S., & Heintz, F. (2020). Transparency in artificial intelligence. Department of Technology and Society, Lund University.
[5] Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115.
[6] Kaggle dataset. https://www.kaggle.com/datasets/subhajournal/phishingemails
[7] Kadhim, A. (2018). An Evaluation of Preprocessing Techniques for Text Classification. International Journal of Computer Science and Information Security, 16, 22–32.
[8] Saif, H., Fernandez, M., He, Y., & Alani, H. (2014). On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) (pp. 810–817). Reykjavik, Iceland: European Language Resources Association (ELRA).
[9] Khyani, D., & Siddhartha, B. S. (2021). An Interpretation of Lemmatization and Stemming in Natural Language Processing. Journal of University of Shanghai for Science and Technology (Shanghai Ligong Daxue Xuebao), 22, 350–357.
[10] Cavnar, W., & Trenkle, J. (1994). N-Gram-Based Text Categorization. In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval.
[11] Kumar, K. (n.d.). NLP: Bag of words and TF-IDF explained! Medium. https://tita.lecturer.pens.ac.id/TextMining_SDT/03.%20Structured%20Data/NLP_%20Bag%20of%20words%20and%20TF-IDF%20explained!%20_%20by%20Koushik%20kumar%20_%20Medium.pdf
