EXTRACTIVE TEXT SUMMARIZATION OF HOTEL REVIEWS USING WORD2VEC, DOC2VEC AND GPT-3.5 MODELS
DOI:
https://doi.org/10.24867/28BE33Tomcic
Keywords:
Extractive text summarization, NLP, Word2Vec, Doc2Vec, GPT-3.5, TextRank
Abstract
This paper presents extractive text summarization of hotel reviews using the Word2Vec, Doc2Vec and GPT-3.5 models. The theoretical foundations of NLP, of these models and of the TextRank algorithm are explained, and the summarization process used in the implementation is presented. The dataset used for model training and evaluation was downloaded from booking.com; the data is publicly available and is owned by the website. Experiments were performed with different variations of the models and with different datasets. The results are compared with each other and with the results of one related work.
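As an illustration of the general technique named in the abstract, the following is a minimal sketch of TextRank-style extractive summarization, not the authors' implementation. It assumes a naive regex sentence splitter and, to stay dependency-free, uses bag-of-words count vectors as a stand-in for Word2Vec or Doc2Vec sentence embeddings. All function names (`split_sentences`, `sentence_vector`, `textrank_scores`, `summarize`) and the sample review are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption, not the paper's tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_vector(sentence):
    """Stand-in for an embedding: a bag-of-words count vector.
    With Word2Vec one would typically average the word vectors instead."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def textrank_scores(sim, damping=0.85, iterations=50):
    """PageRank-style power iteration over a weighted sentence-similarity graph."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, num_sentences=2):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return sentences
    n = len(sentences)
    vectors = [sentence_vector(s) for s in sentences]
    sim = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = textrank_scores(sim)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]


# Hypothetical review text, invented for this example.
review = (
    "The room was clean and the staff was friendly. "
    "The staff was friendly and helpful. "
    "Breakfast was expensive. "
    "The room was clean and quiet."
)
summary = summarize(review, num_sentences=2)
print(summary)
```

Only `sentence_vector` would need to change to rank sentences by learned embeddings; the graph construction and ranking steps stay the same.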
References
[1] J. Li, G. Huang, C. Fan, Z. Sun and H. Zhu, "Key word extraction for short text via word2vec, doc2vec, and textrank", Turkish Journal of Electrical Engineering and Computer Sciences, vol. 27, no. 3, article 17, 2019.
[2] M. Moradi, M. Dashti and M. Samwald, "Summarization of biomedical articles using domain-specific word embeddings and graph ranking", Journal of Biomedical Informatics, vol. 107, 103452, July 2020.
[3] https://www.geeksforgeeks.org/natural-language-processing-overview/ (accessed January 2024)
[4] https://medium.com/sciforce/towards-automatic-text-summarization-extractive-methods-e8439cd54715 (accessed February 2024)
[5] https://www.geeksforgeeks.org/python-word-embedding-using-word2vec/ (accessed February 2024)
[6] https://wiki.app.uib.no/info216/images/c/c4/IntroToWordEmbeddings.pdf (accessed February 2024)
[7] https://medium.com/@manansuri/a-dummys-guide-to-word2vec-456444f3c673 (accessed February 2024)
[8] https://www.geeksforgeeks.org/doc2vec-in-nlp/ (accessed February 2024)
[9] A. Hernández-Castañeda, R. A. García-Hernández, Y. Ledeneva and C. E. Millán-Hernández, "Extractive automatic text summarization based on lexical-semantic keywords", IEEE Access, vol. 8, pp. 49896-49907, 2020.
[10] R. Mihalcea and P. Tarau, "Textrank: Bringing order into text", Proceedings of the 2004 conference on empirical methods in natural language processing, pp. 404-411, July 2004.
[11] https://www.linkedin.com/pulse/chatgpts-guide-understanding-gpt-35-architecture-heena-koshti#:~:text=The%20GPT%2D3.5%20architecture%20is%20a%20neural%20network%2Dbased%20language,text%20completion%2C%20and%20question%20answering (accessed February 2024)
[12] https://medium.com/nerd-for-tech/gpt3-and-chat-gpt-detailed-architecture-study-deep-nlp-horse-db3af9de8a5d (accessed February 2024)
Published
2024-09-06
Section
Electrotechnical and Computer Engineering