AUTOMATIC DETECTION OF MENU ITEMS IN RESTAURANT REVIEWS

Igor Trpovski

doi:10.24867/06BE08Trpovski

Electrotechnical and Computer Engineering

Vol. 35 No. 01 (2020): Proceedings of the Faculty of Technical Sciences

Full text: 06BE08Trpovski.pdf (in Serbian)

DOI: https://doi.org/10.24867/06BE08Trpovski
Submitted: 2019-12-21
Published: 2019-12-21

Abstract

This paper presents an approach to automatically detecting menu items in restaurant reviews. Several machine learning and deep learning models were trained to detect food mentions. Several string matching algorithms were then applied to match the detected food mentions to the corresponding menu items. The data was collected from the website Donesi.com and manually annotated. All models and algorithms used were evaluated.
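
The matching step described above can be illustrated with a minimal sketch. The abstract does not say which string matching algorithms were used or how they were configured, so everything here is an assumption made for illustration: the choice of optimal string alignment distance (a restricted form of Damerau-Levenshtein distance), the length-normalized similarity, the function names, the 0.7 threshold, and the sample menu.

```python
from typing import List, Optional


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ab" -> "ba".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1], normalized by the longer string (an assumption)."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


def match_mention(mention: str, menu_items: List[str],
                  threshold: float = 0.7) -> Optional[str]:
    """Return the most similar menu item, or None if nothing reaches
    the (hypothetical) similarity threshold."""
    best_item, best_score = None, 0.0
    for item in menu_items:
        score = similarity(mention, item)
        if score > best_score:
            best_item, best_score = item, score
    return best_item if best_score >= threshold else None


if __name__ == "__main__":
    # Hypothetical menu and food mentions, not taken from the paper's data.
    menu = ["Capricciosa", "Margherita", "Cheeseburger"]
    print(match_mention("capriciosa", menu))  # misspelled mention
    print(match_mention("coffee", menu))      # mention with no menu counterpart
```

Normalizing by the longer string keeps scores comparable across menu items of different lengths, and the threshold lets a food mention stay unmatched when the menu has no plausible counterpart. Both choices would need tuning on annotated data.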
