AUTOMATIC DETECTION OF MENU ITEMS IN RESTAURANT REVIEWS
DOI: https://doi.org/10.24867/06BE08Trpovski
Keywords: text mining, natural language processing, named entity recognition
Abstract
This research presents one approach to automatically detecting menu items in restaurant reviews. Several machine learning and deep learning models were trained to detect food mentions in review text. String matching algorithms were then applied to match each detected food mention with the corresponding menu item. The data was acquired from the website Donesi.com and manually annotated. All models and algorithms used were evaluated.
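The matching step described above, pairing a detected food mention with a menu item, can be illustrated with a minimal Python sketch. It assumes a generic similarity ratio from the standard library's `difflib.SequenceMatcher`, which is a stand-in and not necessarily one of the string matching algorithms evaluated in the paper. The function names, the `threshold` value, and the sample menu are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_menu_item(mention, menu, threshold=0.6):
    """Return the menu item most similar to the mention.

    Returns None when the menu is empty or when even the best candidate
    scores below the (assumed) threshold.
    """
    if not menu:
        return None
    best = max(menu, key=lambda item: similarity(mention, item))
    return best if similarity(mention, best) >= threshold else None


# Hypothetical menu and food mention, invented for illustration only.
menu = ["Pizza Capricciosa", "Pizza Margherita", "Cezar salata"]
print(match_menu_item("pizza margerita", menu))  # misspelled mention from a review
```

A threshold keeps unrelated mentions from being forced onto the nearest menu item; in practice its value would have to be tuned on annotated data.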
Published: 2019-12-21
Section: Electrotechnical and Computer Engineering