MULTIMODAL EMOTION RECOGNITION USING COMPRESSED GRAPH NEURAL NETWORKS
DOI:
https://doi.org/10.24867/28BE43Djurkic
Keywords:
graph neural networks, emotion recognition, multimodal data, compression
Abstract
This paper first describes an algorithm for emotion recognition from audio, text, and video using graph neural networks. It then presents the results of compressing the model, which is needed so that large models can run on smaller devices.
References
[1] N.H. Frijda, The emotions, Cambridge University Press; 1986.
[2] T. Nosek, B. Brkljač, D. Despotović, M. Sečujski, T. Lončar-Turukalo, „Praktikum iz mašinskog učenja“ [Machine Learning Practicum], Faculty of Technical Sciences, University of Novi Sad, 2020.
[3] H. Abdi, D. Valentin, & B. Edelman, Neural networks (No. 124). Sage, 1999.
[4] J. O'Neill, An overview of neural network compression. arXiv preprint arXiv:2006.03669, 2020
[5] A. Joshi, A. Bhat, A. Jain, A. V. Singh, & A. Modi, COGMEN: COntextualized GNN based multimodal emotion recognitioN. arXiv preprint arXiv:2205.02455, 2022
[6] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, ... & I. Polosukhin, Attention is all you need. Advances in neural information processing systems, 30, 2017
[7] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, & M. Welling, Modeling relational data with graph convolutional networks. In The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15 (pp. 593-607). Springer International Publishing, 2018
[8] Y. Shi, Z. Huang, S. Feng, H. Zhong, W. Wang, Y. Sun, Masked label prediction: Unified message passing model for semi-supervised classification. arXiv preprint arXiv:2009.03509, Sep 8, 2020
[9] C. Busso, M. Bulut, C. C. Lee, A. Kazemzadeh, E. Mower, S. Kim, ... & S. S. Narayanan, IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation, 42, 335-359, 2008
[10] A. B. Zadeh, P. P. Liang, S. Poria, E. Cambria, & L. P. Morency, Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2236-2246), July, 2018
[11] N. Simić, S. Suzić, T. Nosek, M. Vujović, Z. Perić, M. Savić, & V. Delić, Speaker recognition using constrained convolutional neural networks in emotional speech. Entropy, 24(3), 414, 2022
[12] T. Liang, J. Glossner, L. Wang, S. Shi, & X. Zhang, Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing, 461, 370-403, 2021
Published
2024-09-06
Section
Electrotechnical and Computer Engineering