MMATR: A Lightweight Approach for Multimodal Sentiment Analysis Based on Tensor Methods

Panagiotis Koromilas, M. Nicolaou, Theodoros Giannakopoulos, Yannis Panagakis

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 4, 2023. DOI: 10.1109/ICASSP49357.2023.10097030
Despite the considerable research output on Multimodal Learning for affect-related tasks, most current methods are highly complex in terms of the number of trainable parameters and thus do not constitute effective solutions for real-life applications. In this work we address this gap by introducing the Multimodal Attention Tensor Regression (MMATR) network, a lightweight model based on: (i) a static input representation (a 2D matrix of dimensions time × features) for each modality, which avoids highly parameterized sequential models by instead employing a CNN; (ii) the replacement of the usual pooling, flattening, and linear layers with tensor contraction and tensor regression layers, which reduce the number of parameters while preserving the high-order structure of the multimodal data; and (iii) a bimodal attention layer that learns multimodal co-occurrences. Through a set of experiments against a variety of state-of-the-art techniques, we show that the proposed MMATR achieves results competitive with the state of the art in Multimodal Sentiment Analysis while having four orders of magnitude fewer parameters.
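To make component (ii) more concrete, below is a minimal sketch, in plain PyTorch, of how a tensor contraction layer and a CP-style tensor regression head can stand in for the usual flatten-plus-linear head. All class names, shapes, and ranks are illustrative assumptions rather than the paper's exact architecture, and the CNN encoder and bimodal attention layer are omitted.

```python
# Minimal sketch (assumed, not taken from the paper): a tensor contraction layer
# and a CP-style tensor regression head that replace flatten + linear layers
# while keeping the multi-way structure of the activations.
import torch
import torch.nn as nn


class TensorContractionLayer(nn.Module):
    # Contracts every non-batch mode of the input with a small factor matrix,
    # shrinking the tensor without flattening it.
    def __init__(self, in_shape, out_shape):
        super().__init__()
        self.factors = nn.ParameterList(
            [nn.Parameter(0.02 * torch.randn(o, i)) for i, o in zip(in_shape, out_shape)]
        )

    def forward(self, x):  # x: (batch, *in_shape)
        for mode, w in enumerate(self.factors, start=1):
            x = torch.tensordot(x, w, dims=([mode], [1]))  # contract this mode with w
            x = torch.movedim(x, -1, mode)                 # restore the mode ordering
        return x


class TensorRegressionLayer(nn.Module):
    # Regresses a 3-mode tensor to a scalar through a rank-R CP weight tensor,
    # i.e. a low-rank replacement for flatten + nn.Linear.
    def __init__(self, in_shape, rank=8):
        super().__init__()
        assert len(in_shape) == 3, "this sketch assumes a 3-mode input"
        self.factors = nn.ParameterList(
            [nn.Parameter(0.02 * torch.randn(s, rank)) for s in in_shape]
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):  # x: (batch, *in_shape)
        weight = torch.einsum("ir,jr,kr->ijk", *self.factors)  # rebuild CP weight tensor
        return torch.einsum("bijk,ijk->b", x, weight) + self.bias


if __name__ == "__main__":
    # Toy multimodal representation: (batch, channels, time, features).
    x = torch.randn(4, 16, 20, 32)
    x = TensorContractionLayer(in_shape=(16, 20, 32), out_shape=(8, 10, 16))(x)
    score = TensorRegressionLayer(in_shape=(8, 10, 16), rank=8)(x)
    print(score.shape)  # torch.Size([4]): one sentiment score per sample
```

As a rough illustration of the parameter saving the abstract refers to: a flatten-plus-linear head on an (8, 10, 16) activation needs 8 x 10 x 16 = 1280 weights per output, while the rank-8 CP head above stores only (8 + 10 + 16) x 8 = 272 factor entries plus a bias.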