Multimodal sentiment analysis has considerable potential and importance, but most existing methods struggle to adapt to complex environments and scenarios. To address these challenges, we propose a self-supervised feature fusion model for multimodal sentiment analysis based on the Mamba architecture. The model abandons the traditional Transformer architecture in favor of Mamba, leveraging its efficient long-sequence processing and low computational complexity. First, we use the Mamba model to extract features from the text, audio, and visual modalities. To further optimize the feature fusion process, we introduce an improved Mamba-based cross-modal attention fusion module, which exploits Mamba's selective state space structure to select and fuse key information from the different modalities, thereby improving sentiment analysis accuracy. In addition, we propose a novel feature fusion strategy that combines the representational power of the Mamba model with the cross-modal attention mechanism of the Transformer to more effectively integrate the features extracted by pre-trained models. To evaluate the proposed model, we conducted experiments on three public datasets: CMU-MOSI, CMU-MOSEI, and IEMOCAP. The experimental results demonstrate that, compared with previous multimodal sentiment analysis models (such as MulT and ICCN), our model achieves significant improvements on the evaluation metrics, validating the effectiveness and robustness of our method.
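The cross-modal fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the module's internals, so the following is an assumption-laden toy example: standard scaled dot-product cross-modal attention in which text features attend to audio and visual features, with the three outputs concatenated into a fused representation. All names, dimensions, and the concatenation strategy are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats, d_k):
    # the query modality attends to another modality's features
    # (a toy stand-in for the paper's Mamba-based fusion module)
    scores = query_feats @ key_feats.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ key_feats

rng = np.random.default_rng(0)
seq_len, dim = 8, 16  # hypothetical sequence length and feature size
text = rng.standard_normal((seq_len, dim))    # placeholder text features
audio = rng.standard_normal((seq_len, dim))   # placeholder audio features
visual = rng.standard_normal((seq_len, dim))  # placeholder visual features

# text attends to audio and to visual; fuse by concatenation
fused = np.concatenate(
    [text,
     cross_modal_attention(text, audio, dim),
     cross_modal_attention(text, visual, dim)],
    axis=-1)
print(fused.shape)  # (8, 48)
```

In practice the fused representation would feed a downstream sentiment classifier; the paper's actual module replaces this attention with a Mamba selective state space block.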
Self-Supervised Feature Fusion Model Based on Mamba Architecture: Enhancing the Performance and Adaptability of Multimodal Sentiment Analysis
Published:
23 November 2024
by MDPI
in 2024 International Conference on Science and Engineering of Electronics (ICSEE'2024)
session Deep Learning and Data Analytics in Electronics
Keywords: natural language processing; sentiment analysis; multi-modality; deep learning