Self-Supervised Feature Fusion Model Based on Mamba Architecture: Enhancing the Performance and Adaptability of Multimodal Sentiment Analysis
* 1 , 2
1  persistence_001@163.com
2  Chaogang Liu (dagang_hhh@163.com), Harbin Institute of Technology, Weihai Campus, Weihai 264209, China
Academic Editor: Ying Tan

Abstract:

Multimodal sentiment analysis has considerable potential, but most existing methods adapt poorly to complex environments and scenarios. To address this, we propose a self-supervised feature fusion model for multimodal sentiment analysis based on the Mamba architecture. The model replaces the traditional Transformer architecture with the Mamba model, exploiting its efficient handling of long sequences and its low computational complexity. First, we use the Mamba model to extract features from the text, audio, and visual modalities. To further optimize feature fusion, we introduce an improved Mamba-based cross-modal attention fusion module, which uses Mamba's selective state space model structure to select and fuse key information from the different modalities, thereby improving the accuracy of sentiment analysis. Additionally, we propose a novel feature fusion strategy that combines the representation capabilities of the Mamba model with the cross-modal attention mechanism of the Transformer to integrate features extracted by pre-trained models more effectively. To evaluate the proposed model, we conducted experiments on three public datasets: CMU-MOSI, CMU-MOSEI, and IEMOCAP. Compared with previous multimodal sentiment analysis models such as MulT and ICCN, the proposed model shows significant improvements in the evaluation metrics, supporting the effectiveness and robustness of our methodology.
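
The abstract does not give the model's equations, so the following is only a minimal sketch of the selective state space idea that Mamba-style layers build on, not the authors' implementation. It assumes a single scalar input channel and a diagonal state matrix; the weights `w_b`, `w_c`, and `w_delta` are hypothetical parameters introduced here to show how the step size and the input and output projections can depend on the current input, which is what allows the recurrence to keep or discard information selectively while running in one linear pass over the sequence.

```python
import math


def softplus(z):
    # Numerically stable softplus; keeps the step size strictly positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a, w_b, w_c, w_delta, b_delta=0.0):
    """Run a single-channel selective state space recurrence over xs.

    xs      : list of scalar inputs, one per time step
    a       : diagonal entries of the state matrix (negative for stability)
    w_b     : hypothetical weights making the input projection B_t depend on x_t
    w_c     : hypothetical weights making the output projection C_t depend on x_t
    w_delta : hypothetical weight making the step size delta_t depend on x_t
    """
    n = len(a)
    h = [0.0] * n  # hidden state, carried across time steps
    ys = []
    for x in xs:
        # Input-dependent step size: a larger delta lets the new input
        # overwrite more of the old state.
        delta = softplus(w_delta * x + b_delta)
        y = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a[i])   # discretized state transition
            b_bar = delta * (w_b[i] * x)     # simplified discretization of B_t
            h[i] = a_bar * h[i] + b_bar * x  # state update
            y += (w_c[i] * x) * h[i]         # input-dependent readout C_t . h
        ys.append(y)
    return ys


# Example: a short sequence with a two-dimensional state.
outputs = selective_scan(
    xs=[1.0, 0.5, -0.2, 0.0],
    a=[-1.0, -0.5],
    w_b=[1.0, 0.5],
    w_c=[1.0, 1.0],
    w_delta=1.0,
)
```

The cost grows linearly with sequence length because each step touches only the fixed-size state `h`, in contrast to self-attention, whose cost grows quadratically with sequence length. In a multimodal setting, one such recurrence per modality would be followed by a fusion step; that part is left out here because the abstract does not specify it.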

Keywords: natural language processing; sentiment analysis; multi-modality; deep learning

 
 