Multimodal sentiment analysis has considerable potential and importance, but most existing methods struggle to adapt to complex environments and scenarios. To address these challenges, we propose a self-supervised feature fusion model for multimodal sentiment analysis based on the Mamba architecture. The model abandons the traditional Transformer architecture in favor of Mamba, leveraging its efficient long-sequence processing and low computational complexity. First, we use the Mamba model to extract features from the text, audio, and visual modalities. To further optimize the feature fusion process, we introduce an improved Mamba-based cross-modal attention fusion module, which exploits Mamba's selective state space structure to select and fuse key information from the different modalities, thereby improving sentiment analysis accuracy. In addition, we propose a novel feature fusion strategy that combines the representational power of the Mamba model with the cross-modal attention mechanism of the Transformer to more effectively integrate the features extracted by pre-trained models. To evaluate the proposed model, we conducted experiments on three public datasets: CMU-MOSI, CMU-MOSEI, and IEMOCAP. The experimental results demonstrate that, compared with previous multimodal sentiment analysis models (such as MulT and ICCN), our model achieves significant improvements on the evaluation metrics, validating the effectiveness and robustness of our method.
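The cross-modal fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the module's internals, so the following is an assumption-laden toy example: standard scaled dot-product cross-modal attention in which text features attend to audio and visual features, with the three outputs concatenated into a fused representation. All names, dimensions, and the concatenation strategy are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats, d_k):
    # the query modality attends to another modality's features
    # (a toy stand-in for the paper's Mamba-based fusion module)
    scores = query_feats @ key_feats.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ key_feats

rng = np.random.default_rng(0)
seq_len, dim = 8, 16  # hypothetical sequence length and feature size
text = rng.standard_normal((seq_len, dim))    # placeholder text features
audio = rng.standard_normal((seq_len, dim))   # placeholder audio features
visual = rng.standard_normal((seq_len, dim))  # placeholder visual features

# text attends to audio and to visual; fuse by concatenation
fused = np.concatenate(
    [text,
     cross_modal_attention(text, audio, dim),
     cross_modal_attention(text, visual, dim)],
    axis=-1)
print(fused.shape)  # (8, 48)
```

In practice the fused representation would feed a downstream sentiment classifier; the paper's actual module replaces this attention with a Mamba selective state space block.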
Self-Supervised Feature Fusion Model Based on Mamba Architecture: Enhancing the Performance and Adaptability of Multimodal Sentiment Analysis
Published:
23 November 2024
by MDPI
in 2024 International Conference on Science and Engineering of Electronics (ICSEE'2024)
session Deep Learning and Data Analytics in Electronics
Keywords: natural language processing; sentiment analysis; multi-modality; deep learning