Sentiment Analysis Model Based on Multi-Level Attention and Cross-Modal Self-Adaptive Fusion

Abstract: Compared with sentiment analysis of text and images, sentiment analysis of video has received less research attention, and cross-modal relation extraction between modalities still suffers from noise and information redundancy. This paper therefore combines the text and video modalities and proposes a sentiment analysis model based on multi-level attention and cross-modal self-adaptive fusion (MACSF). The extracted text and video features are fused twice across modalities under multi-head hierarchical attention (MHA) to obtain secondary fusion features with interactive semantics; the text features and the secondary fusion features are then combined through self-adaptive cross-modal integration to obtain the final fusion features; finally, the fusion features are fed into a multi-layer perceptron and a Softmax function to produce the sentiment classification results. Experiments on the public MOSI and MOSEI datasets show that the proposed model effectively mitigates the noise in cross-modal interaction and improves sentiment classification performance.
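To make the pipeline concrete, the following is a minimal PyTorch sketch of the fusion flow described above: two cross-modal multi-head attention passes, a self-adaptive gate that blends the text features with the secondary fusion features, and an MLP-plus-Softmax classifier. All module names, feature dimensions, the pooling choice, and the gating formulation are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the MACSF fusion pipeline described in the abstract.
# Dimensions, the gate, and the class count are assumptions for illustration only.
import torch
import torch.nn as nn


class MACSFSketch(nn.Module):
    def __init__(self, text_dim=768, video_dim=512, hidden_dim=256,
                 num_heads=4, num_classes=3):
        super().__init__()
        # Project both modalities into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        # Two stacked cross-modal multi-head attention layers ("fused twice").
        self.cross_attn_1 = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.cross_attn_2 = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # Self-adaptive gate weighing text features against the fused features.
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)
        # Multi-layer perceptron classifier, followed by Softmax in forward().
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_feats, video_feats):
        # text_feats: (batch, seq_t, text_dim); video_feats: (batch, seq_v, video_dim)
        t = self.text_proj(text_feats)
        v = self.video_proj(video_feats)
        # First fusion: text queries attend to video keys/values.
        f1, _ = self.cross_attn_1(query=t, key=v, value=v)
        # Second fusion: the first fusion output attends back to the text,
        # giving "secondary fusion features with interactive semantics".
        f2, _ = self.cross_attn_2(query=f1, key=t, value=t)
        # Pool over the sequence dimension (mean pooling assumed).
        t_pooled = t.mean(dim=1)
        f2_pooled = f2.mean(dim=1)
        # Self-adaptive cross-modal integration: a learned gate blends the two.
        g = torch.sigmoid(self.gate(torch.cat([t_pooled, f2_pooled], dim=-1)))
        fused = g * t_pooled + (1 - g) * f2_pooled
        # Sentiment class distribution.
        return torch.softmax(self.mlp(fused), dim=-1)
```

Under these assumed dimensions, a call such as `MACSFSketch()(torch.randn(8, 50, 768), torch.randn(8, 120, 512))` returns an (8, 3) tensor of class probabilities.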

     
