【IEEE ICME 2022】NHFNET: A Non-Homogeneous Fusion Network for Multimodal Sentiment Analysis (CCF B)
NHFNET: A Non-Homogeneous Fusion Network for Multimodal Sentiment Analysis
Video
Presentation video on Bilibili: https://www.bilibili.com/video/BV1XY411n7rx?vd_source=9dad485ab167164358578deecb64a255#reply126270526800
Abstract
Fusion technology is crucial for multimodal sentiment analysis. Recent attention-based fusion methods demonstrate high performance and strong robustness. However, these approaches ignore the difference in information density among the three modalities, i.e., visual and audio have low-level signal features while, conversely, text has high-level semantic features. To this end, we propose a non-homogeneous fusion network (NHFNet) to achieve multimodal information interaction. Specifically, a fusion module with attention aggregation is designed to handle the fusion of the visual and audio modalities, enhancing them to high-level semantic features. Then, cross-modal attention is used to achieve information reinforcement between the text modality and the audio-visual fusion. NHFNet compensates for the differences in information density across modalities, enabling their fair interaction. To verify the effectiveness of the proposed method, we set up aligned and unaligned experiments on the CMU-MOSEI dataset, respectively. The experimental results show that the proposed method outperforms the state-of-the-art. Codes are available at https://github.com/skeletonNN/NHFNet.
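To make the described pipeline concrete, below is a minimal PyTorch sketch of the two-stage fusion the abstract outlines: attention aggregation over the concatenated audio and visual streams, followed by cross-modal attention in which the text features attend to the audio-visual fusion. All module names, dimensions, and the pooling/head choices here are illustrative assumptions, not the authors' implementation; the official code is in the repository linked above.

```python
# Minimal sketch of the two-stage fusion described in the abstract.
# Names, dimensions, and layer choices are assumptions for illustration
# only -- see the official repo (github.com/skeletonNN/NHFNet) for the
# authors' actual implementation.
import torch
import torch.nn as nn

class NonHomogeneousFusionSketch(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        # Stage 1: self-attention over the concatenated audio/visual
        # sequence ("attention aggregation"), lifting the low-level
        # signal features toward higher-level semantics.
        self.av_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Stage 2: cross-modal attention where the text modality queries
        # the audio-visual fusion for information reinforcement.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, 1)  # sentiment regression head

    def forward(self, text, audio, visual):
        # text/audio/visual: (batch, seq_len, d_model), already projected
        av = torch.cat([audio, visual], dim=1)      # (B, La + Lv, D)
        av_fused, _ = self.av_attn(av, av, av)      # attention aggregation
        # text attends to the fused audio-visual representation
        reinforced, _ = self.cross_attn(text, av_fused, av_fused)
        pooled = reinforced.mean(dim=1)             # simple mean pooling
        return self.classifier(pooled)

# Usage with random tensors shaped like typical CMU-MOSEI batches:
model = NonHomogeneousFusionSketch()
t = torch.randn(8, 50, 128)   # text sequence
a = torch.randn(8, 50, 128)   # audio sequence
v = torch.randn(8, 50, 128)   # visual sequence
print(model(t, a, v).shape)   # torch.Size([8, 1])
```

Concatenating the two low-level streams before self-attention lets each audio token attend to every visual token and vice versa, which is one simple way to realize the attention aggregation the abstract mentions before text enters the interaction.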