Computer Engineering

Micro-expression Detection Method Based on Multi-scale Spatiotemporal Attention Network

  • Published: 2023-12-05

Abstract: Micro-expressions can reveal the genuine emotions that people attempt to hide, providing potentially useful information for criminal investigation, psychological counseling, and other applications. Existing micro-expression detection methods mainly obtain spatial features first and then extract temporal characteristics from them to construct spatiotemporal features. This processing tends to distort the temporal features, and the spatial processing step disrupts the original temporal relationships, reducing the discriminability of micro-expression spatiotemporal features. To address this issue, a micro-expression detection method based on a multi-scale spatiotemporal attention network is proposed. A three-dimensional convolutional neural network (3D CNN) that captures both temporal and spatial relationships processes the micro-expression sequences, yielding robust features that cover both the temporal and spatial domains. The network constructs multi-scale temporal input sequences to extract multidimensional temporal features from image sequences of different lengths, and a lightweight 3D CNN extracts multi-scale spatiotemporal features. A global spatiotemporal attention module then strengthens the global spatiotemporal correlations of these features: its spatiotemporal restructuring module enhances the connectivity between image frames at different moments, and its global information attention module builds spatial correlation information within each single frame. Finally, weights are assigned to the features of different moments to highlight key temporal information, completing the micro-expression detection task.
Experimental results show that the proposed method accurately detects micro-expression sequence segments, achieving accuracies of 92.32%, 95.04%, and 89.56% on three public datasets, CASME, CASME II, and SAMM, respectively. Compared with the current best-performing deep learning methods, it improves accuracy by 3.84% on CASME II and 4.96% on SAMM.
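The abstract names the building blocks but gives no window lengths, kernel sizes, or attention formulation. As a rough illustration only, here is a plain-Python sketch of three generic ideas it mentions. First, it cuts clips of several temporal lengths around one frame, which is one possible reading of "multi-scale temporal input sequences". Second, it applies a naive 3D convolution whose kernel spans time as well as height and width. Third, it uses softmax weighting of per-time-step features to emphasize key moments. The function names (`multiscale_windows`, `conv3d_valid`, `temporal_attention`), the edge-padding policy, and the assumption that attention scores are supplied from outside are all hypothetical choices made for illustration. This is not the authors' network, and the spatiotemporal restructuring and global information attention modules are not modeled.

```python
# Hypothetical illustration of generic building blocks; NOT the paper's network.
import math
from typing import List, Sequence

Frame = List[List[float]]  # one grayscale frame stored as H rows of W pixels


def multiscale_windows(frames: Sequence[Frame], center: int,
                       lengths: Sequence[int]) -> List[List[Frame]]:
    """Cut one clip per requested temporal length, centered on frame `center`.

    Assumption: indices past either end of the sequence are clamped to the
    first/last frame (edge padding), so every clip has exactly `length` frames.
    """
    clips = []
    for length in lengths:
        start = center - length // 2
        last = len(frames) - 1
        clip = [frames[min(max(start + k, 0), last)] for k in range(length)]
        clips.append(clip)
    return clips


def conv3d_valid(clip: Sequence[Frame], kernel: Sequence[Frame]) -> List[Frame]:
    """Naive single-channel 3D convolution (cross-correlation, no padding, stride 1).

    The kernel covers kt frames x kh rows x kw columns, so each output value
    mixes neighbouring frames and neighbouring pixels at the same time.
    """
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    t, h, w = len(clip), len(clip[0]), len(clip[0][0])
    out = []
    for ti in range(t - kt + 1):
        plane = []
        for yi in range(h - kh + 1):
            row = []
            for xi in range(w - kw + 1):
                acc = 0.0
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            acc += clip[ti + a][yi + b][xi + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def temporal_attention(features: Sequence[Sequence[float]],
                       scores: Sequence[float]) -> List[float]:
    """Softmax the per-time-step scores and return the weighted sum of features.

    In a trained model the scores would come from learned layers; here they
    are passed in directly (hypothetical simplification).
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
```

For example, `multiscale_windows(frames, center=5, lengths=(3, 5, 9))` returns three clips of 3, 5, and 9 frames centered on frame 5. Each clip could then pass through `conv3d_valid`, and with equal scores `temporal_attention` reduces to a plain average over time steps. A real implementation would use a deep-learning framework's batched 3D convolution rather than these nested loops.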