
Computer Engineering ›› 2024, Vol. 50 ›› Issue (6): 228-235. doi: 10.19678/j.issn.1000-3428.0068016

• Graphics and Image Processing •

Micro-Expression Detection Method Based on Multi-Scale Spatiotemporal Attention Network

YU Yang¹, SUN Fangfang¹, LV Hua¹, LI Yang², WANG Xiaomin¹

  1. School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China;
  2. Institute of Information Science, Tianjin Academy of Agricultural Sciences, Tianjin 300192, China
  • Received: 2023-07-06 Revised: 2023-09-13 Published: 2023-12-05
  • Corresponding author: YU Yang, E-mail: yuyang@hebut.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62276088, 62102129).

Abstract: Micro-expressions can reveal genuine emotions that people attempt to hide, providing potentially useful information for criminal investigation, psychological counseling, and similar applications. Existing micro-expression detection methods mainly extract temporal features on top of previously obtained spatial features to build spatiotemporal features. This processing order tends to distort the temporal features and, at the same time, disrupts the original temporal relationships during spatial processing, which reduces the discriminative power of the spatiotemporal features. To address this problem, a micro-expression detection method based on a multi-scale spatiotemporal attention network is proposed. A 3-Dimensional Convolutional Neural Network (3DCNN), which models temporal and spatial relationships jointly, processes the micro-expression sequences to obtain robust features that account for both the temporal and spatial domains. Multi-scale temporal input sequences are constructed so that multi-dimensional temporal features are extracted from image sequences of different temporal lengths, and a lightweight 3DCNN extracts the multi-scale spatiotemporal features. A Global Spatiotemporal Attention Module (GSAM) then strengthens the global spatiotemporal correlations of these features: its spatiotemporal restructuring module strengthens the connectivity between image frames at different moments, and its global information attention module builds spatial correlation information within each single frame. Finally, weights are assigned to the features at different moments to highlight key temporal information, completing the micro-expression detection. Experimental results show that the proposed method accurately detects micro-expression sequence segments, achieving accuracies of 92.32%, 95.04%, and 89.56% on the public CASME, CASME II, and SAMM datasets, respectively. Compared with LGAttNet, the current best deep learning method, the proposed method improves accuracy by 3.84 percentage points on CASME II and 4.96 percentage points on SAMM.

Keywords: micro-expression detection, 3-Dimensional Convolutional Neural Network (3DCNN), spatiotemporal features, multi-scale features, correlation
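
The abstract describes two steps that can be illustrated without the network itself: building input sequences of several temporal lengths, and weighting features from different moments. The following is a minimal sketch under stated assumptions, not the authors' implementation. The window lengths, the centred-window sampling, the softmax weighting, and the function names `multi_scale_windows` and `temporal_weighted_pool` are all hypothetical; the abstract does not specify them, and the 3DCNN and GSAM stages are omitted.

```python
import math


def multi_scale_windows(num_frames, center, lengths=(5, 9, 13)):
    """Return one list of frame indices per window length.

    Each window is centred on `center`; indices that fall outside the
    sequence are clamped to [0, num_frames - 1]. Odd lengths are assumed
    (hypothetical choice, not taken from the paper).
    """
    windows = []
    for length in lengths:
        half = length // 2
        indices = [
            min(max(center + offset, 0), num_frames - 1)
            for offset in range(-half, half + 1)
        ]
        windows.append(indices)
    return windows


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_weighted_pool(features, scores):
    """Combine per-moment feature vectors into one vector.

    `features` is a list of equal-length vectors, one per moment, and
    `scores` holds one relevance score per moment. Softmax weighting is
    an assumption used here only to illustrate "assigning weights to
    features at different moments".
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, features))
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Three temporal scales around frame 50 of a 100-frame clip.
    for window in multi_scale_windows(100, 50):
        print(len(window), window[0], window[-1])
    # Two moments with equal scores contribute equally.
    print(temporal_weighted_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In a full model, each index window would select frames to stack into a clip for a 3D convolutional branch, and the scores would be learned rather than supplied by hand.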
