
Computer Engineering ›› 2024, Vol. 50 ›› Issue (5): 241-249. doi: 10.19678/j.issn.1000-3428.0067538

• Graphics and Image Processing •

Facial Expression Recognition Based on Key Region Masking and Reconstruction

LI Jing1, LI Jian1, CHEN Haifeng1, ZHANG Qian1, WANG Liyan2, PEI Ercheng3

  1. School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an 710021, Shaanxi, China;
    2. School of Arts and Sciences, Shaanxi University of Science and Technology, Xi'an 710021, Shaanxi, China;
    3. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710100, Shaanxi, China
  • Received: 2023-05-04 Revised: 2023-08-25 Published: 2023-10-10
  • Contact: LI Jing, E-mail: 211612105@sust.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62306172); Shen Zuyan Special Fund of the National Engineering Technology Research Center for Prefabricated Assembly of Civil Engineering Structures (2019CPCCE-K02); Natural Science Basic Research Program of Shaanxi Province (2022JQ-662); 2021 Education Informatization Teaching Reform Research Project of Shaanxi University of Science and Technology (JXJG2021-09); Doctoral Research Start-up Fund of Shaanxi University of Science and Technology (126022325).

Abstract: To reduce the impact of irrelevant information and occlusion on facial expression recognition in the wild, this study proposes a facial expression recognition model based on key region masking and reconstruction. A multi-scale feature extraction network first extracts global features from the facial image. Sixty-eight key regions are then defined from 68 facial landmarks, the features of these regions are extracted by interpolation, and an attention mechanism learns the prior relationships among the key region features. A self-supervised masking and reconstruction module randomly masks key region features and uses the information from the known regions to predict and reconstruct the features of the masked regions, which improves expression recognition performance in the wild. Several experiments validate the generalization ability of the model, and ablation experiments confirm the effectiveness of each module. The model achieves recognition accuracies of 88.44% and 86.09% on the Real-world Affective Faces Database (RAF-DB) and the Occlusion-RAF-DB dataset, respectively, improving on models such as Vision Transformer (ViT) for facial expression recognition in the wild.

Key words: facial expression recognition, multi-scale key region features, attention mechanism, self-supervised learning, masking and reconstruction
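The key region pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract only says that region features are extracted "by interpolation" and that masked regions are reconstructed from known ones, so the following are assumptions made for illustration: bilinear interpolation, replacing masked features with zeros, a mask ratio of 0.25, and a mean squared error reconstruction loss. All function names are hypothetical, and the attention-based predictor that would produce the reconstructed features is not shown.

```python
import math
import random


def bilinear_sample(fmap, x, y):
    """Sample a C-dim feature at continuous (x, y) from an H x W x C nested list.

    Assumption: bilinear interpolation; the abstract only says "interpolation".
    """
    h, w = len(fmap), len(fmap[0])
    # Clamp the landmark to the valid grid range.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    corners = zip(fmap[y0][x0], fmap[y0][x1], fmap[y1][x0], fmap[y1][x1])
    return [
        (1 - dx) * (1 - dy) * a + dx * (1 - dy) * b
        + (1 - dx) * dy * c + dx * dy * d
        for a, b, c, d in corners
    ]


def region_features(fmap, landmarks):
    """One feature vector per landmark, given (x, y) in feature-map coordinates."""
    return [bilinear_sample(fmap, x, y) for x, y in landmarks]


def random_mask(features, mask_ratio, rng):
    """Zero out a random subset of region features (assumed masking scheme).

    Returns the masked copy and the sorted indices of the masked regions.
    """
    n = len(features)
    k = max(1, round(n * mask_ratio))
    masked_idx = sorted(rng.sample(range(n), k))
    masked_set = set(masked_idx)
    visible = [
        [0.0] * len(f) if i in masked_set else list(f)
        for i, f in enumerate(features)
    ]
    return visible, masked_idx


def reconstruction_loss(predicted, target, masked_idx):
    """Mean squared error over the masked regions only (assumed loss)."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count
```

In a training loop, `region_features` would be called on a feature map from the multi-scale backbone with the 68 landmark coordinates, `random_mask` would hide part of the result, and a predictor would be trained to minimize `reconstruction_loss` against the unmasked features.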
