
Computer Engineering ›› 2025, Vol. 51 ›› Issue (11): 258-267. doi: 10.19678/j.issn.1000-3428.0069661

• Graphics and Image Processing •

  • Funding:
    National Key Research and Development Program of China (2020YFB1313600)

Facial Expression Recognition Method Integrating Attention Mechanism for Key Regions

PENG Qi1,2, LIU Yinhua2,*, SHANG Yunrui1,2

  1. Automation Institute, Qingdao University, Qingdao 266071, Shandong, China
    2. Institute for Future, Qingdao University, Qingdao 266071, Shandong, China
  • Received:2024-03-26 Revised:2024-06-07 Online:2025-11-15 Published:2024-06-26
  • Contact: LIU Yinhua


Abstract:

In real-world environments, the accuracy of facial expression recognition is typically low because of factors such as varying illumination intensities, facial occlusions, and pose variations during the detection of facial images. To address this robustness issue, this study proposes a facial expression recognition method that integrates a key region attention mechanism. Drawing inspiration from the facial perception mechanism of the human visual system, this method combines key facial regions with the overall facial region to enhance the recognition of complex and subtle expressions. During the key region extraction phase, the MTCNN algorithm is employed to sequentially feed facial data through three cascaded networks, thereby obtaining the positional information of facial keypoints. Based on anatomical studies of the face, this study introduces a Local Region Cropping (LRC) method to process the positional information and crop key facial region images. Subsequently, both the overall facial image and cropped key facial region images are separately input into a ResNet-50 network, followed by feature fusion. A Coordinate Attention (CA) mechanism, which encodes precise positional information, channel relationships, and long-range dependencies, is incorporated to direct the model's focus toward facial regions that contribute more significantly to expression classification. Experimental results on publicly available datasets, CK+ and FER2013, demonstrate that the proposed method achieves recognition accuracies of 96.9% and 73.22%, respectively. Compared with existing state-of-the-art methods, the method achieves significant improvements in accuracy, indicating that it offers valuable insights into network architecture and performance.
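As a rough illustration of the key-region extraction step described in the abstract, the sketch below crops fixed-size patches around facial landmarks of the kind MTCNN outputs (five points: eye centres, nose tip, mouth corners). The box size, the particular regions chosen, and the `key_regions` helper are assumptions for illustration only, not the paper's actual LRC rules:

```python
import numpy as np

def crop_region(img, center, half):
    """Crop a (2*half, 2*half) patch around (x, y) `center`, clamped to the image."""
    h, w = img.shape[:2]
    cx, cy = int(center[0]), int(center[1])
    x0, x1 = max(cx - half, 0), min(cx + half, w)
    y0, y1 = max(cy - half, 0), min(cy + half, h)
    return img[y0:y1, x0:x1]

def key_regions(img, landmarks, half=16):
    """Cut eye and mouth patches from an aligned face image.

    `landmarks` holds (x, y) pairs keyed 'left_eye', 'right_eye',
    'mouth_left', 'mouth_right', e.g. taken from MTCNN's landmark output.
    """
    eyes_mid = np.add(landmarks['left_eye'], landmarks['right_eye']) / 2
    mouth_mid = np.add(landmarks['mouth_left'], landmarks['mouth_right']) / 2
    return {
        'left_eye': crop_region(img, landmarks['left_eye'], half),
        'right_eye': crop_region(img, landmarks['right_eye'], half),
        'eyes': crop_region(img, eyes_mid, half),
        'mouth': crop_region(img, mouth_mid, half),
    }
```

Each cropped patch (resized as needed) would then be fed to its own ResNet-50 branch alongside the whole-face image before feature fusion.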

Key words: facial expression recognition, key regions, facial landmarks, location information, Coordinate Attention (CA) mechanism
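The Coordinate Attention (CA) mechanism named above factorises attention into two direction-aware maps, one over rows and one over columns, so the reweighting preserves precise positional information. A minimal NumPy sketch of the forward pass, with illustrative weight shapes and omitting the batch normalisation and h-swish of the original CA block:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_reduce, w_h, w_w):
    """Simplified Coordinate Attention forward pass.

    x: feature map of shape (C, H, W).
    w_reduce: (C_mid, C) shared 1x1-conv weights for the reduced embedding.
    w_h, w_w: (C, C_mid) 1x1-conv weights producing the two attention maps.
    """
    C, H, W = x.shape
    # Directional pooling: average along one spatial axis while keeping
    # exact position along the other.
    pool_h = x.mean(axis=2)                        # (C, H) — one value per row
    pool_w = x.mean(axis=1)                        # (C, W) — one value per column
    # Concatenate along the spatial axis and embed with a shared transform.
    y = np.concatenate([pool_h, pool_w], axis=1)   # (C, H+W)
    y = np.maximum(w_reduce @ y, 0.0)              # ReLU, (C_mid, H+W)
    # Split back into the two directions and map to per-axis attention in (0, 1).
    a_h = sigmoid(w_h @ y[:, :H])                  # (C, H)
    a_w = sigmoid(w_w @ y[:, H:])                  # (C, W)
    # Reweight the input by broadcasting row and column attention.
    return x * a_h[:, :, None] * a_w[:, None, :]
```

Because each map keeps one spatial axis intact, the module can emphasise specific rows and columns of the fused feature map, which is the long-range, position-aware behaviour the method relies on to focus on expression-relevant facial regions.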