
Computer Engineering ›› 2023, Vol. 49 ›› Issue (1): 295-302, 310. doi: 10.19678/j.issn.1000-3428.0063455

• Development Research and Engineering Application •

An Expression Recognition Algorithm Based on Term Frequency-Inverse Document Frequency and Hybrid Loss

LAN Zhengjie1, WANG Lie1, NIE Xiong1,2

  1. School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China;
  2. Guangxi Key Laboratory of Multimedia Communication and Network Technology, Nanning 530004, China
  • Received: 2021-12-06 Revised: 2022-02-08 Published: 2022-03-21
  • About the authors: LAN Zhengjie (born 1992), male, master's student, main research interests: deep learning and image processing; WANG Lie (corresponding author), professor; NIE Xiong, associate professor.
  • Funding:
    Guangxi Science and Technology Major Project (Guike AA21077007).

Abstract: Facial expressions convey people's mental activity and state of mind naturally and efficiently, and they shape how people communicate. In many intelligent applications, facial expression recognition is an important basis for emotional interaction between humans and machines. In fine-grained facial expression recognition, the feature extraction network processes the key features of the expression-producing regions insufficiently, so detail information is lost. This paper proposes a Term Frequency-Inverse Document Frequency Spatial Pyramid Attention (TF-IDF SPA) mechanism that adjusts the attention distribution over the key expression-producing regions and strengthens the network's ability to extract key detail features from them. To address the small inter-class differences and large intra-class differences common in facial expression recognition, an improved hybrid weighted loss function is designed to tighten intra-class cohesion while increasing inter-class distance. The classification weights of the loss function are adjusted dynamically according to the distribution of sample counts in the dataset, which strengthens the model's learning of classes with few samples. On this basis, the structurally simple TF-IDF SPA module is stacked with convolution layers to build a facial expression recognition network. Experimental results show that the network reaches classification accuracies of 73.52% and 98.27% on the FER2013 and CK+ datasets, respectively.

Key words: expression recognition, FER2013 dataset, CK+ dataset, term frequency-inverse document frequency, loss function, attention mechanism
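
The abstract states that the loss function's classification weights follow the sample distribution of the dataset but does not give the formula. As a rough illustration only, the sketch below assumes a common inverse-frequency heuristic (weight = N / (K × n_c), for N samples, K classes, and n_c samples in class c) applied to a plain cross-entropy term. The function names, the toy labels, and the weighting rule itself are assumptions for illustration, not the paper's hybrid loss.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed heuristic: weight_c = N / (K * n_c), so rarer classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}


def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(probs[label])


# Toy, made-up label distribution: one frequent class and two rarer ones.
labels = ["happy"] * 6 + ["sad"] * 3 + ["disgust"] * 1
weights = inverse_frequency_weights(labels)

# The same predicted probability costs more when the true class is rare.
probs = {"happy": 0.5, "sad": 0.3, "disgust": 0.2}
loss_common = weighted_cross_entropy({"happy": 0.5}, "happy", weights)
loss_rare = weighted_cross_entropy({"disgust": 0.5}, "disgust", weights)
```

Under this assumed rule, recomputing `weights` whenever the training set changes is one way the weighting could track the data distribution; the paper's actual dynamic adjustment may differ.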

CLC Number: