
Computer Engineering ›› 2022, Vol. 48 ›› Issue (12): 79-85, 94. doi: 10.19678/j.issn.1000-3428.0063424

• Artificial Intelligence and Pattern Recognition •

Multi-supervision Face Liveness Detection Algorithm Fused with Multi-scale Feature

CHEN Suyang, SONG Xiaoning

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2021-12-01 Revised: 2022-01-21 Published: 2022-12-07
  • About the authors: CHEN Suyang (born 1997), male, master's student, main research interest: face liveness detection; SONG Xiaoning (corresponding author), professor, Ph.D., doctoral supervisor.
  • Funding:
    National Natural Science Foundation of China (61876072); Jiangsu Province "Six Talent Peaks" Project.

Abstract: Most existing face liveness detection algorithms neglect feature mining and therefore extract too little discriminative information. This paper proposes a liveness detection algorithm that fuses gradient texture features with group receptive fields. First, central difference convolution computes the difference between each point surrounding the center of the receptive field and the center point itself, which extracts the gradient texture features of the image. Second, a group receptive field module combines convolution kernels of different sizes with dilated convolution in a multi-branch structure, obtaining larger receptive fields and multi-scale features with fewer parameters. The two types of features are then fused and fed into a residual structure. In addition, while a depth map provides the main supervision, a binary mask is added as auxiliary supervision so that the network concentrates its learning on the face region, which further improves the robustness of the model. Finally, the outputs of the depth map generator and the mask generator are combined to calculate the prediction score, giving end-to-end liveness detection. Experimental results show that the average classification error rates of the proposed algorithm on the four protocols of the public OULU-NPU dataset are 0.9%, 1.9%, 1.6%±2.0%, and 2.7%±1.8%, respectively. On the CASIA-MFSD and Replay-Attack datasets, the algorithm achieves error-free liveness detection. The model size is only 1.1 MB. Compared with liveness detection algorithms such as Auxiliary and the Spatio-Temporal Anti-Spoof Network (STASN), the proposed algorithm achieves higher detection accuracy and better robustness.
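The central difference step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `central_difference_conv2d`, the list-of-lists image layout, the absence of padding, and the toy kernel are all assumptions made for illustration. The sketch only shows the core idea that each kernel weight is applied to the difference between a neighboring pixel and the center pixel rather than to the raw pixel value.

```python
def central_difference_conv2d(image, kernel):
    """Apply a central difference convolution to a 2D image (no padding).

    Each kernel weight multiplies (neighbor - center) instead of the raw
    neighbor value. Hypothetical helper written for illustration only.
    """
    k = len(kernel)          # kernel is assumed square with odd side length
    r = k // 2               # kernel radius
    h, w = len(image), len(image[0])
    out = []
    for i in range(r, h - r):
        row = []
        for j in range(r, w - r):
            center = image[i][j]
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    diff = image[i + di][j + dj] - center
                    acc += kernel[di + r][dj + r] * diff
            row.append(acc)
        out.append(row)
    return out


# Toy inputs (illustrative values, not data from the paper).
ones = [[1.0] * 3 for _ in range(3)]
flat = [[5.0] * 4 for _ in range(4)]                  # constant brightness
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]       # vertical edge

flat_response = central_difference_conv2d(flat, ones)
step_response = central_difference_conv2d(step, ones)
```

With these toy inputs, the constant image produces an all-zero response, while the image with a vertical edge produces non-zero responses on both sides of the edge. This is why the operator responds to gradient texture rather than to absolute intensity.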

Key words: Convolutional Neural Network (CNN), face liveness detection, texture feature, receptive field, multi-supervision
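The abstract's claim that dilated convolution yields larger receptive fields with fewer parameters can be checked with simple arithmetic. The sketch below uses the standard relation for the spatial extent covered by a dilated kernel, k + (k − 1)(d − 1) for kernel size k and dilation d. The helper names and the branch settings in `branches` are hypothetical and are not taken from the paper.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size, in_channels, out_channels):
    """Number of weights in a square 2D convolution, ignoring bias."""
    return kernel_size * kernel_size * in_channels * out_channels


# Hypothetical (kernel_size, dilation) pairs for a multi-branch module.
branches = [(1, 1), (3, 3), (5, 5)]

for k, d in branches:
    extent = effective_kernel_size(k, d)
    dilated = conv_weight_count(k, 1, 1)
    dense = conv_weight_count(extent, 1, 1)
    print(f"kernel {k}x{k}, dilation {d}: covers {extent}x{extent}, "
          f"{dilated} weights vs {dense} for a dense kernel of that extent")
```

For example, a 3×3 kernel with dilation 3 covers a 7×7 window using 9 weights per channel pair, where a dense 7×7 kernel would need 49.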

CLC Number: