
Computer Engineering, 2024, Vol. 50, Issue (2): 281-287. doi: 10.19678/j.issn.1000-3428.0066795

• Graphics and Image Processing •

Data Preprocessing Method for Deep Video Action Recognition Models

Fengmin AN1, Bingbing ZHANG2, Wei DONG1, Jianxin ZHANG1,*

  1. School of Computer Science and Engineering, Dalian Minzu University, Dalian 116650, Liaoning, China
    2. School of Information and Communication Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2023-01-18 Online: 2024-02-15 Published: 2024-02-21
  • Corresponding author: Jianxin ZHANG
  • Funding:
    National Natural Science Foundation of China (61972062); Applied Basic Research Program of Liaoning Province (2023JH2/101300191, 2023JH2/101300193)

Data Preprocessing Method for Deep Video Action Recognition Models

Fengmin AN1, Bingbing ZHANG2, Wei DONG1, Jianxin ZHANG1,*

  1. School of Computer Science and Engineering, Dalian Minzu University, Dalian 116650, Liaoning, China
    2. School of Information and Communication Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2023-01-18 Online: 2024-02-15 Published: 2024-02-21
  • Contact: Jianxin ZHANG

Abstract:

Preprocessing operations, typified by video frame sampling and data augmentation, are important means of improving the performance of deep models for video action recognition. To address the problems of existing video data preprocessing, namely the insufficient discriminability of sampled frames and overly simple augmentation schemes, a data preprocessing method for deep video action recognition models is proposed. For frame sampling, a motion-guided fragmented video sampling strategy is designed that jointly considers inter-frame difference features and the short-term temporal features of video clips; it locates key frames through salient actions and samples their neighboring frames, effectively improving the spatiotemporal discriminability of the selected frames. Drawing on random data augmentation methods from image classification, the sampled short clips are then augmented in a random manner, enabling deep video recognition models to learn more complex spatial variation. Evaluation experiments on two public video recognition datasets with two representative network models show that the proposed preprocessing method improves baseline accuracy by more than 2.5 percentage points, with a maximum gain of 6.8 percentage points. These results verify the effectiveness of the proposed preprocessing method for video action recognition.
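The sampling strategy described in the abstract can be pictured with a short sketch. The following Python code is only a minimal illustration of motion-guided fragmented sampling, not the authors' exact formulation: the function name motion_guided_sample, the use of mean absolute frame differences as the motion score, and the equal-length segment split are all assumptions.

import numpy as np

def motion_guided_sample(frames, num_segments=8, clip_len=3):
    """frames: (T, H, W, C) uint8 video; returns indices of the sampled frames."""
    T = frames.shape[0]
    gray = frames.astype(np.float32).mean(axis=-1)             # (T, H, W) grayscale frames
    motion = np.abs(np.diff(gray, axis=0)).mean(axis=(1, 2))   # motion score per frame transition
    motion = np.concatenate([[0.0], motion])                   # align scores with frame indices
    bounds = np.linspace(0, T, num_segments + 1, dtype=int)    # split video into equal segments
    half = clip_len // 2
    indices = []
    for s, e in zip(bounds[:-1], bounds[1:]):
        key = s + int(np.argmax(motion[s:e]))                  # most salient frame in the segment
        nbrs = np.clip(np.arange(key - half, key + half + 1), 0, T - 1)
        indices.extend(nbrs.tolist())                          # key frame plus its neighbours
    return np.asarray(indices)

Sampling a key frame per segment together with its neighbours is what gives each sampled fragment both a spatially discriminative frame and short-term temporal context.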

Keywords: video action recognition, preprocessing method, motion-guided fragmented video sampling, data augmentation, deep learning

Abstract:

Video preprocessing operations, mainly video frame sampling and data augmentation, are essential for improving the performance of deep video action recognition models, and they have recently received increased attention. In this study, a novel data preprocessing method for deep video action recognition models is proposed, targeting the preprocessing problems of insufficient guidance during key-frame sampling and relatively simple data augmentation. First, a motion-guided fragmented video sampling technique is designed that jointly considers inter-frame difference features and the short-term temporal features of video clips. It acquires key frames guided by salient actions and samples their adjacent frames, which effectively improves the spatiotemporal discrimination capacity of the selected frames. Additionally, motivated by the random data augmentation successfully applied in image classification tasks, this study introduces a random data augmentation strategy for the sampled short video clips. This enables deep video recognition models to learn more complex spatial variation. Evaluation experiments on two public video recognition datasets with two representative network models show that the proposed preprocessing method improves the accuracy of the baseline models by more than 2.5 percentage points, with a maximum gain of 6.8 percentage points. These results demonstrate the effectiveness of the method for video action recognition.
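To make the clip-level augmentation concrete, here is a minimal sketch of one possible implementation, assumed rather than taken from the paper: a single randomly chosen spatial transform is applied consistently to every frame of a sampled clip. The function augment_clip and the particular set of operations (flip, crop, brightness) are illustrative only.

import numpy as np

def augment_clip(clip, rng):
    """clip: (N, H, W, C) frames of one sampled short clip; returns the augmented clip."""
    op = rng.choice(["flip", "crop", "brightness"])        # draw one operation per clip
    if op == "flip":
        return clip[:, :, ::-1, :].copy()                  # horizontal flip of every frame
    if op == "crop":
        h, w = clip.shape[1:3]
        ch, cw = int(h * 0.8), int(w * 0.8)
        top = int(rng.integers(0, h - ch + 1))
        left = int(rng.integers(0, w - cw + 1))
        return clip[:, top:top + ch, left:left + cw, :]    # same crop window for all frames
    scale = rng.uniform(0.7, 1.3)                          # clip-wide brightness jitter
    return np.clip(clip.astype(np.float32) * scale, 0, 255).astype(clip.dtype)

clip = np.zeros((3, 112, 112, 3), dtype=np.uint8)          # toy 3-frame clip
augmented = augment_clip(clip, np.random.default_rng(0))

Drawing the random transform once per clip, rather than per frame, keeps motion patterns temporally coherent while still exposing the model to varied spatial appearance.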

Key words: video action recognition, preprocessing method, motion-guided fragmented video sampling, data augmentation, deep learning