
Computer Engineering ›› 2022, Vol. 48 ›› Issue (11): 1-13. doi: 10.19678/j.issn.1000-3428.0065347

• Hot Topics and Reviews •

Survey on Machine Learning Methods for Small Sample Data

CHEN Liangchen (陈良臣)1,2,3,4, FU Deyin (傅德印)2

  1. Department of Computer, China University of Labor Relations, Beijing 100048, China;
  2. Department of Applied Statistics, China University of Labor Relations, Beijing 100048, China;
  3. Key Laboratory of Network Assessment Technology, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China;
  4. College of Computer Science & Technology, Wuhan University of Technology, Wuhan 430063, China
  • Received: 2022-06-10 Revised: 2022-09-23 Published: 2022-11-05
  • About the authors: CHEN Liangchen (born 1982), male, associate professor and Ph.D. candidate; main research interests are big data, artificial intelligence, and information security. FU Deyin (corresponding author), professor, Ph.D., doctoral supervisor.
  • Funding:
    National Statistical Science Research Project of the National Bureau of Statistics (2022LY005); Scientific Research Project of China University of Labor Relations (22XYJS021); Project of the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences (KFKT2022-003); Teaching Reform Project of China University of Labor Relations (JG22080).

Abstract: Few-shot learning is machine learning for small sample data: it aims to build models that can solve practical problems from only a small amount of supervised sample data. It addresses the severe performance degradation that traditional machine learning methods suffer when sample data are insufficient, enables low-cost and rapid model deployment for new few-shot tasks, and narrows the gap between human intelligence and artificial intelligence, which makes it important for advancing general-purpose artificial intelligence. Starting from the concept, basic models, and practical applications of few-shot learning, this paper systematically reviews current work in the field and classifies few-shot learning methods into four categories: those based on model fine-tuning, data augmentation, metric learning, and meta-learning. For each category, it describes the core ideas, basic models, subfields, and latest research progress, along with the problems that arise in scientific research or practical application. It also summarizes the datasets and evaluation metrics commonly used in few-shot learning research and collates the experimental results of some typical few-shot learning methods on the Omniglot and Mini-ImageNet datasets. Finally, it summarizes the methods and their advantages and disadvantages, and discusses future research directions for few-shot learning from three perspectives: data, theoretical research, and applied research.

Key words: few-shot learning, small sample data, machine learning, deep learning, data augmentation

CLC number: