Computer Engineering ›› 2022, Vol. 48 ›› Issue (6): 73-78,88. doi: 10.19678/j.issn.1000-3428.0063270

• Artificial Intelligence and Pattern Recognition •

K-Nearest Neighbor Multi-Label Learning Based on Label Correlation

QIAN Long, ZHAO Jing, HAN Jingyu, MAO Yi   

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2021-11-18 Revised: 2022-01-04 Published: 2022-01-07
  • About the authors: QIAN Long (born 1994), male, master's student, main research interest: machine learning; ZHAO Jing, master's student; HAN Jingyu (corresponding author), professor, Ph.D.; MAO Yi, lecturer, Ph.D.
  • Funding:
    National Natural Science Foundation of China (62002174).

Abstract: Multi-label learning is an active research topic in machine learning because it can effectively handle the multi-semantic problems of the real world, where one sample carries several meanings at once. In multi-label learning tasks, the labels of a sample are correlated, and ignoring this correlation lowers the generalization performance of the model. This paper proposes a K-nearest neighbor algorithm for multi-label learning based on label correlation. To fully exploit the correlation among the labels of a sample, the frequent item-sets of labels are obtained with the Fp_growth algorithm. A scoring model and a threshold model are then built for the frequent items and for the individual labels. The scoring model measures the degree of association between a sample and a frequent item or label, and the threshold model determines the discrimination threshold for each frequent item or label. The two models are combined to predict the frequent items to which a sample belongs, and the label set of the sample is determined from them. Experimental results on the classical Emotions and Scene datasets show that the F1-Measure of the algorithm reaches 66.6% and 73.3%, respectively. Compared with baseline methods such as CC, LP, RAKEL, and MLDF, the F1-Measure improves by an average of 3.8 and 2.1 percentage points, respectively. By making reasonable use of the correlation between labels, the algorithm effectively improves classification performance.

Key words: machine learning, multi-label learning, label correlation, K-nearest neighbor, frequent item-sets
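
The pipeline outlined in the abstract (mine frequent label sets, score them against a sample's nearest neighbors, compare each score with a threshold, take the union of the accepted sets) can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the form of the scoring or threshold model. The sketch therefore makes three assumptions: frequent label sets are found by brute-force enumeration rather than Fp_growth, the score is simply the fraction of the k nearest neighbors whose label set contains the candidate set, and a single fixed threshold stands in for the learned threshold model. All function names, parameters, and the toy data are hypothetical.

```python
from itertools import combinations
import math


def frequent_label_sets(label_sets, min_support):
    """Brute-force frequent item-set mining over label sets.

    Assumption: a simple stand-in for Fp_growth, adequate only for tiny label spaces.
    """
    n = len(label_sets)
    labels = sorted(set().union(*label_sets))
    frequent = []
    for size in range(1, len(labels) + 1):
        found = False
        for combo in combinations(labels, size):
            candidate = frozenset(combo)
            support = sum(1 for s in label_sets if candidate <= s) / n
            if support >= min_support:
                frequent.append(candidate)
                found = True
        if not found:
            break  # no frequent set of this size, so none of a larger size
    return frequent


def knn_indices(X, x, k):
    """Indices of the k training points closest to x (Euclidean distance)."""
    dists = sorted((math.dist(row, x), i) for i, row in enumerate(X))
    return [i for _, i in dists[:k]]


def predict(X, label_sets, x, k=3, min_support=0.3, threshold=0.5):
    """Predict a label set for x as the union of accepted frequent label sets."""
    itemsets = frequent_label_sets(label_sets, min_support)
    neighbors = knn_indices(X, x, k)
    predicted = set()
    for item in itemsets:
        # Assumed scoring model: share of neighbors that carry the whole item-set.
        score = sum(1 for i in neighbors if item <= label_sets[i]) / k
        # Assumed threshold model: one fixed cut-off for every item-set.
        if score >= threshold:
            predicted |= item
    return predicted


# Hypothetical toy data: two clusters with correlated labels.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
Y = [{"happy", "calm"}, {"happy", "calm"}, {"happy"},
     {"sad", "angry"}, {"sad", "angry"}, {"sad"}]

print(sorted(predict(X, Y, (0.2, 0.2))))  # ['calm', 'happy']
```

In this toy setting the pair {calm, happy} is accepted as a unit because two of the three nearest neighbors carry both labels, which is the kind of co-occurrence information that predicting each label independently would not use.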

CLC Number: