K-Nearest Neighbor Multi-Label Learning Based on Label Correlation

doi:10.19678/j.issn.1000-3428.0063270

Abstract

Abstract: Multi-label learning is an active research topic in machine learning because it can effectively handle multi-semantic (polysemy) problems in the real world. In multi-label learning tasks, the labels of a sample are correlated, and ignoring these correlations reduces the generalization performance of the model. This paper proposes a K-nearest neighbor multi-label learning algorithm based on label correlation. To fully mine the correlations among the labels of samples, the algorithm uses the Fp_growth algorithm to obtain the frequent item-sets of labels. A scoring model and a threshold model are then constructed for the frequent items and for the labels. The scoring model measures the degree of association between a sample and a frequent item or label, and the threshold model solves for the discrimination threshold of each frequent item or label. Combining the two models, the algorithm predicts the frequent items to which a sample belongs and then determines the sample's label set. Experimental results on the classical Emotions and Scene datasets show that the F1-Measure of the algorithm reaches 66.6% and 73.3%, respectively. Compared with benchmark methods such as CC, LP, RAKEL, and MLDF, the F1-Measure improves by an average of 3.8 and 2.1 percentage points, respectively. The algorithm thus effectively improves classification performance by making rational use of the correlations between labels.

Key words: machine learning, multi-label learning, label correlation, K-nearest neighbor, frequent item-sets
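
The pipeline outlined in the abstract (mine frequent label item-sets, score a sample against each item-set through its nearest neighbors, then compare the score with a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a brute-force enumeration in place of Fp_growth (it yields the same frequent item-sets on small label spaces), a score defined as the fraction of the k nearest neighbors that contain the item-set, and a single fixed threshold instead of the learned per-item-set threshold model. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def frequent_label_sets(label_sets, min_support):
    """Enumerate frequent label item-sets by brute force (stand-in for Fp_growth)."""
    n = len(label_sets)
    labels = sorted(set().union(*label_sets))
    frequent = {}
    for size in range(1, len(labels) + 1):
        found = False
        for combo in combinations(labels, size):
            support = sum(1 for s in label_sets if set(combo) <= s) / n
            if support >= min_support:
                frequent[frozenset(combo)] = support
                found = True
        if not found:
            break  # no frequent set of this size, so no larger set can be frequent
    return frequent


def knn_score(x, X_train, label_sets, itemset, k):
    """Assumed score: fraction of the k nearest neighbors whose labels contain itemset."""
    order = sorted(range(len(X_train)), key=lambda i: dist(x, X_train[i]))
    return sum(1 for i in order[:k] if itemset <= label_sets[i]) / k


def predict(x, X_train, label_sets, frequent, k, threshold):
    """Union of all frequent item-sets whose score reaches the (fixed, assumed) threshold."""
    predicted = set()
    for itemset in frequent:
        if knn_score(x, X_train, label_sets, itemset, k) >= threshold:
            predicted |= itemset
    return predicted


# Hypothetical toy data: two clusters with correlated labels.
X_train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
Y_train = [{"happy", "calm"}, {"happy", "calm"}, {"happy"},
           {"sad"}, {"sad", "angry"}, {"sad", "angry"}]

frequent = frequent_label_sets(Y_train, min_support=0.3)
print(predict((0.05, 0.05), X_train, Y_train, frequent, k=3, threshold=0.5))
```

Scoring whole item-sets rather than single labels is what lets co-occurring labels support each other in the prediction. A per-item-set threshold, as the abstract describes, would replace the single `threshold` argument used here.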

摘要： 多标签学习是机器学习领域的一个研究热点，其能够有效解决真实世界中的多语义问题。在多标签学习任务中，样本的多个标签之间存在一定的关联关系，忽略标签间的相关性会导致模型的泛化性能降低。提出一种基于标签间相关性的多标签学习K近邻算法。充分挖掘样本多标签间的相关性，通过Fp_growth算法得到标签的频繁项集。针对频繁项和标签分别构建评分模型和阈值模型，评分模型用于衡量样本与频繁项或标签之间的关联程度，阈值模型用于求解频繁项或标签对应的判别阈值，结合评分模型和阈值模型对样本所属频繁项进行预测，进而确定样本标签集。在经典数据集Emotions和Scene上的实验结果表明，该算法的F1-Measure指标分别达到66.6%和73.3%，相比CC、LP、RAKEL、MLDF等基准方法，其F1-Measure分别平均提高3.8和2.1个百分点，该算法通过合理利用标签间的相关性使得分类性能得到有效提升。

关键词: 机器学习, 多标签学习, 标签相关性, K近邻, 频繁项集

CLC Number: TP391

QIAN Long, ZHAO Jing, HAN Jingyu, MAO Yi. K-Nearest Neighbor Multi-Label Learning Based on Label Correlation[J]. Computer Engineering, 2022, 48(6): 73-78,88.

钱龙, 赵静, 韩京宇, 毛毅. 基于标签相关性的K近邻多标签学习[J]. 计算机工程, 2022, 48(6): 73-78,88.


URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0063270

http://www.ecice06.com/EN/Y2022/V48/I6/73


