
Computer Engineering, 2010, Vol. 36, Issue (15): 83-85. doi: 10.3969/j.issn.1000-3428.2010.15.029

• Software Technology and Database •

Multi-label Text Classification Algorithm Based on Frequent Item Sets

LV Xiao-yong, SHI Hong-bo

  1. (Information Management Institute, Shanxi University of Finance & Economics, Taiyuan 030006)
  • Online:2010-08-05 Published:2010-08-25
  • About the authors: LV Xiao-yong (born 1982), male, master's student, main research interests: machine learning and data mining; SHI Hong-bo, professor, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (60873100); Natural Science Foundation of Shanxi Province (2009011017-4)

Abstract: To address the problem of multi-label text classification, this paper proposes MLFI, a multi-label text classification algorithm based on frequent item sets. The algorithm uses FP-growth to mine frequent item sets among the class labels, and computes a prototype vector and a similarity threshold for each class. A text is assigned to a class if its similarity to that class's prototype vector is greater than the corresponding threshold. After classification, the association rules mined among the classes are used to verify the classification results. Experimental results show that the algorithm achieves relatively high classification performance.

Key words: multi-label, similarity, frequent item sets, association rules
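
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the details, so several choices here are assumptions rather than the authors' method: texts are sparse term-weight dictionaries, similarity is cosine similarity, each prototype vector is the centroid of the class's training texts, each threshold is a fraction (`slack`, a hypothetical parameter) of the lowest similarity between the prototype and its own training texts, and "verification" is read as adding any label implied by a high-confidence rule. A brute-force count over label pairs stands in for FP-growth, which is practical only for small label sets. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(docs, labels, slack=0.9):
    """Build one prototype vector and one similarity threshold per class.

    Assumption: prototype = centroid of the class's training texts;
    threshold = slack * lowest similarity of those texts to the centroid.
    """
    prototypes, thresholds = {}, {}
    for c in set().union(*labels):
        members = [d for d, ls in zip(docs, labels) if c in ls]
        proto = {}
        for d in members:
            for term, w in d.items():
                proto[term] = proto.get(term, 0.0) + w / len(members)
        prototypes[c] = proto
        thresholds[c] = slack * min(cosine(d, proto) for d in members)
    return prototypes, thresholds


def mine_rules(labels, min_support=2, min_conf=0.9):
    """Mine pairwise rules a -> b among labels by brute-force counting.

    This is a stand-in for FP-growth, not FP-growth itself.
    """
    single = Counter(c for ls in labels for c in ls)
    pair = Counter(p for ls in labels for p in permutations(sorted(ls), 2))
    return {(a, b) for (a, b), n in pair.items()
            if n >= min_support and n / single[a] >= min_conf}


def apply_rules(predicted, rules):
    """Assumed verification step: add labels implied by the mined rules."""
    result = set(predicted)
    changed = True
    while changed:
        changed = False
        for a, b in rules:
            if a in result and b not in result:
                result.add(b)
                changed = True
    return result


def classify(doc, prototypes, thresholds, rules):
    """Assign every class whose similarity exceeds its threshold, then verify."""
    predicted = {c for c, p in prototypes.items()
                 if cosine(doc, p) > thresholds[c]}
    return apply_rules(predicted, rules)
```

In this sketch, a rule such as `("football", "sports")` means that a text labelled `football` also receives `sports` even if its similarity to the `sports` prototype falls below the threshold. Other readings of the verification step, such as removing labels that contradict the rules, are equally compatible with the abstract.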
