Computer Engineering, 2012, Vol. 38, Issue (10): 63-66. doi: 10.3969/j.issn.1000-3428.2012.10.018

• Software Technology and Database •

Topic Feature Extraction Method Based on Association Rule and Rough Set

GAO Fei, ZHOU Xue-guang, SUN Yan

  1. (College of Electronic Engineering, Navy University of Engineering, Wuhan 430033, China)
  • Received: 2011-06-30 Online: 2012-05-20 Published: 2012-05-20
  • About the authors: GAO Fei (born 1988), male, master's student, main research interest: network information security; ZHOU Xue-guang, professor and doctoral supervisor; SUN Yan, doctoral student
  • Funding:
    Supported by the Natural Science Foundation of Navy University of Engineering (HGDYDJJ10008)

Abstract: Topic classification typically has few training texts and highly similar topics. To address this, the paper proposes a topic feature extraction method based on association rules and rough sets. Building on the Vector Space Model (VSM), the method mines association rules to generate a rule set and the text topics, finds topics at different levels of granularity by adjusting the minimum support and minimum confidence of the transaction items, and applies rough set theory to perform attribute reduction over the word features and association features. Experimental results show that the method can extract the commentary topics described in a text collection and achieves relatively high topic classification accuracy.

Keywords: association rule, rough set, feature extraction, Topic Detection and Tracking (TDT), Vector Space Model (VSM), attribute reduction
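
The abstract names two steps without giving algorithmic detail: mining association rules under minimum support and confidence thresholds, and rough-set attribute reduction over word and association features. The following is only a minimal sketch of those two ideas under simplifying assumptions, not the authors' implementation: rules are restricted to term pairs, the reduct is found by a simple greedy pass that preserves the positive region, and all names (`mine_pair_rules`, `greedy_reduct`, the toy documents and decision table) are hypothetical.

```python
from itertools import combinations


def mine_pair_rules(docs, min_support, min_confidence):
    """Return rules (x, y, support, confidence) meaning x -> y over term pairs."""
    transactions = [set(d) for d in docs]
    n = len(transactions)

    def support(items):
        # Fraction of documents that contain every term in `items`.
        return sum(1 for t in transactions if items <= t) / n

    terms = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(terms, 2):
        s = support({a, b})
        if s < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            conf = s / support({x})
            if conf >= min_confidence:
                rules.append((x, y, s, conf))
    return rules


def positive_region_size(rows, attrs, decision):
    """Count rows whose equivalence class under `attrs` has one decision value."""
    classes = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(row[decision])
    return sum(len(v) for v in classes.values() if len(set(v)) == 1)


def greedy_reduct(rows, attrs, decision):
    """Drop attributes one at a time while the positive region is unchanged."""
    full = positive_region_size(rows, attrs, decision)
    kept = list(attrs)
    for a in list(attrs):
        trial = [x for x in kept if x != a]
        if positive_region_size(rows, trial, decision) == full:
            kept = trial
    return kept


# Toy documents as bags of terms (illustrative data only).
docs = [
    ["price", "battery", "screen"],
    ["price", "battery"],
    ["screen", "camera"],
    ["price", "battery", "camera"],
]
coarse = mine_pair_rules(docs, min_support=0.5, min_confidence=0.8)
fine = mine_pair_rules(docs, min_support=0.25, min_confidence=0.5)

# Toy decision table: two word features, one association feature, one label.
table = [
    {"price": 1, "battery": 1, "price&battery": 1, "topic": "phone"},
    {"price": 1, "battery": 0, "price&battery": 0, "topic": "market"},
    {"price": 0, "battery": 1, "price&battery": 0, "topic": "market"},
    {"price": 0, "battery": 0, "price&battery": 0, "topic": "market"},
]
reduct = greedy_reduct(table, ["price", "battery", "price&battery"], "topic")
```

On this toy data, the stricter thresholds keep only the two rules between `price` and `battery`, while the looser thresholds admit more rules, which is one way to read "different levels of granularity". The greedy pass keeps only the association feature `price&battery`, since it alone separates the two topics; a different attribute order can yield a different reduct.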

CLC number: