Algorithm of Repeats-based Term Extraction and Its Application in Text Clustering

doi:10.3969/j.issn.1000-3428.2007.02.022

Computer Engineering, 2007, Vol. 33, Issue (02): 65-67.

Software Technology and Database

HU Jixiang (1, 2), XU Hongbo (1), LIU Yue (1), CHENG Xueqi (1)

(1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; 2. Graduate School, Chinese Academy of Sciences, Beijing 100039)

Online: 2007-01-20   Published: 2007-01-20

Abstract

This paper proposes a novel term extraction method based on repeats (repeated strings) that extracts meaningful terms directly from text. It addresses two problems of Web documents: their high-dimensional feature spaces, and the challenge that new Internet language poses for existing word segmentation systems. For Chinese text, the method requires no word segmentation. Experimental results show that the approach can substantially reduce the dimensionality of the feature space and effectively improves the performance of traditional clustering algorithms that use words as features.

Key words: Text clustering, Term extraction, Repeats
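The abstract does not spell out the extraction procedure, so the following Python sketch is only a rough illustration under stated assumptions, not the authors' algorithm. It assumes a "repeat" is a character substring of 2 to 4 characters that occurs at least twice in the corpus, and it drops a repeat whenever a repeat one character longer has the same frequency (a common filter for redundant substrings). The function names `extract_repeats`, `doc_vectors` and `cosine`, and the length and frequency thresholds, are hypothetical. The sketch shows why no word segmentation is needed: on two short Chinese strings it recovers 文本聚类 ("text clustering") as a single term.

```python
import math
from collections import Counter
from typing import Dict, List


def count_substrings(docs: List[str], min_len: int = 2, max_len: int = 4) -> Counter:
    """Count every character substring of length min_len..max_len within each document."""
    counts: Counter = Counter()
    for doc in docs:
        for i in range(len(doc)):
            for n in range(min_len, max_len + 1):
                if i + n > len(doc):
                    break  # longer substrings would run past the end of this document
                counts[doc[i:i + n]] += 1
    return counts


def extract_repeats(docs: List[str], min_freq: int = 2,
                    min_len: int = 2, max_len: int = 4) -> List[str]:
    """Hypothetical repeat extractor: keep substrings seen at least min_freq times,
    skip any containing whitespace, and drop a repeat if a one-character-longer
    repeat extending it on the left or right has the same count (assumed filter)."""
    counts = count_substrings(docs, min_len, max_len)
    repeats = {s: c for s, c in counts.items()
               if c >= min_freq and not any(ch.isspace() for ch in s)}
    kept = []
    for s, c in repeats.items():
        # Equal counts mean every occurrence of s sits inside the longer repeat t.
        subsumed = any(
            len(t) == len(s) + 1 and t_count == c and (t.startswith(s) or t.endswith(s))
            for t, t_count in repeats.items()
        )
        if not subsumed:
            kept.append(s)
    return sorted(kept)


def doc_vectors(docs: List[str], terms: List[str]) -> List[Dict[str, int]]:
    """Represent each document by (possibly overlapping) occurrence counts of the terms."""
    vectors = []
    for doc in docs:
        vec = {}
        for term in terms:
            n = sum(1 for i in range(len(doc) - len(term) + 1) if doc.startswith(term, i))
            if n:
                vec[term] = n
        vectors.append(vec)
    return vectors


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


print(extract_repeats(["中文文本聚类", "文本聚类算法"]))  # ['文本聚类']
```

The count vectors and cosine similarity could then feed a conventional clustering algorithm such as k-means. The subsumption filter compares every pair of repeats, so this version suits only small examples; a suffix array or suffix tree is the usual way to enumerate repeats at scale.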

HU Jixiang, XU Hongbo, LIU Yue, CHENG Xueqi. Algorithm of Repeats-based Term Extraction and Its Application in Text Clustering[J]. Computer Engineering, 2007, 33(02): 65-67.

URL: https://www.ecice06.com/EN/Y2007/V33/I02/65
