Text Clustering Approach Based on Content Characteristics

doi:10.3969/j.issn.1000-3428.2007.14.008

Computer Engineering, 2007, Vol. 33, Issue (14): 24-26, 3.

Degree Paper

LI Xiaoguang 1, SONG Baoyan 1, YU Ge 2, WANG Daling 2

(1. School of Information Science and Technology, Liaoning University, Shenyang 110036; 2. School of Information Science and Engineering, Northeastern University, Shenyang 110004)

Online: 2007-07-20   Published: 2007-07-20

一种基于内容特性的文本聚类方法

李晓光1，宋宝燕1，于戈2，王大玲2

Abstract

The fit of the cluster model to the data distribution is critical to probabilistic-model-based clustering, because it directly affects clustering quality. Owing to the complexity of the content-based distribution of documents, a single-component cluster model cannot fully capture the distribution of document data. This paper holds that document characteristics are influenced mainly by two components, topic and general writing style; it proposes a content-based cluster model that mixes a topic model with a general model and presents a document clustering algorithm based on this model. Experimental results indicate that the content-based cluster model fits the data better than a single-component model and achieves better clustering quality.

Key words: clustering, probabilistic-model-based clustering, mixture model, EM algorithm
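The abstract describes each cluster as a mixture of a topic model and a general (writing-style) model, fitted with the EM algorithm, but gives no formulas. The following is a minimal Python sketch under stated assumptions, not the authors' algorithm: documents are bags of words; the general model is the corpus-wide unigram distribution, held fixed and shared by all clusters; the mixing weight `lam` is a fixed hyperparameter; clusters are seeded deterministically by farthest-first selection; and additive smoothing `alpha` keeps probabilities positive. The function name `fit_topic_general_mixture` and all of its parameters are hypothetical.

```python
import math
from collections import Counter


def fit_topic_general_mixture(docs, k, lam=0.7, iters=50, alpha=0.01):
    """Hypothetical sketch: EM clustering with a topic + general word model.

    Assumptions (not from the paper): each cluster j has a topic distribution
    theta[j]; all clusters share one general distribution g, the corpus-wide
    unigram frequencies, held fixed. A word w in a document of cluster j is
    drawn with probability lam * theta[j][w] + (1 - lam) * g[w].
    docs: list of non-empty token lists; requires 1 <= k <= len(docs), iters >= 1.
    Returns (hard cluster labels, per-document responsibilities).
    """
    counts = [Counter(d) for d in docs]
    total = Counter()
    for c in counts:
        total.update(c)
    vocab = sorted(total)
    n_tokens = sum(total.values())
    g = {w: total[w] / n_tokens for w in vocab}  # fixed general model

    def cos(a, b):
        dot = sum(n * b.get(w, 0) for w, n in a.items())
        return dot / math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))

    # Deterministic farthest-first seeding: start with document 0, then add the
    # document whose highest cosine similarity to the chosen seeds is lowest.
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(docs)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(cos(counts[i], counts[s]) for s in seeds)))

    def normalize(weights):
        # Additive smoothing (alpha) keeps every word's probability positive.
        denom = sum(weights.values()) + alpha * len(vocab)
        return {w: (weights.get(w, 0.0) + alpha) / denom for w in vocab}

    theta = [normalize(counts[s]) for s in seeds]
    prior = [1.0 / k] * k

    for _ in range(iters):
        # E-step: posterior over clusters for each document, via log-sum-exp.
        resp = []
        for c in counts:
            logp = [
                math.log(prior[j])
                + sum(n * math.log(lam * theta[j][w] + (1 - lam) * g[w]) for w, n in c.items())
                for j in range(k)
            ]
            top = max(logp)
            e = [math.exp(x - top) for x in logp]
            z = sum(e)
            resp.append([x / z for x in e])

        # M-step: smoothed priors, then each topic model from the expected counts
        # of word occurrences attributed to the topic rather than the general model.
        eps = 1e-6
        prior = [(sum(r[j] for r in resp) + eps) / (len(docs) + k * eps) for j in range(k)]
        new_theta = []
        for j in range(k):
            expected = Counter()
            for r, c in zip(resp, counts):
                for w, n in c.items():
                    topic = lam * theta[j][w]
                    share = topic / (topic + (1 - lam) * g[w])
                    expected[w] += r[j] * n * share
            new_theta.append(normalize(expected))
        theta = new_theta

    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, resp
```

In this sketch, the shared general model absorbs words that occur across the whole corpus (such as "the"), so each topic model can concentrate on the words that distinguish its cluster. The paper's actual model structure and estimation details may differ.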

CLC Number: TP391

LI Xiaoguang; SONG Baoyan; YU Ge; WANG Daling. Text Clustering Approach Based on Content Characteristics[J]. Computer Engineering, 2007, 33(14): 24-26, 3.

李晓光;宋宝燕;于戈;王大玲. 一种基于内容特性的文本聚类方法[J]. 计算机工程, 2007, 33(14): 24-26，3.

URL: https://www.ecice06.com/EN/Y2007/V33/I14/24

[1]	GUO Jipeng, XU Shilong, LONG Jiahao, WANG Youqing, SUN Yanfeng, YIN Baocai. Multi-view Subspace Clustering Based on Dual Cross-view Correlation Detection [J]. Computer Engineering, 2025, 51(4): 27-36.
[2]	LI Qiwen, WANG Zhihe, DU Hui, LU Depeng. Adaptive Density Peak Clustering Algorithm Based on Gaussian Distribution [J]. Computer Engineering, 2025, 51(4): 137-148.
[3]	NIE Lei, HU Zisheng, BAO Haizhou. Heterogeneous Vehicular Network Selection Method Based on RSU-assisted and Adaptive Clustering [J]. Computer Engineering, 2025, 51(3): 162-171.
[4]	Hongjiao LI, Baojin WANG, Zhaohui WANG, Renhao HU. Dual-Client Selection Algorithm Based on Model Similarity and Local Loss [J]. Computer Engineering, 2024, 50(8): 153-164.
[5]	HU Aoran, CHEN Xiaohong. One-step Multi-view Clustering Based on Diversity and Consistency [J]. Computer Engineering, 2024, 50(5): 51-61.
[6]	Yue MA, Mi WEN. Spatial Load Forecasting Method Based on Multiscale LDTW and TCN [J]. Computer Engineering, 2024, 50(3): 106-113.
[7]	Huawei SONG, Shengqi LI, Fangjie WAN, Yuping WEI. Federated Learning Optimization Method in Non-IID Scenarios [J]. Computer Engineering, 2024, 50(3): 166-172.
[8]	Lijuan WANG, Jinping XING, Ming YIN, Zhifeng HAO, Ruichu CAI, Wen WEN. Weight Adaptive Multi-view Spectral Clustering Algorithm Based on Consistent Graphs [J]. Computer Engineering, 2024, 50(2): 122-131.
[9]	PAN Wei, HUANG Ruizhang, REN Lina, XUE Jingjing. Deep Document Clustering Based on Adaptive Structural Learning [J]. Computer Engineering, 2024, 50(11): 89-97.
[10]	ZHANG Yujie, GAO Han. Image Segmentation Algorithm for Stamping Defects Based on Improved FCM [J]. Computer Engineering, 2024, 50(10): 342-351.
[11]	LIU Daxing, GU Naijie, HUANG Zhangjin, SU Junjie, QI Dongsheng. A Sampling Algorithm for Software Prefetching Using Memory Access Traces [J]. Computer Engineering, 2024, 50(10): 362-369.
[12]	ZHANG Junna, HAN Chaochen, CHEN Jiawei, ZHAO Xiaoyan, YUAN Peiyan. A Method for Joint Edge Server Deployment and Service Placement [J]. Computer Engineering, 2024, 50(10): 266-280.
[13]	Sihui LIU, Quanxue GAO, Wei SONG, Deyan XIE. Multiview Spectral Clustering Based on Weighted Tensor Low-Rank Constraint [J]. Computer Engineering, 2024, 50(1): 129-137.
[14]	Yuyan JIANG, Chengfeng TAO, Ping LI. Deep Subspace Clustering Algorithm with Data Augmentation and Adaptive Self-Paced Learning [J]. Computer Engineering, 2023, 49(8): 96-103, 110.
[15]	Meiguang ZHENG, Yong YANG. Personalized Federated Learning Algorithm Based on Mutual Information and Soft Clustering [J]. Computer Engineering, 2023, 49(8): 20-28.

Please choose a citation manager

Content to export

Text Clustering Approach Based on Content Characteristics

一种基于内容特性的文本聚类方法

PDF

Knowledge

Cited

Abstract

Cite this article

share this article

References

Related Articles 15

Recommended Articles

Metrics

Comments

模态框（Modal）标题

Please choose a citation manager

Content to export

Text Clustering Approach Based on Content Characteristics

一种基于内容特性的文本聚类方法

PDF

Knowledge

Cited

Abstract

Cite this article

share this article

References

Related Articles 15

Recommended Articles

Metrics

Comments