Design and Implementation of JP Algorithm Based on MapReduce

Computer Engineering, 2012, Vol. 38, Issue 24: 14-16. doi: 10.3969/j.issn.1000-3428.2012.24.004

• Networks and Communications •

CAO Ze-wen, ZHOU Yao

(College of Information System and Management, National University of Defense Technology, Changsha 410073, China)

Received: 2012-04-16; Revised: 2012-06-14; Online: 2012-12-20; Published: 2012-12-18

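About the authors: CAO Ze-wen (born 1967), male, research fellow, Ph.D.; main research interests: integrated information processing and decision support. ZHOU Yao, master's degree.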

Abstract

This paper analyzes the problems that conventional algorithms commonly face when clustering text data: massive data volume and high-dimensional, sparse feature vectors. It proposes massive text clustering based on cloud computing as a feasible solution. The classical Jarvis-Patrick (JP) clustering algorithm is chosen as a case study: it is parallelized with the MapReduce programming model and verified on the Hadoop cloud computing platform using the corpus provided by Sogou Laboratory. Experimental results indicate that the JP algorithm can be parallelized within the MapReduce framework, and that the parallel algorithm handles massive text data with better time performance than a single-node environment.

Key words: text mining, clustering analysis, text clustering, massive data, cloud computing, parallel data mining
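
The abstract does not reproduce the authors' MapReduce design, so the following is only a minimal single-process sketch of the general Jarvis-Patrick idea, split into a map-like step (find each document's k nearest neighbors) and a reduce-like step (merge documents that are mutual neighbors and share enough neighbors). The function names, the parameters `k` and `k_min`, the cosine similarity measure, and the toy documents are all assumptions made for illustration; they are not taken from the paper, and no Hadoop API is used.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def map_neighbors(doc_id, docs, k):
    """Map-like step (assumed): emit (doc_id, set of its k most similar documents)."""
    sims = [(cosine(docs[doc_id], vec), other)
            for other, vec in docs.items() if other != doc_id]
    sims.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, ties by id
    return doc_id, {other for _, other in sims[:k]}


def reduce_clusters(neighbors, k_min):
    """Reduce-like step (assumed): union mutual neighbors sharing >= k_min neighbors."""
    parent = {d: d for d in neighbors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sorted(neighbors), 2):
        mutual = a in neighbors[b] and b in neighbors[a]
        if mutual and len(neighbors[a] & neighbors[b]) >= k_min:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for d in neighbors:
        clusters[find(d)].add(d)
    return sorted(sorted(c) for c in clusters.values())


def jarvis_patrick(docs, k, k_min):
    """Run both steps in one process; a real job would distribute the map step."""
    neighbors = dict(map_neighbors(d, docs, k) for d in docs)
    return reduce_clusters(neighbors, k_min)


# Toy term-weight vectors (hypothetical data, two obvious topics).
docs = {
    "a": {"cloud": 1, "hadoop": 1},
    "b": {"cloud": 1, "hadoop": 1, "mapreduce": 1},
    "c": {"cloud": 1, "mapreduce": 1},
    "x": {"football": 1, "goal": 1},
    "y": {"football": 1, "goal": 1, "match": 1},
    "z": {"football": 1, "match": 1},
}
print(jarvis_patrick(docs, k=2, k_min=1))  # [['a', 'b', 'c'], ['x', 'y', 'z']]
```

In this sketch a document is not counted in its own neighbor list; some formulations of Jarvis-Patrick do count it, which changes the effective threshold. The neighbor search is the part that is independent per document and is therefore the natural candidate for distribution across nodes.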

CLC Number: TP391

CAO Ze-wen, ZHOU Yao. Design and Implementation of JP Algorithm Based on MapReduce[J]. Computer Engineering, 2012, 38(24): 14-16.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2012.24.004

http://www.ecice06.com/EN/Y2012/V38/I24/14
