Clustering-based Semantic Retrieval Algorithm


Computer Engineering, 2012, Vol. 38, Issue (2): 36-38. doi: 10.3969/j.issn.1000-3428.2012.02.011

Networks and Communications


XIANG He-lin, ZHANG Ming-xi, LI Po-han, HE Zhen-ying, WANG Wei

(School of Computer Science, Fudan University, Shanghai 201203, China)

Received: 2011-07-22; Online: 2012-01-20; Published: 2012-01-20

Chinese title: 一种基于聚类的语义检索算法

Chinese author names: 向河林, 张明西, 李珀瀚, 何震瀛, 汪卫


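About the authors: XIANG He-lin (born 1986), male, M.S., main research interests: data mining and information retrieval; ZHANG Ming-xi, Ph.D.; LI Po-han, M.S.; HE Zhen-ying, lecturer, Ph.D.; WANG Wei, professor, Ph.D. supervisor.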
Funding: Project supported by the National Natural Science Foundation of China (60703093)

Abstract

Latent Semantic Analysis (LSA) is computationally inefficient and incurs a large storage overhead when applied to large-scale semantic retrieval. To address this problem, this paper proposes a clustering-based semantic retrieval algorithm. The algorithm clusters documents according to the structural relationships between them and then applies LSA to the resulting clusters instead of to individual documents, which reduces the number of items that LSA must process. Experimental results show that the algorithm greatly reduces query time while maintaining good retrieval accuracy.

Keywords: Latent Semantic Analysis (LSA), information retrieval, vector space model, graph clustering algorithm

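Abstract (Chinese version): Latent semantic analysis has low computational efficiency and high storage overhead in large-scale semantic retrieval. To address this problem, a clustering-based latent semantic retrieval algorithm is proposed. Documents are clustered according to the structural relationships between them, and the clusters, rather than individual documents, are used for latent semantic analysis, which reduces the number of documents to be processed. Experimental results show that the algorithm reduces query time and achieves high retrieval precision.

Keywords (Chinese version): latent semantic analysis, information retrieval, vector space model, graph clustering algorithm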
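The abstract describes the pipeline only at a high level. Below is a minimal Python sketch of the general idea, not the authors' implementation. It assumes documents are term-frequency vectors (vector space model), uses connected components of a hypothetical document link graph as a stand-in for the paper's unspecified graph clustering algorithm, represents each cluster by the sum of its members' term vectors, and ranks clusters by cosine similarity in the LSA space before ranking documents inside the best clusters. The function names `cluster_by_links`, `build_cluster_lsa`, and `search` are illustrative.

```python
import numpy as np


def cluster_by_links(n_docs, links):
    """Cluster documents as connected components of a link graph.

    Stand-in (assumption) for the paper's graph clustering step:
    documents joined by any path of links land in the same cluster.
    """
    parent = list(range(n_docs))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Relabel component roots as 0..c-1 in order of first appearance.
    roots = {}
    return [roots.setdefault(find(d), len(roots)) for d in range(n_docs)]


def build_cluster_lsa(doc_term, labels, k):
    """Build a (terms x clusters) matrix and take its rank-k truncated SVD.

    Each cluster is the sum of its members' term vectors (an assumption),
    so the SVD runs on c cluster columns instead of n document columns.
    """
    n_clusters = max(labels) + 1
    A = np.zeros((doc_term.shape[1], n_clusters))
    for d, c in enumerate(labels):
        A[:, c] += doc_term[d]
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Never keep more latent dimensions than the numerical rank of A.
    k = max(1, min(k, int(np.sum(s > 1e-12))))
    Uk = U[:, :k]
    # Row j equals Uk.T @ A[:, j]: cluster j's coordinates in latent space.
    cluster_coords = Vt[:k].T * s[:k]
    return Uk, cluster_coords


def cosine(u, v):
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return 0.0 if nu == 0 or nv == 0 else float(u @ v / (nu * nv))


def search(query_vec, Uk, cluster_coords, labels, doc_term, top_clusters=1):
    """Rank clusters in latent space, then rank the documents inside the
    best clusters by plain cosine similarity in term space."""
    q_latent = Uk.T @ query_vec  # fold the query into the latent space
    scores = [cosine(q_latent, row) for row in cluster_coords]
    best = sorted(range(len(scores)), key=lambda c: -scores[c])[:top_clusters]
    hits = []
    for c in best:
        members = [d for d, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda d: -cosine(query_vec, doc_term[d]))
        hits.extend(members)
    return hits


if __name__ == "__main__":
    # Toy corpus: 5 documents over 4 terms (graph, cluster, query, index).
    doc_term = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1],
                         [0, 0, 1, 2], [1, 1, 0, 0]], dtype=float)
    labels = cluster_by_links(5, [(0, 1), (1, 4), (2, 3)])  # hypothetical links
    Uk, coords = build_cluster_lsa(doc_term, labels, k=2)
    query = np.array([0, 1, 0, 0], dtype=float)  # the term "cluster"
    print(labels)                                       # [0, 0, 1, 1, 0]
    print(search(query, Uk, coords, labels, doc_term))  # [1, 4, 0]
```

In this toy run the SVD factors a 4 × 2 terms-by-clusters matrix rather than the 4 × 5 terms-by-documents matrix that plain LSA would use. On a real corpus, the gap between the number of clusters and the number of documents is what the clustering step is meant to exploit.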

CLC Number: TP301.6

XIANG He-lin, ZHANG Ming-xi, LI Po-han, HE Zhen-ying, WANG Wei. Clustering-based Semantic Retrieval Algorithm[J]. Computer Engineering, 2012, 38(2): 36-38.



URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2012.02.011

http://www.ecice06.com/EN/Y2012/V38/I2/36

