Latent Document Similarity Model

Computer Engineering ›› 2009, Vol. 35 ›› Issue (15): 32-34. doi: 10.3969/j.issn.1000-3428.2009.15.011

• Software Technology and Database •

JIA Xi-ping1, LIU Hai-zhu2

(1. School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou 510665; 2. Zengcheng College, South China Normal University, Guangzhou 511363)

Online: 2009-08-05 Published: 2009-08-05

Abstract

This paper proposes a Latent Document Similarity Model (LDSM). Each pair of documents is represented as a bipartite graph in which the nodes are the latent topics of the two documents and each edge is weighted by the weighted similarity between the topics it connects. The similarity of the two documents is then given by the optimal matching of this bipartite graph. Experimental results show that LDSM achieves higher average precision and average recall than a document similarity model built on TextTiling and bipartite-graph optimal matching.
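The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how latent topics are extracted, how topic similarity is computed, or how the matching weight is normalized. The sketch assumes each topic is a sparse term-weight vector, uses cosine similarity as the edge weight, finds the optimal matching by brute force (practical only for a handful of topics), and divides by the larger topic count. All function names are hypothetical.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts).

    Assumption: topics are term-weight dicts; the paper may use another form.
    """
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def optimal_matching_weight(weights):
    """Maximum total weight of a matching in a complete bipartite graph.

    weights[i][j] is the weight of the edge between left node i and right
    node j. Every assignment is enumerated, so this is only suitable for
    small graphs; a Hungarian-style algorithm would be used at scale.
    """
    if not weights or not weights[0]:
        return 0.0
    rows, cols = len(weights), len(weights[0])
    if rows > cols:
        # Transpose so that the smaller side is the one being assigned.
        weights = [list(col) for col in zip(*weights)]
        rows, cols = cols, rows
    return max(
        sum(weights[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(cols), rows)
    )


def document_similarity(topics_a, topics_b):
    """Similarity of two documents, each given as a list of topic vectors.

    Assumption: the matching weight is normalized by the larger topic count.
    """
    if not topics_a or not topics_b:
        return 0.0
    weights = [[cosine(a, b) for b in topics_b] for a in topics_a]
    return optimal_matching_weight(weights) / max(len(topics_a), len(topics_b))


if __name__ == "__main__":
    doc_a = [{"graph": 1.0, "match": 1.0}, {"topic": 1.0}]
    doc_b = [{"topic": 2.0}, {"graph": 1.0, "match": 1.0}]
    print(document_similarity(doc_a, doc_b))
```

An optimal matching can differ from a greedy one: with edge weights `[[0.9, 0.8], [0.8, 0.1]]`, picking the heaviest edge first yields 0.9 + 0.1, while the optimal matching pairs the two 0.8 edges.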

Key words: topic, document similarity, document retrieval, information retrieval

CLC Number: TP311

JIA Xi-ping; LIU Hai-zhu. Latent Document Similarity Model[J]. Computer Engineering, 2009, 35(15): 32-34.

URL: http://www.ecice06.com/EN/10.3969/j.issn.1000-3428.2009.15.011

http://www.ecice06.com/EN/Y2009/V35/I15/32
