Name Disambiguation Based on Dependency Feature in Web Page Text

doi:10.3969/j.issn.1000-3428.2012.19.035

Computer Engineering ›› 2012, Vol. 38 ›› Issue (19): 133-136.

• Networks and Communications •

YANG Xin-xin ^1,2, LI Pei-feng ^1,2, ZHU Qiao-ming ^1,2

(1. School of Computer Science & Technology, Soochow University, Suzhou 215006, China; 2. Jiangsu Provincial Key Lab of Computer Information Processing Technology, Suzhou 215006, China)

Received: 2011-12-30 Online: 2012-10-05 Published: 2012-09-29

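About the authors: YANG Xin-xin (b. 1988), male, master's student, main research interests: natural language processing and name disambiguation; LI Pei-feng, associate professor; ZHU Qiao-ming, professor.
Funding: National Natural Science Foundation of China (60970056, 61070123, 61003155); Natural Science Foundation of Jiangsu Province (BK2008160); Specialized Research Fund for the Doctoral Program of Higher Education (20093201110006); Open Project Fund of the National Laboratory of Pattern Recognition.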

Abstract

Abstract: This paper studies the problem of personal name disambiguation on the Internet. The proposed method extracts the dependency features related to the target name entities in Web page text, together with auxiliary features such as named entities. A two-stage clustering algorithm then uses the dependency features to cluster the documents with high reliability in the first stage, and uses the auxiliary features to merge the remaining documents into the existing clusters in the second stage. Experimental results show that the proposed method performs better than other name disambiguation methods.

Key words: name ambiguity, dependency feature, name disambiguation, named entity, clustering
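
The two-stage clustering described in the abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure, thresholds, or feature encoding, so everything below is an assumption made for illustration: features are plain string sets, similarity is Jaccard overlap, and the names `two_stage_cluster`, `strict`, and `loose` are hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def two_stage_cluster(
    dep: Dict[str, Set[str]],   # document id -> dependency features
    aux: Dict[str, Set[str]],   # document id -> auxiliary features
    strict: float = 0.5,        # assumed stage-1 threshold
    loose: float = 0.2,         # assumed stage-2 threshold
) -> List[Set[str]]:
    docs = sorted(dep)

    # Stage 1: single-link grouping on dependency features only.
    # Documents that match no other document stay unassigned for now.
    label = {d: d for d in docs}
    for i, a in enumerate(docs):
        for b in docs[i + 1:]:
            if jaccard(dep[a], dep[b]) >= strict:
                old, new = label[b], label[a]
                for d in docs:
                    if label[d] == old:
                        label[d] = new
    groups: Dict[str, Set[str]] = {}
    for d in docs:
        groups.setdefault(label[d], set()).add(d)
    clusters = [g for g in groups.values() if len(g) > 1]
    leftover = [d for d in docs if len(groups[label[d]]) == 1]

    # Stage 2: attach each remaining document to the existing cluster
    # whose pooled auxiliary features it overlaps most, or open a new
    # cluster when no overlap reaches the loose threshold.
    for d in leftover:
        best, best_sim = None, loose
        for c in clusters:
            pooled = set().union(*(aux[m] for m in c))
            sim = jaccard(aux[d], pooled)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({d})
        else:
            best.add(d)
    return clusters


# Toy input: four pages that all mention the same (ambiguous) name.
dep_features = {
    "d1": {"professor@Soochow", "teach@NLP"},
    "d2": {"professor@Soochow", "teach@NLP", "write@paper"},
    "d3": {"play@guitar"},
    "d4": set(),
}
aux_features = {
    "d1": {"Soochow University", "Suzhou"},
    "d2": {"Soochow University", "ACL"},
    "d3": {"Beijing", "rock band"},
    "d4": {"Suzhou", "Soochow University"},
}
result = two_stage_cluster(dep_features, aux_features)
print(sorted(sorted(c) for c in result))  # [['d1', 'd2', 'd4'], ['d3']]
```

In this toy input, d1 and d2 are grouped in the first stage because their dependency features overlap strongly; d4 has no dependency features and is attached in the second stage through shared named entities, while d3 ends up in a cluster of its own.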

CLC Number: TP391

YANG Xin-xin, LI Pei-feng, ZHU Qiao-ming. Name Disambiguation Based on Dependency Feature in Web Page Text[J]. Computer Engineering, 2012, 38(19): 133-136.

URL: https://www.ecice06.com/EN/Y2012/V38/I19/133
