
Computer Engineering (计算机工程)

• Artificial Intelligence and Recognition Technology •

A Name Recognition Method Based on Clustering Coefficient

ZENG Jianping 1, LIU Hua 2

  • 1. School of Computer Science, Fudan University, Shanghai 200433, China; 2. Information Office, Anhui Provincial Higher People's Court, Hefei 230031, China
  • Received: 2015-06-18 Online: 2016-07-15 Published: 2016-07-15
  • About the authors: ZENG Jianping (born 1973), male, associate professor, Ph.D., main research interests: artificial intelligence and information security; LIU Hua, master's degree.

Abstract: Name recognition is a fundamental problem that arises in many applications of Chinese text analysis and continues to attract attention. Many name recognition methods exist, but most rely on corpus statistics and language rules. Exploiting the observation that the personal names in an event text are closely related to one another, this paper proposes a new name recognition method based on the clustering coefficient. The method first uses a surname list to locate possible surnames in the original text, then extracts candidate names using contextual information and a statistical model built from a corpus of personal names. It defines quantitative measures of interpersonal semantic similarity and name likelihood, and on this basis designs a name filtering method based on the clustering coefficient of the interpersonal network. Experimental results show that, compared with an existing method based on the hidden Markov model, the proposed method improves the F1 value by 1.2% and requires neither manually annotated corpora nor language rules.

Key words: name recognition, clustering coefficient, interpersonal network, statistical model, interpersonal semantic similarity
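
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the definitions of interpersonal semantic similarity or name likelihood, so the sketch assumes a simple stand-in in which two candidate names are linked whenever they co-occur in the same sentence. The function names, the example data, and the threshold of 0.3 are all hypothetical. What the sketch does show is the standard local clustering coefficient of a node: the fraction of pairs of its neighbours that are themselves linked.

```python
from itertools import combinations


def build_graph(sentences):
    """Build an undirected graph over candidate names.

    Assumption: two candidates are linked when they co-occur in a sentence.
    This replaces the paper's interpersonal semantic similarity, which the
    abstract does not define.
    """
    graph = {}
    for cands in sentences:
        for c in cands:
            graph.setdefault(c, set())
        for a, b in combinations(sorted(cands), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def local_clustering(graph, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to test
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def filter_candidates(sentences, threshold=0.3):
    """Keep candidates whose local clustering coefficient reaches the threshold.

    The threshold value is a hypothetical choice for this example.
    """
    graph = build_graph(sentences)
    return {n for n in graph if local_clustering(graph, n) >= threshold}


# Hypothetical candidates per sentence. "王国" (kingdom) and "高兴" (happy)
# begin with common surnames but are ordinary words, not names.
example = [
    {"张伟", "王芳"},
    {"王芳", "李娜"},
    {"张伟", "李娜"},
    {"张伟", "王国"},
    {"王国", "高兴"},
]

# Keeps 张伟, 王芳 and 李娜; drops 王国 and 高兴.
kept = filter_candidates(example, threshold=0.3)
```

In this toy data the three real names form a triangle, so each has at least one linked pair of neighbours, while the two spurious candidates sit on a chain whose neighbours never co-occur and therefore score zero.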
