Computer Engineering

• Artificial Intelligence and Recognition Technology

基于隐结构感知的并列名词短语识别研究

王浩,姬东鸿,黄江平   

  1. (武汉大学 计算机学院,武汉 430072)
  • 收稿日期:2016-04-05 出版日期:2017-04-15 发布日期:2017-04-14
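  • About the authors: WANG Hao (born 1991), male, master's student, whose main research interests are natural language processing and machine learning; JI Donghong, professor, Ph.D., doctoral supervisor; HUANG Jiangping, Ph.D. candidate.
  • Funding:
    Key Program of the National Natural Science Foundation of China, "Theory and Methods of Discourse-Level Chinese Semantic Analysis" (61133012); General Program of the National Natural Science Foundation of China, "Research on an Event Chain Model of Chinese Discourse Coherence" (61373108).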

Research on Coordinate Noun Phrase Identification Based on Latent Structured Perceptron

WANG Hao, JI Donghong, HUANG Jiangping

  1. (Computer School, Wuhan University, Wuhan 430072, China)
  • Received: 2016-04-05 Online: 2017-04-15 Published: 2017-04-14

摘要: 针对现有并列名词短语识别不能处理短语序列隐含信息的情况,提出一种新的并列名词短语识别方法。采用隐结构感知模型与条件随机场模型,识别并列名词短语序列以及序列中用于连接并列名词短语的连词和标点。针对并列名词短语序列进行任务描述,建立语料库并选择典型的并列名词短语识别特征进行实验。结果表明,隐结构感知模型由于加入序列中的隐含信息,相比传统条件随机场模型在并列名词短语识别中更有优势,F度量值达到86.36%,进而证明该模型能够用于以信息抽取为导向的并列名词短语识别。

关键词: 并列名词短语, 隐结构感知, 条件随机场, 序列识别, 边界识别

Abstract: Existing coordinate noun phrase identification methods cannot exploit the implicit information in a phrase sequence. To address this, this paper proposes a new coordinate noun phrase identification method. A Latent Structured Perceptron (LSP) model and a Conditional Random Fields (CRF) model are used to identify sequences of coordinate noun phrases, together with the conjunctions and punctuation marks that connect the phrases within a sequence. The paper first gives a task description for coordinate noun phrase sequences, then constructs a corpus and selects typical coordinate noun phrase identification features for the experiments. Experimental results show that, because it incorporates the latent information in the sequence, the LSP model outperforms the traditional CRF model on coordinate noun phrase identification, reaching an F-measure of 86.36%. This demonstrates that the model can be used for information-extraction-oriented coordinate noun phrase identification.

Key words: coordinate noun phrase, Latent Structured Perceptron (LSP), Conditional Random Fields (CRF), sequence identification, boundary identification
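
To make the idea of a perceptron with latent structure more concrete, the following is a minimal sketch of a latent-variable structured perceptron for sequence labeling. It is not the authors' implementation. The tag set (`NP` for a conjunct, `CC` for a connecting conjunction or punctuation mark, `O` for other tokens), the number of latent sub-states per label (`N_LATENT`), and the toy `features` function are all assumptions made for illustration; the abstract does not specify them. Each label is split into latent sub-states, decoding runs Viterbi over (label, sub-state) pairs, and on a labeling mistake the weights move toward the best latent path consistent with the gold labels and away from the predicted path.

```python
from collections import defaultdict

LABELS = ["NP", "CC", "O"]   # hypothetical tag set: conjunct, connective, other
N_LATENT = 2                 # latent sub-states per label (an assumption)
STATES = [(y, h) for y in LABELS for h in range(N_LATENT)]


def features(tokens, i):
    """Toy emission features for position i (illustrative only)."""
    return ["bias", "w=" + tokens[i]]


def viterbi(tokens, w, gold=None):
    """Best latent-state path for a non-empty sentence.

    If gold labels are given, only states whose label matches them are allowed.
    """
    chart = []  # chart[i][state] = (best score ending in state, previous state)
    for i in range(len(tokens)):
        column = {}
        for s in STATES:
            if gold is not None and s[0] != gold[i]:
                continue
            emit = sum(w[(f, s)] for f in features(tokens, i))
            if i == 0:
                column[s] = (emit, None)
            else:
                column[s] = max(
                    ((score + w[(p, s)] + emit, p)
                     for p, (score, _) in chart[-1].items()),
                    key=lambda cand: cand[0],
                )
        chart.append(column)
    # Follow back-pointers from the best final state.
    state = max(chart[-1], key=lambda s: chart[-1][s][0])
    path = [state]
    for i in range(len(tokens) - 1, 0, -1):
        state = chart[i][state][1]
        path.append(state)
    return path[::-1]


def path_features(tokens, path):
    """Count emission and transition features along a latent-state path."""
    counts = defaultdict(float)
    for i, s in enumerate(path):
        for f in features(tokens, i):
            counts[(f, s)] += 1.0
        if i > 0:
            counts[(path[i - 1], s)] += 1.0
    return counts


def train(data, epochs=10):
    """Perceptron training: update only when the projected labels are wrong."""
    w = defaultdict(float)
    for _ in range(epochs):
        for tokens, gold in data:
            predicted = viterbi(tokens, w)
            if [y for y, _ in predicted] == list(gold):
                continue
            target = viterbi(tokens, w, gold)  # best path consistent with gold
            for k, v in path_features(tokens, target).items():
                w[k] += v
            for k, v in path_features(tokens, predicted).items():
                w[k] -= v
    return w


def predict(tokens, w):
    """Decode and project the latent-state path onto plain labels."""
    return [y for y, _ in viterbi(tokens, w)]


if __name__ == "__main__":
    sentence = ["cats", ",", "dogs", "and", "birds", "sleep"]
    tags = ["NP", "CC", "NP", "CC", "NP", "O"]
    weights = train([(sentence, tags)], epochs=20)
    print(predict(sentence, weights))
```

A CRF baseline would use the same kind of emission and transition features but without the latent sub-states and with probabilistic training; in this sketch the sub-states are the only place where information beyond the observed labels can be stored.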
