Computer Engineering, 2018, Vol. 44, Issue (6): 162-168, 175. doi: 10.19678/j.issn.1000-3428.0047454

• Artificial Intelligence and Recognition Technology •

Research on Automatic Identification of Marked Parallel Structures in Chinese Patent

LIU Xiaodie 1, ZHU Yun 2, JIN Yaohong 2

  1. College of International Education, Beijing Union University, Beijing 100101, China; 2. Institute of Chinese Information Processing, Beijing Normal University, Beijing 100875, China
  • Received: 2017-06-05 Online: 2018-06-15 Published: 2018-06-15
  • About the authors: LIU Xiaodie (born 1984), female, lecturer, Ph.D., whose main research interests are machine translation and natural language processing; ZHU Yun, postdoctoral researcher; JIN Yaohong, professor, Ph.D.
  • Supported by:
    National High Technology Research and Development Program of China project "Development of Multi-level Knowledge Representation for Massive Text and a Chinese Text Understanding Application System" (2012AA011104); State Language Commission "Twelfth Five-Year Plan" research project "Research on Language Resource Construction Planning" (YB125-124).

Abstract: Nominal coordinate (parallel) structures with overt conjunctions (COCs) are widely distributed and structurally complex in Chinese patent literature. Existing recognition techniques rely on a limited set of features and can identify only some simple types of coordinate structure, so overall recognition performance is poor. To address this, a recognition method based on the boundary-perception principle is proposed. Under the guidance of the Hierarchical Network of Concepts (HNC) theory, COCs are annotated along eight dimensions: number, level, semantic type, semantic feature, interference feature, structural feature, external context, and boundary position. The semantic features, structural features, and contextual-word features are examined and summarized, and 217 formal rules are formulated and integrated into an existing HNC translation system. Test results show that, compared with the Google online translation system, the method recognizes COCs with higher accuracy.

Key words: rule-based, boundary perception, parallel structure, machine translation, patent documentation
