Abstract: Nominal coordinations with overt conjunctions (COCs) are widely distributed and structurally complex in Chinese patent literature. Existing recognition techniques use only a limited set of features and handle only some simple types of coordinate structure, so overall recognition performance is poor. To address this, a recognition method based on the boundary-perception principle is proposed. Building on the Hierarchical Network of Concepts (HNC) theory, COCs are annotated along eight dimensions: number, level, semantic type, semantic feature, interference feature, structural feature, external context, and boundary position. The semantic features, structural features, and contextual-word features of COCs are examined and summarized, 217 formal rules are formulated from them, and the rules are integrated into an existing HNC translation system. Open-test results show that the method recognizes COCs with higher accuracy than the Google online translation system.
Key words:
rule-based,
boundary perception,
parallel structure,
machine translation,
patent documentation
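To make the idea of rule-based boundary recognition concrete, the following is a minimal sketch, not the authors' system. The record type `CocAnnotation` simply mirrors the eight annotation dimensions named in the abstract. The marker set `CONJUNCTIONS`, the part-of-speech tag `"n"`, and the single rule in `find_coc_spans` are all hypothetical and far simpler than the 217 rules or the HNC representation described above.

```python
from dataclasses import dataclass

# Assumed, illustrative set of overt conjunction markers (not from the paper).
CONJUNCTIONS = {"和", "与", "及", "或"}


@dataclass
class CocAnnotation:
    """One annotated COC; fields mirror the eight dimensions in the abstract."""
    number: int                # how many conjuncts
    level: int                 # nesting level
    semantic_type: str
    semantic_feature: str
    interference: str
    structural_feature: str
    contextual_words: tuple    # words just outside the left/right boundary
    boundary_position: tuple   # (start, end) token indices, inclusive


def find_coc_spans(tokens):
    """Toy rule: around each conjunction, extend the span over adjacent nouns.

    tokens is a list of (word, pos) pairs; "n" is an assumed noun tag.
    Returns inclusive (start, end) index spans that have a noun on both sides.
    """
    spans = []
    for i, (word, _) in enumerate(tokens):
        if word not in CONJUNCTIONS:
            continue
        left = i
        while left > 0 and tokens[left - 1][1] == "n":
            left -= 1
        right = i
        while right < len(tokens) - 1 and tokens[right + 1][1] == "n":
            right += 1
        if left < i < right:
            spans.append((left, right))
    return spans


tokens = [("包括", "v"), ("电机", "n"), ("和", "c"),
          ("控制器", "n"), ("的", "u"), ("装置", "n")]
print(find_coc_spans(tokens))  # → [(1, 3)]
```

In this example the span stops at 包括 on the left and 的 on the right, which is the kind of contextual-word evidence the abstract refers to. A real rule set would also need semantic and structural features to resolve cases that part-of-speech adjacency alone cannot.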
LIU Xiaodie, ZHU Yun, JIN Yaohong. Research on Automatic Identification of Marked Parallel Structures in Chinese Patent[J]. Computer Engineering, 2018, 44(6): 162-168, 175.