Computer Engineering (计算机工程) ›› 2021, Vol. 47 ›› Issue (5): 97-103. doi: 10.19678/j.issn.1000-3428.0057666

• Artificial Intelligence and Pattern Recognition •

Attribute Value Extraction Method Based on Machine Reading Comprehension Model and Crowdsourcing Verification

FENG Suo, LIU Jingping, JIANG Haiyun, XIAO Yanghua

  1. School of Computer Science, Fudan University, Shanghai 200433, China
  • Received: 2020-03-10 Revised: 2020-04-17 Published: 2020-04-24
  • About the authors: FENG Suo (born 1994), male, master's student, whose main research interests are knowledge graphs and information extraction; LIU Jingping and JIANG Haiyun, Ph.D. candidates; XIAO Yanghua, professor, Ph.D.
  • Funding:
    Shanghai Science and Technology Innovation Action Plan (19511120400).

Abstract: Internet corpora are highly noisy, so traditional attribute value extraction methods suffer from rising labor costs and a lack of training data. This paper proposes a new entity attribute value extraction method. A machine reading comprehension model first extracts high-quality candidate attribute values from the Internet corpus; an efficient crowdsourcing verification mechanism then adjusts the weight of each candidate attribute value to produce the final extraction result. Experimental results show that, compared with models such as OpenTag and QANET, the machine reading comprehension model improves the accuracy of candidate attribute value extraction by about 10%, and that crowdsourcing verification improves the overall performance of attribute value extraction at a low crowdsourcing cost.

Key words: attribute value extraction, machine reading comprehension model, knowledge graph, crowdsourcing, sequence labeling

CLC number: