
Computer Engineering ›› 2024, Vol. 50 ›› Issue (3): 89-97. doi: 10.19678/j.issn.1000-3428.0066880

• Artificial Intelligence and Pattern Recognition •

Event Detection Model Integrating Part of Speech Semantic Extension Information

Haining YAN1,2, Zhengtao YU1,2,*, Yuxin HUANG1,2, Ran SONG1,2, Xi YANG1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, Yunnan, China
  2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650504, Yunnan, China
  • Received: 2023-02-06 Published: 2024-03-15 Released: 2023-04-28
  • Contact: Zhengtao YU
  • Funding:
    National Natural Science Foundation of China (U21B2027); National Natural Science Foundation of China (61972186); National Natural Science Foundation of China (62266028); Yunnan Provincial Major Science and Technology Special Plan (202202AD080003)

Abstract:

Event detection is a key step in event extraction and relies on identified triggers to classify event types. Current mainstream event detection methods perform poorly on sparsely labeled data: the models overfit densely labeled triggers and tend to fail on sparsely labeled or unseen triggers. Previous improvements usually mitigate this problem by adding more training instances; however, the augmented data are unevenly distributed and carry built-in biases, so performance remains poor. To address this, this study proposes an event detection model that integrates part-of-speech semantic extension information. The model exploits word-granularity extension information: without increasing the number of training instances, it narrows the range of candidate triggers and semantically extends them, mining the rich semantics in their contexts to alleviate the insufficient training caused by sparse labels. First, a part-of-speech filtering module finds candidate triggers, which are then semantically extended to obtain word-granularity semantic information. Next, sentence-granularity semantic information is fused in to make the semantic representation more robust. Finally, a Softmax classifier assigns event types, completing the event detection task. Experimental results show that the model achieves F1 scores of 79.5% and 67.5% on the ACE2005 and KBP2015 datasets, respectively, effectively improving event detection performance. In the sparsely labeled data experiments, the F1 score reaches 78.5%, clearly reducing the adverse effect of sparse labels.

Keywords: event detection, sparse label, part-of-speech filtering, semantic extension, semantic integration, dynamic multi-pooling
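
The three stages named in the abstract (part-of-speech filtering of candidate triggers, fusion of word-level and sentence-level features, Softmax classification) can be illustrated with a toy sketch. This is not the authors' implementation: the kept tag set `CANDIDATE_POS`, the concatenation used in `fuse`, the linear scoring in `classify`, and all function names are assumptions made for illustration, and the semantic extension step itself is omitted because the abstract does not describe how it is computed.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: only verbs and nouns are kept as candidate triggers.
# The abstract does not say which part-of-speech tags the model keeps.
CANDIDATE_POS = {"VERB", "NOUN"}


def select_candidates(tagged: Sequence[Tuple[str, str]]) -> List[int]:
    """Return the indices of tokens whose POS tag is in CANDIDATE_POS."""
    return [i for i, (_, pos) in enumerate(tagged) if pos in CANDIDATE_POS]


def fuse(word_vec: Sequence[float], sent_vec: Sequence[float]) -> List[float]:
    """Fuse word-level and sentence-level features by concatenation
    (one possible fusion, chosen here for simplicity)."""
    return list(word_vec) + list(sent_vec)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: Sequence[float],
             weights: Sequence[Sequence[float]],
             labels: Sequence[str]) -> Tuple[str, List[float]]:
    """Score each label with a linear layer, then apply softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


if __name__ == "__main__":
    # Hypothetical, hand-tagged sentence and hand-picked toy vectors.
    sentence = [("troops", "NOUN"), ("attacked", "VERB"),
                ("the", "DET"), ("city", "NOUN")]
    for idx in select_candidates(sentence):
        features = fuse([1.0, 0.0], [0.5])
        label, probs = classify(features,
                                [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
                                ["Attack", "None"])
        print(sentence[idx][0], label, probs)
```

Filtering by part of speech shrinks the set of tokens the classifier must consider without adding any training instances, which is the motivation the abstract gives for this step.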