
Computer Engineering ›› 2024, Vol. 50 ›› Issue (9): 63-71. doi: 10.19678/j.issn.1000-3428.0068523

• Artificial Intelligence and Pattern Recognition •

Research on Event Extraction for Administrative Law Enforcement Case Texts

QU Xiaoya1,*, LI Bing1, WEN Liqiang2

  1. School of Information Technology and Management, University of International Business and Economics, Beijing 100029, China
  2. School of Software and Microelectronics, Peking University, Beijing 100871, China
  • Received: 2023-10-08 Online: 2024-09-15 Published: 2024-01-25
  • Contact: QU Xiaoya
  • Funding:
    National Key Research and Development Program of the Ministry of Science and Technology (2020YFC0833304)

Abstract:

The level of intelligence in administrative law enforcement reflects the modernization of national governance capacity, and data is an important foundation for intelligent development. Administrative organs store large numbers of historical cases recorded as text; this unstructured data has low value density and limited usability. Applying event extraction to administrative law enforcement case texts to quickly and efficiently extract structured information, such as the case authority type and the time and place of occurrence, can promote the use of historical case records and support research on intelligent law enforcement. This study collects real case data from a city and constructs an administrative law enforcement dataset through manual annotation. Because the case texts lack trigger words, are document-level, and follow no fixed format, the study proposes a two-stage event extraction method that combines a Bidirectional Encoder Representations from Transformers (BERT) model with a Bi-directional Long Short-Term Memory network with Conditional Random Field (BiLSTM-CRF) model, performing event type detection and event argument extraction in sequence through multi-class text classification and sequence labeling. Experimental results show that the F1 values for event type detection and event argument extraction reach 99.54% and 97.36%, respectively, demonstrating effective extraction of case information.

Key words: administrative law enforcement case, event extraction, two-stage method, Bidirectional Encoder Representations from Transformers (BERT) model, Bi-directional Long Short-Term Memory network with Conditional Random Field (BiLSTM-CRF) model
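
The two-stage flow described in the abstract (classify the document's event type first, then label its characters to recover arguments) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the trained BERT classifier and BiLSTM-CRF tagger are replaced by plain callables, the BIO tagging scheme is an assumption, and the role names (`TIME`, `PLACE`) and the helper names `decode_bio` and `extract_event` are hypothetical. The abstract does not say how stage 2 uses the stage 1 result, so the sketch simply runs the two stages in order.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in types for the two trained models (assumptions for illustration).
EventClassifier = Callable[[str], str]       # stage 1: document -> event type
ArgumentTagger = Callable[[str], List[str]]  # stage 2: document -> one BIO tag per character


def decode_bio(text: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (role, span text) pairs from character-level BIO tags."""
    if len(text) != len(tags):
        raise ValueError("expected exactly one tag per character")
    spans: List[Tuple[str, str]] = []
    role, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and role == tag[2:]
        if role is not None and not continues:
            spans.append((role, text[start:i]))
            role = None
        if tag.startswith("B-"):
            role, start = tag[2:], i
    return spans


def extract_event(text: str, classify: EventClassifier, tag: ArgumentTagger) -> Dict:
    """Run event type detection, then argument extraction, on one document."""
    event_type = classify(text)             # stage 1: multi-class text classification
    arguments = decode_bio(text, tag(text))  # stage 2: sequence labeling
    return {"event_type": event_type, "arguments": arguments}


# Toy stand-ins so the sketch runs without any trained model.
demo_text = "2023-05-01 Main St"
demo_tags = ["B-TIME"] + ["I-TIME"] * 9 + ["O"] + ["B-PLACE"] + ["I-PLACE"] * 6
result = extract_event(demo_text, lambda t: "hypothetical_type", lambda t: demo_tags)
```

In a real system the two lambdas would wrap model inference; keeping them as callables means the span-decoding logic can be tested independently of any model.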