
Computer Engineering, 2024, Vol. 50, Issue (1): 306-312. doi: 10.19678/j.issn.1000-3428.0066055

• Development Research and Engineering Application •

Similar Case Matching Model for Lending Cases

Faxin CAO, Yuanyuan SUN*, Zhizheng WANG, Dinghao PAN, Hongfei LIN

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2022-10-20; Published: 2024-01-15; Online: 2023-04-04
  • Corresponding author: Yuanyuan SUN
  • Funding:
    National Key Research and Development Program of China (2022YFC3301801); Fundamental Research Funds for the Central Universities (DUT22ZD205)

Abstract:

Similar Case Matching (SCM) is a specific application of text matching in the judicial domain. Its goal is to determine whether legal documents are similar, which is important for similar-case retrieval. Compared with conventional text-matching tasks, legal texts are typically long. Moreover, SCM compares cases that share the same cause of action, so the differences between case descriptions are small, and previous text-matching methods struggle to compute text similarity reliably. To address these problems in matching lending-case texts, this study builds an SCM model that integrates the key elements of lending cases. To capture richer semantic features, regular expressions are constructed to extract elements specific to lending cases, such as the form in which the loan was delivered and the basic attributes of the borrower. These elements are combined with the original case description so that the model jointly learns the semantic features of the legal text and of the key case elements. In addition, a pretrained model with shared weights encodes each document separately, and the outputs of specific encoding layers of the pretrained model are fused to obtain richer semantic information. Finally, a supervised contrastive learning framework is introduced to make better use of the sample information and further improve SCM performance. Experimental results on the CAIL2019-SCM dataset show that the proposed model improves accuracy on the test set by 1.05 percentage points compared with the LFESM model.

Key words: Similar Case Matching (SCM), Siamese network, contrastive learning, pretrained model, key legal element
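The abstract states that regular expressions extract lending-specific elements, such as the form of loan delivery and basic borrower attributes, and that these are combined with the case text. The paper's actual patterns and element inventory are not given here, so the following is only a minimal sketch of the mechanism: the pattern set, the element labels, the `extract_elements`/`augment_text` helpers and the `[SEP]` separator string are all hypothetical.

```python
import re

# Hypothetical element patterns; the paper's real regex set is not published in this abstract.
DELIVERY_PATTERNS = {
    "转账交付": re.compile(r"银行转账|转账|汇款"),  # delivery by bank transfer / remittance
    "现金交付": re.compile(r"现金"),              # delivery in cash
}
# Hypothetical borrower-attribute pattern: "被告<name>，<gender>" as often written in judgments.
BORROWER_GENDER = re.compile(r"被告[^,，。]{1,10}[,，](男|女)")


def extract_elements(text: str) -> dict:
    """Return the (hypothetical) lending-case elements found in a case description."""
    delivery = [label for label, pat in DELIVERY_PATTERNS.items() if pat.search(text)]
    match = BORROWER_GENDER.search(text)
    return {
        "delivery": delivery,
        "borrower_gender": match.group(1) if match else None,
    }


def augment_text(text: str, sep: str = "[SEP]") -> str:
    """Append the extracted elements to the case text so one encoder sees both."""
    elements = extract_elements(text)
    parts = ["交付形式:" + ("、".join(elements["delivery"]) or "未知")]  # "未知" = unknown
    if elements["borrower_gender"]:
        parts.append("借款人性别:" + elements["borrower_gender"])
    return text + sep + ";".join(parts)
```

For a description such as "原告李四诉称，被告张三，男，于2018年向原告借款5万元，原告通过银行转账交付。", the sketch finds transfer delivery and a male borrower and appends them after the separator. Rule-based extraction like this is cheap and transparent, but its coverage is limited to whatever phrasings the patterns anticipate.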
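The abstract also describes encoding each document with one pretrained model whose weights are shared (a Siamese setup) and fusing the outputs of specific encoding layers. Which layers are fused, and how, is not stated here. The sketch below therefore assumes a softmax-weighted mix of the [CLS] vectors from the last four layers, and a CAIL2019-SCM-style triplet decision (is B or C closer to A?) by cosine similarity. `encode` stands in for the pretrained encoder and must return one [CLS] vector per layer. `toy_encode` is a hypothetical keyword-count stand-in used only so the example runs without model weights.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def scalar_mix(layer_cls: Sequence[Vector], layer_ids: Sequence[int],
               logits: Sequence[float]) -> Vector:
    """Fuse the [CLS] vectors of the selected layers using softmax-normalised weights.

    Assumption: fusion is a learned scalar mix; the paper's exact scheme may differ.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [0.0] * len(layer_cls[0])
    for w, i in zip(weights, layer_ids):
        for d, value in enumerate(layer_cls[i]):
            fused[d] += w * value
    return fused


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def siamese_choose(encode: Callable[[str], Sequence[Vector]], a: str, b: str, c: str,
                   layer_ids: Sequence[int] = (-4, -3, -2, -1),
                   logits: Sequence[float] = (0.0, 0.0, 0.0, 0.0)) -> str:
    """Encode A, B and C with the SAME encoder (shared weights); return the closer candidate."""
    ea, eb, ec = (scalar_mix(encode(t), layer_ids, logits) for t in (a, b, c))
    return "B" if cosine(ea, eb) >= cosine(ea, ec) else "C"


def toy_encode(text: str) -> List[Vector]:
    """Hypothetical stand-in for a 4-layer encoder: per-layer scaled keyword counts."""
    base = [float(text.count("转账")), float(text.count("现金")), 1.0]
    return [[x * (k + 1) for x in base] for k in range(4)]
```

Because the same `encode` function processes all three documents, the representations live in one space and can be compared directly; in a real model, the mixing `logits` would be trained together with the encoder.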
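The abstract introduces a supervised contrastive learning framework but does not give its loss. Assuming it resembles the widely used supervised contrastive (SupCon) loss, the objective works as follows. Each sample in a batch is used in turn as an anchor, and its positives are the other samples with the same label. The loss is the negative log-softmax of anchor-positive similarities over all non-anchor samples, scaled by a temperature τ, averaged over positives and then over anchors. How the paper assigns labels to SCM samples is not stated, and `tau=0.1` is an arbitrary illustrative default.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def supcon_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
                tau: float = 0.1) -> float:
    """Supervised contrastive loss over one batch (SupCon form; an assumption here)."""
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = [sum(a * b for a, b in zip(z[i], z[j])) / tau for j in range(n)]
        others = [logits[j] for j in range(n) if j != i]  # A(i): every sample except the anchor
        m = max(others)
        log_denom = m + math.log(sum(math.exp(x - m) for x in others))  # stable log-sum-exp
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

When same-label samples already point in the same direction, the loss is close to zero. When a positive is orthogonal to its anchor, the loss grows to roughly 1/τ, which is what pushes the encoder to pull same-label documents together and push others apart.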