Computer Engineering ›› 2025, Vol. 51 ›› Issue (4): 208-216. doi: 10.19678/j.issn.1000-3428.0068823

• Cyberspace Security •

A Method for Analyzing News Themes Involving Cases with Integrated Crime Classification

YIN Zhaoliang1,2,3, HUANG Yuxin1,2, YU Zhengtao1,2,*, WANG Guanwen1,2, AI Chuanxian1,2,3

    1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
    2. Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
    3. Yunnan Branch of the National Computer Network Emergency Response Technical Team/Coordination Center of China, Kunming 650500, Yunnan, China
  • Received: 2023-11-13 Online: 2025-04-15 Published: 2024-05-22
  • Contact: YU Zhengtao

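  • Funding:
    National Natural Science Foundation of China (U21B2027); National Natural Science Foundation of China (61972186); National Natural Science Foundation of China (62266027); National Natural Science Foundation of China (62266028); Yunnan Provincial Major Science and Technology Special Project (202302AD080003); Yunnan Provincial Major Science and Technology Special Project (202202AD080003); 202202AD080003(202301AT070393); 202202AD080003(202301AT070471)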

Abstract:

This paper discusses the significance of topic analysis for crime-related news and identifies the limitations of existing methods. To address these limitations, it presents a topic analysis model that integrates crime classification, the Bidirectional Encoder Representations from Transformers-based Embedded Crime Topic Model (BERT-ECTM). The model uses the crime charges in legal documents as supervision signals and fuses them with crime news text as the model input, improving the accuracy and case-related preference of the extracted topic information. To capture contextual semantic features, the model adopts an embedded topic analysis approach built on BERT encoding. Because the marginal distribution is complex and difficult to compute during training, the paper applies variational inference, fitting the posterior distribution with an approximating distribution. Experimental results show that, on the crime news topic analysis task, the proposed model is clearly more effective and accurate than existing methods.

Key words: text topic extraction, crime classification, BERT-ECTM model, case-related preference, text semantics, semantic feature encoding, variational inference
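
The abstract's combination of an embedded topic model with variational inference can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the generic embedded-topic-model formulation, in which each topic's word distribution is a softmax over inner products of word and topic embeddings, and the training objective is the evidence lower bound (ELBO) with a Gaussian approximate posterior and a standard normal prior. All function names are hypothetical, small toy vectors stand in for BERT-derived embeddings, and the crime-charge supervision is omitted.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_distributions(word_emb, topic_emb):
    """For each topic k, return softmax over words of <word_emb[v], topic_emb[k]>."""
    return [
        softmax([sum(a * b for a, b in zip(w, t)) for w in word_emb])
        for t in topic_emb
    ]


def elbo_estimate(word_counts, beta, mu, log_var, rng):
    """One-sample Monte Carlo ELBO estimate for a single document.

    word_counts: bag-of-words counts, one entry per vocabulary word
    beta:        per-topic word distributions (one row per topic)
    mu, log_var: mean and log-variance of the Gaussian approximate posterior
    """
    # Reparameterization trick: delta = mu + sigma * eps, with eps ~ N(0, 1).
    delta = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, log_var)]
    theta = softmax(delta)  # document-level topic proportions

    # Reconstruction term: sum_v n_v * log(sum_k theta_k * beta[k][v]).
    recon = 0.0
    for v, n in enumerate(word_counts):
        if n:
            p = sum(t * row[v] for t, row in zip(theta, beta))
            recon += n * math.log(p)

    # Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I).
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, log_var))
    return recon - kl, recon, kl


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2-dimensional embeddings for 3 words and 2 topics (assumed values).
    words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    topics = [[1.0, 0.0], [0.0, 1.0]]
    beta = topic_word_distributions(words, topics)
    elbo, recon, kl = elbo_estimate([2, 0, 1], beta, [0.0, 0.0], [0.0, 0.0], rng)
    print("ELBO:", elbo, "reconstruction:", recon, "KL:", kl)
```

Maximizing this bound over the posterior parameters avoids computing the intractable marginal likelihood directly, which is the difficulty the abstract attributes to model training.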
