Computer Engineering, 2026, Vol. 52, Issue 5: 360-370. doi: 10.19678/j.issn.1000-3428.0070408

• Large Models and Generative AI •

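Classification Method for Medical Data Based on the Fusion of Large and Small Models

LI Jiangtao, MA Li*, LI Yang

  1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China
  • Received: 2024-09-25 Revised: 2025-01-06 Publication date: 2026-05-15 Online release date: 2025-03-05
  • Corresponding author: MA Li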
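  • Author biographies:

    LI Jiangtao, male, master's student; research interests: edge computing and machine learning

    MA Li (corresponding author), professor

    LI Yang, lecturer

  • Funding:
    Beijing Natural Science Foundation (4234083); National Key Research and Development Program of China (2024YFE0200500); National Key Research and Development Program of China (2023YFC3107804); Scientific Research Program of Beijing Municipal Education Commission (KM202410009003)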

Abstract:

Medical data cover a wide range of content, are large in volume, and come in many types, which makes privacy protection difficult. To classify medical data appropriately and then apply privacy protection measures that match the classification results, this study proposes a classification method that fuses large and small models according to the sensitivity level of medical information, so that medical data can be encrypted by class. First, a Large Language Model (LLM), a deep neural network, is combined with the Medical Data Classification Standards (MDCS) to annotate the medical dataset and output features. The LLM's output features are then fed into a small text classification model, a Long Short-Term Memory (LSTM) network, which learns feature representations from the text. Finally, the records that the small model predicts incorrectly are returned to the LLM for reclassification, and the results of the large and small models are fused so that medical data are classified accurately by sensitivity level. Experimental results show that, compared with other classification models and classification standards, the large-small model fusion method significantly improves model convergence, classification accuracy, and the balance of the data classification. This indicates that the iterative large-small model fusion mechanism is well suited to the medical data scenario: it substantially raises classification accuracy, makes the classification of medical data more efficient, and thereby helps ensure the privacy protection of medical data.

Key words: medical data classification, privacy protection, classification standard, fusion of large and small models, Large Language Model (LLM), machine learning
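The abstract describes the fusion loop only at a high level. The Python sketch below shows one way its control flow could be organized, under explicit assumptions that do not come from the paper: `llm_annotate` and `llm_reclassify` are hypothetical stand-ins for LLM calls (here plain keyword rules, not a language model); the labels `low`, `medium`, and `high` are illustrative, not the actual MDCS levels; a small multinomial Naive Bayes classifier replaces the LSTM so the example runs without dependencies; and an "erroneous prediction" is taken to mean a small-model label that disagrees with the LLM annotation.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL keyword rules standing in for an LLM prompted with MDCS.
# The real MDCS sensitivity levels are not listed in the abstract.
MDCS_KEYWORDS = {
    "high": {"diagnosis", "hiv", "genetic", "psychiatric"},
    "medium": {"name", "phone", "address", "insurance"},
}


def llm_annotate(text):
    """Hypothetical LLM call: map a record to a sensitivity level."""
    tokens = set(text.lower().split())
    for level in ("high", "medium"):
        if tokens & MDCS_KEYWORDS[level]:
            return level
    return "low"


def llm_reclassify(text, small_pred):
    """Hypothetical second LLM pass; a real prompt might pass small_pred as a hint."""
    return llm_annotate(text)


class NaiveBayesStandIn:
    """Multinomial Naive Bayes used here in place of the LSTM small model."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the class
            for w in text.lower().split():
                # Laplace-smoothed log likelihood of each token
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def fuse_large_small(texts):
    """Return fused labels and the number of records sent back to the LLM."""
    llm_labels = [llm_annotate(t) for t in texts]       # 1. LLM + MDCS annotation
    small = NaiveBayesStandIn().fit(texts, llm_labels)  # 2. train the small model
    fused, sent_back = [], 0
    for text, llm_label in zip(texts, llm_labels):
        pred = small.predict(text)                      # 3. small-model prediction
        if pred != llm_label:                           # assumed meaning of "erroneous"
            pred = llm_reclassify(text, pred)           # 4. LLM reclassifies the record
            sent_back += 1
        fused.append(pred)                              # 5. fused result
    return fused, sent_back


if __name__ == "__main__":
    records = ["patient hiv diagnosis", "patient name and phone", "ward visiting hours"]
    labels, n_back = fuse_large_small(records)
    print(labels)  # ['high', 'medium', 'low']
```

Because the keyword stub is deterministic, the fused labels in this sketch always coincide with the first-pass annotation. The sketch is only meant to show the routing: the LLM is consulted a second time solely for the records on which the small model disagrees, while agreeing predictions are kept as they are.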