
Computer Engineering (计算机工程)


BrainTumorLLM: Optimizing and Evaluating a Chinese Domain-Specific LLM for Brain Tumor Diagnosis and Treatment

  • Online: 2025-09-25 | Published: 2025-09-25

Abstract: To address the shortage of domain-specific data, the limited clinical adaptability, and the limited accuracy of content generated by general-purpose medical large language models (LLMs) in the brain tumor domain, this paper proposes BrainTumorLLM, a large language model specialized for brain tumor diagnosis and treatment. Built on the Meta-Llama-3-8B-Instruct foundation model, BrainTumorLLM is optimized with Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) and trained on BrainTumorQA, a self-constructed, high-quality brain tumor question-answering dataset. Constructed under a macro-micro collaborative framework, the dataset contains 11,000 question-answer pairs spanning macro-level medical knowledge (symptoms, diagnostic methods, treatment strategies) and micro-level clinical cases, including 1,252 real-world brain tumor MRI reports, with data security safeguarded through de-identification and information-constraint strategies. On the technical side, Low-Rank Adaptation (LoRA) improves training efficiency; two-tier (macro and micro) prompt templates guide the model toward professional, domain-specific responses; and human feedback is incorporated through an expert-preference-driven optimization mechanism and the Proximal Policy Optimization (PPO) algorithm, reinforcing the clinical consistency of the generated content. Experimental results show that BrainTumorLLM significantly outperforms both general-purpose and medical-domain models on brain tumor question-answering tasks. In automatic evaluation, it achieves BLEU-1 and BLEU-2 scores of 0.3383 and 0.2684, respectively, and ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.3237, 0.1466, and 0.2611, while perplexity drops from 20.362 for the base model to 7.674. These results highlight the model's domain expertise, precision, and potential for clinical application: BrainTumorLLM offers robust AI-powered support for brain tumor diagnosis, treatment planning, and medical research.
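The abstract names LoRA as the technique used to make fine-tuning efficient. As a minimal plain-Python sketch of the underlying idea (function and variable names are illustrative; the paper's training presumably relies on a standard parameter-efficient fine-tuning library), LoRA freezes the base weight matrix W and trains only a low-rank pair (B, A) whose scaled product can be merged back into W for inference:

```python
def lora_merged_weight(W, A, B, alpha, r):
    """Merge a LoRA update into a frozen base weight (illustrative sketch).

    W: d_out x d_in frozen base weight.
    A: r x d_in trainable down-projection.
    B: d_out x r trainable up-projection.
    Merged weight = W + (alpha / r) * B @ A, computed with plain-Python matmul.
    """
    d_out, d_in = len(W), len(W[0])
    scale = alpha / r
    # Low-rank update: delta = scale * B @ A has rank at most r.
    delta = [[scale * sum(B[i][k] * A[k][j] for k in range(r))
              for j in range(d_in)]
             for i in range(d_out)]
    return [[W[i][j] + delta[i][j] for j in range(d_in)] for i in range(d_out)]
```

The efficiency gain comes from training only B and A (about 2 * d * r parameters per adapted matrix) instead of the full d_out * d_in weight, which is why LoRA suits fine-tuning an 8B-parameter base model on a domain dataset.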
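The automatic evaluation reports BLEU-1/BLEU-2 and perplexity. As a rough sketch of what these metrics measure (pure Python for illustration; the reported scores presumably come from standard evaluation toolkits), sentence-level BLEU-n is a brevity-penalized geometric mean of clipped n-gram precisions, and perplexity is the exponentiated mean per-token negative log-likelihood:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_n(candidate, reference, n):
    """Sentence-level BLEU-n against a single reference (illustrative sketch)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for k in range(1, n + 1):
        cand_counts = Counter(ngrams(cand, k))
        ref_counts = Counter(ngrams(ref, k))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / n)

def perplexity(token_nlls):
    # Perplexity = exp(average per-token negative log-likelihood),
    # so the drop from 20.362 to 7.674 means the fine-tuned model assigns
    # much higher probability to in-domain reference text.
    return math.exp(sum(token_nlls) / len(token_nlls))
```

Higher BLEU/ROUGE indicates closer n-gram overlap with expert-written reference answers, while lower perplexity indicates a better fit to domain text; the two are complementary, which is why the paper reports both.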