
Computer Engineering (计算机工程)


BrainTumorLLM: Optimizing and Evaluating a Chinese Domain-Specific LLM for Brain Tumor Diagnosis and Treatment

  • Online: 2025-09-25 | Published: 2025-09-25

Abstract: To address the shortage of domain-specific data, the limited clinical adaptability, and the limited accuracy of content generated by general-purpose medical large language models (LLMs) in the brain tumor domain, this paper proposes BrainTumorLLM, a large language model specialized for brain tumor diagnosis and treatment. Built on the Meta-Llama-3-8B-Instruct foundation model, BrainTumorLLM is optimized with Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) and trained on BrainTumorQA, a self-constructed, high-quality brain tumor question-answering dataset. Constructed under a macro-micro collaborative framework, the dataset contains 11,000 question-answer pairs spanning macro-level medical knowledge (symptoms, diagnostic methods, treatment strategies) and micro-level clinical cases, including 1,252 real-world brain tumor MRI reports, with data security safeguarded through de-identification and information-constraint strategies. On the technical side, Low-Rank Adaptation (LoRA) improves training efficiency; two-tier (macro and micro) prompt templates guide the model toward professional, domain-specific responses; and human feedback is incorporated through an expert-preference-driven optimization mechanism and the Proximal Policy Optimization (PPO) algorithm, reinforcing the clinical consistency of the generated content. Experimental results show that BrainTumorLLM significantly outperforms both general-purpose and medical-domain models on brain tumor question-answering tasks. In automatic evaluation, it achieves BLEU-1 and BLEU-2 scores of 0.3383 and 0.2684, respectively, and ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.3237, 0.1466, and 0.2611, while perplexity drops from 20.362 for the base model to 7.674. These results highlight the model's domain expertise, precision, and potential for clinical application: BrainTumorLLM offers robust AI-powered support for brain tumor diagnosis, treatment planning, and medical research.
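The abstract names LoRA as the technique used to make fine-tuning efficient. As a minimal plain-Python sketch of the underlying idea (function and variable names are illustrative; the paper's training presumably relies on a standard parameter-efficient fine-tuning library), LoRA freezes the base weight matrix W and trains only a low-rank pair (B, A) whose scaled product can be merged back into W for inference:

```python
def lora_merged_weight(W, A, B, alpha, r):
    """Merge a LoRA update into a frozen base weight (illustrative sketch).

    W: d_out x d_in frozen base weight.
    A: r x d_in trainable down-projection.
    B: d_out x r trainable up-projection.
    Merged weight = W + (alpha / r) * B @ A, computed with plain-Python matmul.
    """
    d_out, d_in = len(W), len(W[0])
    scale = alpha / r
    # Low-rank update: delta = scale * B @ A has rank at most r.
    delta = [[scale * sum(B[i][k] * A[k][j] for k in range(r))
              for j in range(d_in)]
             for i in range(d_out)]
    return [[W[i][j] + delta[i][j] for j in range(d_in)] for i in range(d_out)]
```

The efficiency gain comes from training only B and A (about 2 * d * r parameters per adapted matrix) instead of the full d_out * d_in weight, which is why LoRA suits fine-tuning an 8B-parameter base model on a domain dataset.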
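The automatic evaluation reports BLEU-1/BLEU-2 and perplexity. As a rough sketch of what these metrics measure (pure Python for illustration; the reported scores presumably come from standard evaluation toolkits), sentence-level BLEU-n is a brevity-penalized geometric mean of clipped n-gram precisions, and perplexity is the exponentiated mean per-token negative log-likelihood:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_n(candidate, reference, n):
    """Sentence-level BLEU-n against a single reference (illustrative sketch)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for k in range(1, n + 1):
        cand_counts = Counter(ngrams(cand, k))
        ref_counts = Counter(ngrams(ref, k))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / n)

def perplexity(token_nlls):
    # Perplexity = exp(average per-token negative log-likelihood),
    # so the drop from 20.362 to 7.674 means the fine-tuned model assigns
    # much higher probability to in-domain reference text.
    return math.exp(sum(token_nlls) / len(token_nlls))
```

Higher BLEU/ROUGE indicates closer n-gram overlap with expert-written reference answers, while lower perplexity indicates a better fit to domain text; the two are complementary, which is why the paper reports both.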