BrainTumorLLM: Optimizing and Evaluating of Large Language Model for Brain Tumor Diagnosis and Treatment

Computer Engineering, 2026, Vol. 52, Issue (5): 349-359. doi: 10.19678/j.issn.1000-3428.0252472

• Large Models and Generative Artificial Intelligence •

LI Jiakun¹,²,³, LIU Yanqing¹,²,³, DU Fang¹,²,³,*, YU Zhenhua¹,²,³, FENG Yu¹,²,³, WANG Hui¹,²,³, HUO Xianhao⁴

1. School of Information Engineering, Ningxia University, Yinchuan 750021, Ningxia, China
2. Ningxia Key Laboratory of Artificial Intelligence and Information Security for Channeling Computing Resources from the East to the West, Yinchuan 750021, Ningxia, China
3. Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-Founded by Ningxia Municipality and Ministry of Education, Yinchuan 750021, Ningxia, China
4. Department of Neurosurgery, General Hospital of Ningxia Medical University, Yinchuan 750004, Ningxia, China

Received: 2025-05-22 Revised: 2025-08-19 Online: 2026-05-15 Published: 2026-05-12
Corresponding author: DU Fang
About the authors:
LI Jiakun (CCF student member), female, master's student; main research interest: natural language processing
LIU Yanqing, associate professor
DU Fang (CCF senior member, corresponding author), professor
YU Zhenhua, associate professor
FENG Yu, master's student
WANG Hui, master's student
HUO Xianhao, resident physician
Funding:
Key Research and Development Program of Ningxia Hui Autonomous Region (2023BEG02009); National Natural Science Foundation of China (62062058)

Abstract

General-purpose medical large language models (LLMs) face several challenges in brain tumor care: domain-specific data are scarce, clinical adaptability is limited, and the accuracy of generated content is insufficient. To address these challenges, this paper proposes BrainTumorLLM, a specialized LLM for brain tumor diagnosis and treatment. Built on the Meta-LLaMA-3-8B-Instruct model, BrainTumorLLM is optimized through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) and is trained on BrainTumorQA, a self-constructed, high-quality brain tumor question-answering dataset. The dataset is built with a macro-micro collaborative construction framework and comprises 11,000 question-answer pairs covering macro-level medical knowledge (symptoms, diagnostic methods, and treatment strategies) and micro-level clinical cases; privacy is protected through de-identification and information constraint strategies. On the technical side, Low-Rank Adaptation (LoRA) is used to improve training efficiency, and two-tier (macro and micro) prompt templates guide the model toward specialized responses. RLHF is then applied through an expert-preference-driven optimization mechanism and the Proximal Policy Optimization (PPO) algorithm to reinforce the clinical consistency of the generated content. Experimental results show that BrainTumorLLM significantly outperforms both general-purpose and medical-domain models on brain tumor question-answering tasks. In automatic evaluation, it achieves BLEU-1 and BLEU-2 scores of 0.3383 and 0.2684, and ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.3237, 0.1466, and 0.2611, respectively, while perplexity falls from 20.362 for the base model to 7.674. These results indicate the model's professionalism, precision, and potential for clinical application in brain tumor diagnosis and treatment, and its capacity to provide intelligent assistance for brain tumor diagnosis, treatment decision-making, and medical research.

Keywords: large language model (LLM), brain tumor question answering, supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), clinical decision support
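The automatic metrics quoted in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not the authors' evaluation script: the function names `perplexity` and `clipped_ngram_precision` are hypothetical, the tokenization is assumed to be done beforehand, and perplexity is assumed to use the natural logarithm. Under that assumption, a drop in perplexity from 20.362 to 7.674 corresponds to the mean negative log-likelihood per token falling from roughly 3.01 to roughly 2.04 nats. The second function computes only the clipped n-gram precision that underlies BLEU-n; it omits the brevity penalty and any cumulative weighting, and the abstract does not say which BLEU variant was used.

```python
import math
from collections import Counter
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity as exp of the mean negative log-likelihood per token.

    Assumes natural-log probabilities, one per generated token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def clipped_ngram_precision(
    candidate: Sequence[str], reference: Sequence[str], n: int
) -> float:
    """Modified (clipped) n-gram precision against a single reference.

    This is the core quantity behind BLEU-n, without the brevity penalty.
    """
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total
```

For example, a model that assigns probability 0.25 to every token has a perplexity of 4, and the candidate "the the the" scored against the reference "the cat" gets a unigram precision of 1/3 because the repeated word is credited only once.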

LI Jiakun, LIU Yanqing, DU Fang, YU Zhenhua, FENG Yu, WANG Hui, HUO Xianhao. BrainTumorLLM: Optimizing and Evaluating of Large Language Model for Brain Tumor Diagnosis and Treatment[J]. Computer Engineering, 2026, 52(5): 349-359.

https://www.ecice06.com/CN/Y2026/V52/I5/349

Figures/Tables 12

Fig.1 Architecture of the BrainTumorLLM model

Fig.2 Statistical distribution of brain tumor cases

Fig.3 Prompt templates used for constructing brain tumor question-answer pairs

Fig.4 Information constraint strategy

Fig.5 Prompt templates for macro-level and micro-level questions

Fig.6 Overall procedure of proximal policy optimization

Fig.7 GPT-4 evaluation results in terms of accuracy, professionalism, and satisfaction

Fig.8 Manual evaluation results for different models

Fig.9 Examples of responses from different models to the same brain tumor question


