
Computer Engineering ›› 2022, Vol. 48 ›› Issue (9): 254-261. doi: 10.19678/j.issn.1000-3428.0062650

• Development Research and Engineering Application •

A Gastric Cancer Risk Prediction Model Fusing Multi-Type Data

CHEN Xianlai1,2, JIA Yizhen2,3, AN Ying1,2, TANG Hongying4

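  1. Big Data Institute, Central South University, Changsha 410083;
    2. National Engineering Research Center for Medical Big Data Application Technology, Central South University, Changsha 410083;
    3. School of Computer Science and Engineering, Central South University, Changsha 410083;
    4. Department of Clinical Nursing, Xiangya Hospital, Central South University, Changsha 410008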
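  • Received: 2021-09-10 Revised: 2021-10-27 Published: 2021-11-05
  • About the authors: CHEN Xianlai (b. 1970), male, professor, Ph.D., main research interest: medical big data processing; JIA Yizhen, master's student; AN Ying, associate professor, Ph.D.; TANG Hongying (corresponding author), nurse-in-charge, M.S.
  • Funding:
    National Key Research and Development Program of China, "Precision Medicine Research" (2016YFC0901705).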

Gastric Cancer Risk Prediction Model with Multi-Type Data

CHEN Xianlai1,2, JIA Yizhen2,3, AN Ying1,2, TANG Hongying4   

  1. Big Data Institute, Central South University, Changsha 410083, China;
    2. National Engineering Research Center for Medical Big Data Application Technology, Central South University, Changsha 410083, China;
    3. School of Computer Science and Engineering, Central South University, Changsha 410083, China;
    4. Department of Clinical Nursing, Xiangya Hospital, Central South University, Changsha 410008, China
  • Received: 2021-09-10 Revised: 2021-10-27 Published: 2021-11-05

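Abstract (translated from the Chinese): Early detection of gastric cancer is important for reducing mortality and improving patients' quality of life. Existing prediction models estimate a patient's cancer risk from a single kind of data, structured electronic health records, but they cannot effectively integrate different types of clinical data and do not meet actual clinical needs. A gastric cancer risk prediction model based on the fusion of multi-type heterogeneous data is proposed. A pre-trained language model extracts textual information from the admission records in the electronic health record data, and a denoising autoencoder extracts features from the laboratory test data. The dimension of the low-dimensional structured-data vector representation is then expanded so that the low-dimensional laboratory test features are not drowned out by the high-dimensional features. On this basis, the expanded structured-data vector and the high-dimensional text representation vector are fused on the same scale to predict the patient's disease risk. Experimental results show that the model reaches an accuracy of 0.949337 and outperforms models such as support vector machine, logistic regression, and naive Bayes.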

Keywords (translated from the Chinese): risk prediction, fusion model, deep learning, gastric cancer, electronic medical record

Abstract: Early detection of gastric cancer is of great significance for reducing mortality and improving patients' quality of life. Existing prediction models estimate a patient's cancer risk from structured Electronic Health Record (EHR) data alone; they cannot effectively integrate different types of clinical data and do not satisfy actual clinical needs. This study proposes a gastric cancer risk prediction model based on multi-type heterogeneous data fusion. A pre-trained language model is used to extract text information from the admission records in the EHR data, and a denoising autoencoder is used to extract features from the laboratory test data. In addition, the dimension of the low-dimensional structured-data vector representation is expanded so that the low-dimensional laboratory test features are not drowned out by the high-dimensional features. On this basis, the expanded structured-data vector and the high-dimensional text representation vector are fused on the same scale to predict the patient's disease risk. Experiments show that the proposed model reaches an accuracy of 0.949337, outperforming support vector machine, logistic regression, naive Bayes, and other models.

Key words: risk prediction, fusion model, deep learning, gastric cancer, electronic medical record
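
The expand-then-fuse step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vector sizes (`TEXT_DIM`, `LAB_DIM`), the linear-plus-ReLU expansion, the random weights, and the use of concatenation as the fusion operator are all assumptions made for illustration, and the helper names are hypothetical. The inputs stand in for the outputs of a pre-trained language model and a denoising autoencoder.

```python
import random

TEXT_DIM = 768  # assumed size of the text representation vector
LAB_DIM = 32    # assumed size of the laboratory-feature vector


def make_projection(in_dim, out_dim, seed=0):
    """Build random weights for a linear layer (stand-in for trained weights)."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def expand(vec, weights, bias):
    """Map a low-dimensional vector to a higher dimension: linear layer + ReLU."""
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(row, vec)) + b
        out.append(max(0.0, z))
    return out


def fuse(text_vec, lab_vec):
    """Fuse two same-scale vectors by concatenation (assumed fusion operator)."""
    return text_vec + lab_vec


# Placeholder inputs standing in for the two feature extractors' outputs.
rng = random.Random(1)
text_vec = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]
lab_vec = [rng.gauss(0, 1) for _ in range(LAB_DIM)]

weights, bias = make_projection(LAB_DIM, TEXT_DIM)
lab_expanded = expand(lab_vec, weights, bias)
fused = fuse(text_vec, lab_expanded)

print(len(lab_expanded), len(fused))  # 768 1536
```

After expansion, both modalities contribute the same number of dimensions to the fused vector, which is the intuition behind keeping the laboratory features from being drowned out. In a trained model the fused vector would feed a classifier that outputs the risk estimate.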

CLC number: