
Computer Engineering, 2025, Vol. 51, Issue (8): 168-180. doi: 10.19678/j.issn.1000-3428.0069295

• Artificial Intelligence and Pattern Recognition

Attentional BiLSTM and Prototype Networks for lncRNA Subcellular Localization Prediction

SUN Rongneng1, LIU Lin1,2,*, KANG Yuanzhao1

  1. School of Information, Yunnan Normal University, Kunming 650504, Yunnan, China
    2. Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education of Yunnan Province, Kunming 650504, Yunnan, China
  • Received: 2024-01-25 Revised: 2024-04-25 Online: 2025-08-15 Published: 2024-08-28
  • Contact: LIU Lin

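  • Funding:
    Yunnan Provincial Basic Research General Program (202201AT070042); National Natural Science Foundation of China (61862067); National Natural Science Foundation of China (U1902201); Key Project of the Joint Fund of the Yunnan Provincial Department of Science and Technology and Yunnan University for Double First-Class Construction (2019FY003027); National Key Research and Development Program of China (2022YFC2602500); Yunnan Normal University Graduate Research Innovation Fund (YJ SJ23-B173)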

Abstract:

Long non-coding RNAs (lncRNAs) play crucial roles in many cellular processes, and their subcellular localization provides key information for identifying their functions. Identifying lncRNA subcellular localization with traditional biochemical experiments is limited by complex procedures, poor reproducibility, and high cost. This paper therefore proposes BP-lncLoc, an approach that combines an attentional Bi-directional Long Short-Term Memory (BiLSTM) network with a prototype network to predict lncRNA subcellular localization. First, initial K-mer features are extracted from the raw sequence data and then balanced to correct class imbalance. Second, an attention-based BiLSTM is incorporated to extract deep latent features of lncRNA sequences and to optimize the network against the vanishing-gradient problem that can arise with high-dimensional data. Third, because lncRNA subcellular localization datasets are small, a prototype-network prediction framework that does not rely on large training sets is constructed. Finally, the model is made interpretable by quantifying the importance of each input feature to its output decisions. Experimental results show that the method achieves an accuracy of 98.89% on public datasets, outperforming the compared methods and offering a new approach for lncRNA subcellular localization prediction.

Key words: lncRNA subcellular localization, imbalanced learning, Bidirectional Long Short-Term Memory (BiLSTM) network, prototype network, interpretability
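The abstract's first step, deriving initial K-mer features from raw sequences, can be illustrated with a minimal Python sketch. It assumes a common encoding: normalized K-mer frequencies over the alphabet ACGT, with RNA `U` mapped to `T`. The abstract does not state which K values, normalization, or balancing method BP-lncLoc uses, so `kmer_features`, `multi_k_features`, and the default `ks=(1, 2, 3)` are hypothetical choices for illustration, and the balancing step is not shown.

```python
from itertools import product
from typing import Iterable, List

ALPHABET = "ACGT"  # assumed nucleotide alphabet; RNA 'U' is mapped to 'T'


def kmer_features(seq: str, k: int = 3) -> List[float]:
    """Hypothetical helper: normalized K-mer frequency vector of length 4**k."""
    # Fix a lexicographic order (AA..A, AA..C, ...) for all 4**k K-mers.
    vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    seq = seq.upper().replace("U", "T")
    total = 0
    # Slide a width-k window with stride 1; windows containing other
    # characters (e.g. 'N') are skipped.
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
            total += 1
    # Normalize by the number of valid windows so that sequences of
    # different lengths yield comparable vectors.
    if total == 0:
        return [0.0] * len(vocab)
    return [c / total for c in counts]


def multi_k_features(seq: str, ks: Iterable[int] = (1, 2, 3)) -> List[float]:
    """Hypothetical helper: concatenate K-mer vectors for several K values."""
    features: List[float] = []
    for k in ks:
        features.extend(kmer_features(seq, k))
    return features


if __name__ == "__main__":
    vec = kmer_features("ACGUAC", k=2)  # 2-mers: AC, CG, GT, TA, AC
    print(len(vec), vec[1])  # prints "16 0.4" ('AC' is index 1, 2 of 5 windows)
    print(len(multi_k_features("ACGUAC")))  # prints 84 (= 4 + 16 + 64)
```

Concatenating several K values yields a fixed-length vector (84 dimensions for K = 1 to 3) regardless of sequence length, which is what lets variable-length lncRNA sequences be fed to a downstream network.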

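The prototype-network step can be sketched with the standard prototypical-network rule: each class prototype is the mean embedding of that class's labeled support samples, and a query is assigned to the class whose prototype is nearest. This is a minimal sketch under stated assumptions, not BP-lncLoc's implementation: the abstract does not specify the distance metric (squared Euclidean distance is assumed here), in the actual method the embeddings would come from the attention BiLSTM rather than the toy vectors below, and the helper names and class labels (`nucleus`, `cytoplasm`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_prototypes(embeddings: Sequence[Sequence[float]],
                     labels: Sequence[str]) -> Dict[str, Vector]:
    """Each class prototype is the mean of that class's support embeddings."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict_proba(query: Sequence[float],
                  prototypes: Dict[str, Vector]) -> Dict[str, float]:
    """Softmax over negative squared distances from the query to each prototype."""
    labels = list(prototypes)
    logits = [-squared_euclidean(query, prototypes[lab]) for lab in labels]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    norm = sum(exps)
    return {lab: e / norm for lab, e in zip(labels, exps)}


def predict(query: Sequence[float], prototypes: Dict[str, Vector]) -> str:
    """Return the label of the nearest prototype (highest probability)."""
    probs = predict_proba(query, prototypes)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    # Toy 2-D support embeddings standing in for BiLSTM outputs.
    support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
    labels = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]
    protos = class_prototypes(support, labels)
    print(protos)  # {'nucleus': [0.0, 1.0], 'cytoplasm': [5.0, 4.0]}
    print(predict([1.0, 1.0], protos))  # nucleus
```

Because a prototype is just a class mean, it can be estimated from a handful of labeled examples per class, which is the property the abstract relies on for small lncRNA localization datasets.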