Attentional BiLSTM and Prototype Networks for lncRNA Subcellular Localization Prediction

doi:10.19678/j.issn.1000-3428.0069295

Abstract

Long non-coding RNAs (lncRNAs) play crucial roles in many cellular processes, and their subcellular localization provides key information for identifying their functions. Determining lncRNA subcellular localization with traditional biochemical experiments is limited by complex procedures, poor reproducibility, and high cost. To address this, an attentional Bi-directional Long Short-Term Memory (BiLSTM) and prototype network approach for lncRNA subcellular localization prediction, named BP-lncLoc, is proposed. First, initial K-mer features are extracted from the original sequence data and the class distribution is balanced. Second, an attention BiLSTM is used to extract deep implicit features from lncRNA sequences, and the network is optimized to mitigate the vanishing-gradient problem that can arise with high-dimensional data. Third, because lncRNA subcellular localization data are small-sample in nature, a prototype network prediction framework that does not rely on large-scale training samples is constructed. Finally, the prediction model is made interpretable by quantifying the importance of input features to output decisions. Experimental results show that the method reaches an accuracy of 98.89% on public datasets, higher than that of the compared methods, which makes it suitable for lncRNA subcellular localization prediction.

Key words: lncRNA subcellular localization, imbalanced learning, Bidirectional Long Short-Term Memory (BiLSTM) network, prototype network, interpretability
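
To make the first and third steps of the abstract concrete, the following is a minimal sketch, not the BP-lncLoc implementation. It computes normalized K-mer frequencies and classifies a query by its nearest class prototype. Several things here are assumptions that do not come from the paper: the function names, the `ACGU` alphabet, the toy sequences and labels, and the use of Euclidean distance. The attention BiLSTM encoder is replaced by the raw K-mer vector (an identity embedding), and the balancing step is omitted.

```python
from itertools import product
from math import dist


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return the normalized frequency of every possible k-mer in seq."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[m] / total for m in kmers]


def class_prototypes(support):
    """Average the embedded support samples of each class into one prototype."""
    prototypes = {}
    for label, vectors in support.items():
        n = len(vectors)
        prototypes[label] = [sum(col) / n for col in zip(*vectors)]
    return prototypes


def predict(query, prototypes):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))


# Toy, made-up sequences; the labels are illustrative only (assumption).
support_seqs = {
    "nucleus": ["GCGCGCGGCC", "GGCCGCGCGC"],
    "cytoplasm": ["AUAUAUUAAU", "UAUAAUAUAU"],
}
support = {
    label: [kmer_frequencies(s) for s in seqs]
    for label, seqs in support_seqs.items()
}
prototypes = class_prototypes(support)
prediction = predict(kmer_frequencies("GCGGCCGCGC"), prototypes)
print(prediction)
```

Because each prototype is just the mean of a few support embeddings, adding a class or changing the number of shots needs no retraining of a classification head, which is why this family of models is often chosen for small-sample problems.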

SUN Rongneng, LIU Lin, KANG Yuanzhao. Attentional BiLSTM and Prototype Networks for lncRNA Subcellular Localization Prediction[J]. Computer Engineering, 2025, 51(8): 168-180.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0069295

https://www.ecice06.com/EN/Y2025/V51/I8/168

Figures/Tables 17

Fig.1 BP-lncLoc model architecture

Fig.2 Balance processing effect of dataset 1

Fig.3 Balance processing effect of dataset 2

Fig.4 Overall framework of A-BiLSTM

Fig.5 Framework of prototype network

Fig.6 Accuracy for different K-mer settings

Fig.7 Comparison of 5-way N-shot experiment results

Fig.8 Top 15 features by prototype contribution

Fig.9 Top 15 features by discriminant contribution
