Xingze Fang, Yun Zuo, Youxu Tan, Xiangrong Liu, Xiangxiang Zeng, Zhaohong Deng
PreAIS: Prediction of A-to-I editing sites based on DNN-CNN deep learning models
Computational Biology and Chemistry, Volume 119, Article 108612. Published 2025-07-28. DOI: 10.1016/j.compbiolchem.2025.108612. Full text: https://www.sciencedirect.com/science/article/pii/S1476927125002737
Abstract
Adenosine-to-inosine (A-to-I) RNA editing plays a crucial role in biological processes and diseases, which makes identifying A-to-I sites important for research and drug development. Accurate identification remains challenging, however, because current models are complex, have low accuracy, and generalize poorly. To address these limitations, we propose PreAIS, a deep learning model for identifying A-to-I editing sites. PreAIS first extracts features with a K-mer algorithm and then builds a DNN-CNN model (model1), which was trained and evaluated using 10-fold cross-validation. Compared with state-of-the-art models, PreAIS (model1) improved Accuracy (ACC), Specificity (Sp), and Sensitivity (Sn) on Dataset 1 by 3.01%, 0.67%, and 5.04%, respectively. On a human A-to-I RNA editing site dataset validated by Sanger sequencing, PreAIS (model1) identified 55 of 58 sites (94.8% accuracy), outperforming other classifiers. To further assess generalization, Bi-profile Bayes features were extracted from Dataset 2, and model2 was constructed by adjusting only the input dimensions while keeping all other parameters unchanged. Results on the independent test set showed that our model retained superior performance on a different dataset, again surpassing the current best predictive models. In addition, CAM was employed to interpret the predictions of PreAIS. The PreAIS model and the related dataset constructed in this study are available on GitHub: https://github.com/xzfang00/PreAIS.
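To make the feature-extraction and evaluation steps in the abstract concrete, here is a minimal Python sketch of K-mer frequency encoding and of the three reported metrics. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the value of k, the sequence window length, or the exact K-mer variant, so `k=3`, the RNA alphabet `ACGU`, and the function names `kmer_frequencies` and `confusion_metrics` are all hypothetical choices. The DNN-CNN architecture itself is not sketched because the abstract does not describe its layers.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Encode a sequence as normalized k-mer counts in lexicographic order.

    Assumption: k and the alphabet are illustrative; the abstract does not
    state the values used by PreAIS.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    valid_windows = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            valid_windows += 1
    total = valid_windows or 1  # avoid division by zero on short input
    return [counts[kmer] / total for kmer in kmers]


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the ratio is undefined."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp, tn, fp, fn):
    """Compute Sensitivity (Sn), Specificity (Sp), and Accuracy (ACC)."""
    return {
        "Sn": _ratio(tp, tp + fn),
        "Sp": _ratio(tn, tn + fp),
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
    }


if __name__ == "__main__":
    # A 4**3 = 64-dimensional feature vector for a toy sequence.
    features = kmer_frequencies("AUGCAUGGCAUAGC", k=3)
    print(len(features))

    # The Sanger-validated set contains only true editing sites, so the
    # 55 identified sites are true positives and the 3 missed sites are
    # false negatives; Sp is undefined (NaN) without negative examples.
    metrics = confusion_metrics(tp=55, tn=0, fp=0, fn=3)
    print(round(metrics["ACC"] * 100, 1))
```

On a positives-only set such as the 58 Sanger-validated sites, accuracy and sensitivity coincide (55/58), which is why the sketch returns NaN for specificity rather than guessing a value.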
About the journal:
Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High-quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high-quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered.
Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, complementary techniques such as molecular dynamics simulations with detailed free energy calculations should be used to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered.
Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.