ACVPred: Enhanced prediction of anti-coronavirus peptides by transfer learning combined with data augmentation

IF 6.2 | CAS Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, THEORY & METHODS

Future Generation Computer Systems-The International Journal of Escience | Pub Date: 2024-06-07 | DOI: 10.1016/j.future.2024.06.008

Yi Xu, Tianyuan Liu, Yu Yang, Juanjuan Kang, Liping Ren, Hui Ding, Yang Zhang

Volume 160, Pages 305-315. Full text: https://www.sciencedirect.com/science/article/pii/S0167739X2400308X

Citations: 0

Abstract



Anti-coronavirus peptides (ACVPs) have garnered significant attention in COVID-19 therapeutic research due to their precise targeting, low risk of drug resistance, flexible synthesis, and effectiveness against viral mutations. Although some in-silico methods have been developed to predict ACVPs, they suffer from limited datasets and a lack of interpretability. Hence, this study introduces ACVPred, an algorithm for ACVP prediction based on two few-shot learning strategies: transfer learning and data augmentation. Our experiments demonstrate that data augmentation can significantly enhance model performance, while transfer learning can effectively prevent overfitting and strengthen generalizability. Compared to existing methods, ACVPred exhibits superior performance and robust generalization on both the training and independent test datasets. Moreover, the interpretability study of the model reveals that its transformer-based core can effectively capture key motifs in ACVP sequences, demonstrating strong feature learning capabilities. Additionally, the findings suggest that the sequence feature weights and key motif positions tend to be distributed towards the N-terminal end of ACVP sequences, providing vital clues for the design of ACVPs. In summary, ACVPred is not only a practical and valuable tool for aiding in the design of ACVPs, but its algorithmic concept also serves as an important reference for research on other small-sample prediction problems.
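The abstract does not say which augmentation scheme ACVPred uses, so the following is only a minimal sketch of one common way to enlarge a small peptide dataset: randomly swapping residues for physicochemically similar ones. The residue grouping, the function names `augment_peptide` and `augment_dataset`, and the `rate` and `copies` parameters are all illustrative assumptions and are not taken from the paper.

```python
import random

# Assumed grouping of the 20 standard amino acids by broad physicochemical
# class. This is an illustrative choice, not the scheme used by ACVPred.
SIMILAR_GROUPS = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH", "G", "P", "C"]
SUBSTITUTES = {aa: group for group in SIMILAR_GROUPS for aa in group}


def augment_peptide(seq, rate=0.1, rng=None):
    """Return a copy of `seq` in which each residue is, with probability
    `rate`, replaced by a different residue from the same group."""
    rng = rng or random.Random()
    out = []
    for aa in seq:
        # Residues outside the table, or in singleton groups, have no candidates.
        candidates = [c for c in SUBSTITUTES.get(aa, aa) if c != aa]
        if candidates and rng.random() < rate:
            out.append(rng.choice(candidates))
        else:
            out.append(aa)
    return "".join(out)


def augment_dataset(seqs, copies=3, rate=0.1, seed=0):
    """Return the original sequences followed by `copies` perturbed
    variants of each one. A fixed seed keeps the output reproducible."""
    rng = random.Random(seed)
    augmented = list(seqs)
    for seq in seqs:
        for _ in range(copies):
            augmented.append(augment_peptide(seq, rate, rng))
    return augmented
```

Each variant keeps the length of its parent sequence, so position-dependent signals such as the N-terminal bias mentioned in the abstract are not shifted. Whether the label of a perturbed peptide can safely be inherited from its parent is a modelling assumption that would need to be checked against held-out data.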


Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 19.90

Self-citation rate: 2.70%

Articles published: 376

Review time: 10.6 months

About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.