Peptipedia v2.0: A peptide sequence database and user-friendly web platform. A major update

bioRxiv. Pub Date: 2024-07-16. DOI: 10.1101/2024.07.11.603053

Gabriel Cabas-Mora, Anamaría Daza, Nicole Soto-García, Valentina Garrido, Diego Alvarez, Marcelo A Navarrete, Lindybeth Sarmiento-Varón, J. H. Sepúlveda Yáñez, Mehdi D. Davari, Frederic Cadet, Á. Olivera-Nappa, Roberto Uribe-Paredes, David Medina-Ortiz


Abstract


In recent years, peptides have gained significant relevance due to their therapeutic properties. The surge in peptide production and synthesis has generated vast amounts of data, enabling the creation of comprehensive databases and information repositories. Advances in sequencing techniques and artificial intelligence have further accelerated the design of tailor-made peptides. However, leveraging these techniques requires versatile and continuously updated storage systems, along with tools that facilitate peptide research and the implementation of machine learning for predictive systems. This work introduces Peptipedia v2.0, one of the most comprehensive public repositories of peptides, supporting biotechnological research by simplifying peptide study and annotation. Peptipedia v2.0 has expanded its collection by over 45% with peptide sequences that have reported biological activities. The functional biological activity tree has been revised and enhanced, incorporating new categories such as cosmetic and dermatological activities, molecular binding, and anti-ageing properties. Using protein language models and machine learning, more than 90 binary classification models have been trained, validated, and incorporated into Peptipedia v2.0. These models exhibit average sensitivities and specificities of 0.877 ± 0.0530 and 0.873 ± 0.054, respectively, facilitating the annotation of more than 3.6 million peptide sequences with unknown biological activities, which are also registered in Peptipedia v2.0. Additionally, Peptipedia v2.0 introduces description tools based on structural and ontological properties, as well as user-friendly machine-learning tools that facilitate the application of machine-learning strategies to the study of peptide sequences. Peptipedia v2.0 is accessible under the Creative Commons CC BY-NC-ND 4.0 license at https://peptipedia.cl/.
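The abstract reports sensitivity and specificity averaged over many binary classifiers. As a reading aid only, the sketch below shows how those two metrics are commonly computed for one classifier and then summarized as mean ± standard deviation across several. The function names, the toy labels, and the choice of the sample standard deviation are all assumptions made for illustration; the abstract does not state how the authors computed or aggregated their values.

```python
from statistics import mean, stdev


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)  # fraction of actual positives recovered
    specificity = tn / (tn + fp)  # fraction of actual negatives recovered
    return sensitivity, specificity


def summarize(scores):
    """Return (mean, sample standard deviation) over per-model scores.

    Assumption: sample standard deviation; the source does not say which
    estimator was used.
    """
    return mean(scores), stdev(scores)


# Hypothetical toy labels for a single classifier (not data from the paper).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
```

With one such pair of values per activity model, `summarize` would be called once on the list of sensitivities and once on the list of specificities to produce figures in the "mean ± deviation" form quoted above.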
