Bioinformatics (Oxford, England): latest articles

Bayesian gene set benchmark dose estimation for "omic" responses.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btaf008
Daniel Zilber, Kyle P Messier, John House, Fred Parham, Scott S Auerbach, Matthew W Wheeler
Motivation: Estimating a toxic reference point using tools like the benchmark dose (BMD) is a critical step in setting policy to regulate pollution and ensure safe environments. Toxicity can be measured for different endpoints, including changes in gene expression and histopathology for various tissues, and is typically explored one gene or tissue at a time in a univariate setting that ignores correlation. In this work, we develop a multivariate estimation procedure to estimate the BMD for specified gene sets. Our approach extends the foundational univariate approach by accounting for correlation in a statistically principled way.
Results: We illustrate the method using data from a 5-day rat study and Hallmark gene sets and compare to existing BMD results computed by the EPA for both gene sets and apical histopathology endpoints. In contrast to previous ad hoc methods, our principled approach provides the needed extension to bring the foundational univariate method into the multivariate world of transcriptomics. In addition to use in a regulatory setting, our method can support hypothesis generation when gene sets correspond to mechanistic pathways.
Availability and implementation: BS-BMD is implemented in R and C++ and available at https://github.com/NIEHS/BS-BMD.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783320/pdf/
Citations: 0
The Naïve Bayes classifier++ for metagenomic taxonomic classification-query evaluation.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae743
Haozhe Neil Duan, Gavin Hearne, Robi Polikar, Gail L Rosen
Motivation: This study examines the query performance of the NBC++ (Incremental Naive Bayes Classifier) program for variations in canonicality, k-mer size, databases, and input sample data size. We demonstrate that both NBC++ and Kraken2 are influenced by database depth, with macro measures improving as depth increases. However, fully capturing the diversity of life, especially viruses, remains a challenge.
Results: NBC++ can competitively profile the superkingdom content of metagenomic samples using a small training database. NBC++ spends less time training and can use a fraction of the memory that Kraken2 needs, but at the cost of long query times. Major NBC++ enhancements include accommodating canonical k-mer storage (leading to significant storage savings) and adaptable and optimized memory allocation that accelerates query analysis and enables the software to be run on nearly any system. Additionally, the output now includes log-likelihood values for each training genome, providing users with valuable confidence information.
Availability and implementation: Source code and Dockerfile are available at http://github.com/EESI/Naive_Bayes.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11729721/pdf/
Citations: 0
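The NBC++ abstract above attributes its storage savings to canonical k-mer storage. As a rough illustration of that general idea (not the NBC++ implementation, whose internals are not described here), the sketch below counts k-mers after mapping each one to the lexicographically smaller of itself and its reverse complement, so a k-mer and its reverse complement share one table entry. The function names are hypothetical and chosen only for this example.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmer(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(seq: str, k: int) -> Counter:
    """Count canonical k-mers in seq, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical_kmer(kmer)] += 1
    return counts


if __name__ == "__main__":
    # AAA/TTT and AAT/ATT collapse into one entry each.
    print(count_canonical_kmers("AAATTT", 3))
```

Because both strands map to the same key, the table holds roughly half as many distinct entries as a strand-specific count would, which is the kind of saving the abstract refers to.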
Scope+: an open source generalizable architecture for single-cell RNA-seq atlases at sample and cell levels.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae727
Danqing Yin, Yue Cao, Junyi Chen, Candice L Y Mak, Ken H O Yu, Jiaxuan Zhang, Jia Li, Yingxin Lin, Joshua W K Ho, Jean Y H Yang
Summary: With the recent advancement in single-cell RNA-sequencing technologies and the increased availability of integrative tools, challenges arise in easy and fast access to large cell atlas collections. Existing cell atlas portals are rarely open source and adaptable, and do not support meta-analysis at cell level. Here, we present an open source, highly optimized and scalable architecture, named Scope+, to allow quick access, meta-analysis and cell-level selection of the atlas data. We applied this architecture to our well-curated 5 million COVID-19 blood and immune cells, as a portal called Covidscope. We achieved efficient access to atlas-scale data via three strategies: cell-as-unit data modelling, novel database optimization techniques and innovative software architectural design. Scope+ serves as an open source architecture for researchers to build on with their own atlases.
Availability and implementation: The COVID-19 web portal, data and meta-analysis are available on Covidscope (https://covidsc.d24h.hk/). User tutorials on how to implement Scope+ architecture with their atlases can be found at https://hiyin.github.io/scopeplus-user-tutorial/. Scope+ source code can be found at https://doi.org/10.5281/zenodo.14174632 and https://github.com/hiyin/scopeplus.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755096/pdf/
Citations: 0
Rnalib: a Python library for custom transcriptomics analyses.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae751
Niko Popitsch, Stefan L Ameres
Motivation: The efficient and reproducible analysis of high-throughput sequencing datasets necessitates the development of methodical and robust computational pipelines that integrate established and bespoke bioinformatics analysis tools, often written in high-level programming languages such as Python. Despite the increasing availability of programming libraries for genomics, there is a noticeable lack of tools specifically focused on transcriptomics. Key tasks in this area include the association of gene features (e.g. transcript isoforms, introns or untranslated regions) with relevant subsections of (large) genomics datasets across diverse data formats, as well as efficient querying of these data based on genomic locations and annotation attributes.
Results: To address the needs of transcriptomics data analyses, we developed rnalib, a Python library designed for creating custom bioinformatics analysis methods. Built on existing Python libraries like pysam and pyBigWig, rnalib offers random access support, enabling efficient access to relevant subregions of large, genome-wide datasets. Rnalib extends the filtering and access capabilities of these libraries and includes additional checks to prevent common errors when integrating genomics datasets. The library is centred on an object-oriented Transcriptome class that provides methods for stepwise annotation of gene features with both local and remote data sources. The rnalib Application Programming Interface cleanly separates immutable genomic locations from associated, mutable data, and offers a wide range of methods for iterating, querying, and exporting collated datasets. Rnalib establishes a fast, readable, reproducible, and robust framework for developing novel transcriptomics data analysis tools and methods.
Availability and implementation: Source code, documentation, and tutorials are available at https://github.com/popitsch/rnalib.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11734754/pdf/
Citations: 0
Heterogeneous entity representation for medicinal synergy prediction.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae750
Jiawei Wu, Jun Wen, Mingyuan Yan, Anqi Dong, Shuai Gao, Ren Wang, Can Chen
Motivation: Forecasting the synergistic effects of drug combinations facilitates drug discovery and development, especially regarding cancer therapeutics. While numerous computational methods have emerged, most of them fall short in fully modeling the relationships among clinical entities including drugs, cell lines, and diseases, which hampers their ability to generalize to drug combinations involving unseen drugs. These relationships are complex and multidimensional, requiring sophisticated modeling to capture nuanced interplay that can significantly influence therapeutic efficacy.
Results: We present a novel deep hypergraph learning method named Heterogeneous Entity Representation for MEdicinal Synergy (HERMES) prediction to predict the synergistic effects of anti-cancer drugs. Heterogeneous data sources, including drug chemical structures, gene expression profiles, and disease clinical semantics, are integrated into hypergraph neural networks equipped with a gated residual mechanism to enhance high-order relationship modeling. HERMES demonstrates state-of-the-art performance on two benchmark datasets, significantly outperforming existing methods in predicting the synergistic effects of drug combinations, particularly in cases involving unseen drugs.
Availability and implementation: The source code is available at https://github.com/Christina327/HERMES.
Volume 41, issue 1
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11745903/pdf/
Citations: 0
Proteoform identification and quantification based on alignment graphs.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btaf007
Zhaohui Zhan, Lusheng Wang
Motivation: Proteoforms are the different forms of a protein generated from the genome with various sequence variations, splice isoforms, and post-translational modifications. Proteoforms regulate protein structures and functions. A single protein can have multiple proteoforms due to different modification sites. Proteoform identification is to find the proteoforms of a given protein that best fit the input spectrum. Proteoform quantification is to find the corresponding abundances of different proteoforms for a specific protein.
Results: We proposed algorithms for proteoform identification and quantification based on the top-down tandem mass spectrum. In the combination alignments of the HomMTM spectrum and the reference protein, we need to give a correction of the mass for each matched peak within the pre-defined error range. After the correction, we impose that the mass between any two (not necessarily consecutive) matched nodes in the protein is identical to that of the corresponding two matched peaks in the HomMTM spectrum. We design a back-tracking graph to store this information and find a combinatorial path (k paths) with the minimum sum of peak intensity error in this back-tracking graph. The obtained alignment can also show the relative abundance of these proteoforms (paths). Our experimental results demonstrate the algorithm's capability to identify and quantify proteoform combinations encompassing a greater number of peaks. This advancement holds promise for enhancing the accuracy and comprehensiveness of proteoform quantification, addressing a crucial need in the field of top-down MS-based proteomics.
Availability and implementation: The software package is available at https://github.com/Zeirdo/TopMGQuant.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769674/pdf/
Citations: 0
Modeling multi-stage disease progression and identifying genetic risk factors via a novel collaborative learning method.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae728
Duo Xi, Minjianan Zhang, Muheng Shang, Lei Du, Junwei Han
Motivation: Alzheimer's disease (AD) typically progresses gradually over many years rather than suddenly. Thus, staging AD progression in different phases could aid in accurate diagnosis and treatment. In addition, identifying genetic variations that influence AD is critical to understanding the pathogenesis. However, staging the disease progression and identifying genetic variations are usually handled separately.
Results: To address this limitation, we propose a novel sparse multi-stage multi-task mixed-effects collaborative longitudinal regression method (MSColoR). Our method jointly models long disease progression as a multi-stage procedure and identifies genetic risk factors underpinning this complex trajectory. Specifically, MSColoR models multi-stage disease progression using longitudinal neuroimaging-derived phenotypes and associates the fitted disease trajectories with genetic variations at each stage. Furthermore, we collaboratively leverage summary statistics from large genome-wide association studies to improve statistical power. Finally, an efficient optimization algorithm is introduced to solve MSColoR. We evaluate our method using both synthetic and real longitudinal neuroimaging and genetic data. Both results demonstrate that MSColoR can reduce modeling errors while identifying more accurate and significant genetic variations compared to other longitudinal methods. Consequently, MSColoR holds great potential as a computational technique for longitudinal brain imaging genetics and AD studies.
Availability and implementation: The code is publicly available at https://github.com/dulei323/MSColoR.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11784593/pdf/
Citations: 0
EnrichRBP: an automated and interpretable computational platform for predicting and analysing RNA-binding protein events.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btaf018
Yubo Wang, Haoran Zhu, Yansong Wang, Yuning Yang, Yujian Huang, Jian Zhang, Ka-Chun Wong, Xiangtao Li
Motivation: Predicting RNA-binding proteins (RBPs) is central to understanding post-transcriptional regulatory mechanisms. Here, we introduce EnrichRBP, an automated and interpretable computational platform specifically designed for the comprehensive analysis of RBP interactions with RNA.
Results: EnrichRBP is a web service that enables researchers to develop original deep learning and machine learning architectures to explore the complex dynamics of RBPs. The platform supports 70 deep learning algorithms, covering feature representation, selection, model training, comparison, optimization, and evaluation, all integrated within an automated pipeline. EnrichRBP is adept at providing comprehensive visualizations, enhancing model interpretability, and facilitating the discovery of functionally significant sequence regions crucial for RBP interactions. In addition, EnrichRBP supports base-level functional annotation tasks, offering explanations and graphical visualizations that confirm the reliability of the predicted RNA-binding sites. Leveraging high-performance computing, EnrichRBP provides ultra-fast predictions ranging from seconds to hours, applicable to both pre-trained and custom model scenarios, thus proving its utility in real-world applications. Case studies highlight that EnrichRBP provides robust and interpretable predictions, demonstrating the power of deep learning in the functional analysis of RBP interactions. Finally, EnrichRBP aims to enhance the reproducibility of computational method analyses for RBP sequences, as well as reduce the programming and hardware requirements for biologists, thereby offering meaningful functional insights.
Availability and implementation: EnrichRBP is available at https://airbp.aibio-lab.com/. The source code is available at https://github.com/wangyb97/EnrichRBP, and detailed online documentation can be found at https://enrichrbp.readthedocs.io/en/latest/.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783304/pdf/
Citations: 0
FuncFetch: an LLM-assisted workflow enables mining thousands of enzyme-substrate interactions from published manuscripts.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btae756
Nathaniel Smith, Xinyu Yuan, Chesney Melissinos, Gaurav Moghe
Motivation: Thousands of genomes are publicly available; however, most genes in those genomes have poorly defined functions. This is partly due to a gap between previously published, experimentally characterized protein activities and activities deposited in databases. This activity deposition is bottlenecked by the time-consuming biocuration process. The emergence of large language models presents an opportunity to speed up the text-mining of protein activities for biocuration.
Results: We developed FuncFetch, a workflow that integrates NCBI E-Utilities, OpenAI's GPT-4, and Zotero, to screen thousands of manuscripts and extract enzyme activities. Extensive validation revealed high precision and recall of GPT-4 in determining whether the abstract of a given paper indicates the presence of a characterized enzyme activity in that paper. Given the manuscript, FuncFetch extracted data such as species information, enzyme names, sequence identifiers, substrates, and products, which were subjected to extensive quality analyses. Comparison of this workflow against a manually curated dataset of BAHD acyltransferase activities demonstrated a precision/recall of 0.86/0.64 in extracting substrates. We further deployed FuncFetch on nine large plant enzyme families. Screening 26 543 papers, FuncFetch retrieved 32 605 entries from 5459 selected papers. We also identified multiple extraction errors including incorrect associations, nontarget enzymes, and hallucinations, which highlight the need for further manual curation. The BAHD activities were verified, resulting in a comprehensive functional fingerprint of this family and revealing that ~70% of the experimentally characterized enzymes are uncurated in the public domain. FuncFetch represents an advance in biocuration and lays the groundwork for predicting the functions of uncharacterized enzymes.
Availability and implementation: Code and minimally curated activities are available at: https://github.com/moghelab/funcfetch and https://tools.moghelab.org/funczymedb.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11734755/pdf/
Citations: 0
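The FuncFetch abstract above reports precision and recall separately (0.86/0.64 for substrate extraction) and also gives the screening counts. For readers who want a single summary number, the sketch below combines the two reported values into an F1 score (the harmonic mean) and computes the share of screened papers that were selected. Neither derived figure appears in the abstract; they are computed here only from the numbers it reports.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for substrate extraction.
precision, recall = 0.86, 0.64
f1 = f1_score(precision, recall)

# Screening counts reported in the abstract.
screened, selected = 26_543, 5_459
selected_fraction = selected / screened

if __name__ == "__main__":
    print(f"F1 = {f1:.2f}")
    print(f"Selected fraction = {selected_fraction:.1%}")
```

The harmonic mean sits closer to the lower of the two inputs, so the F1 value reflects the abstract's point that recall, rather than precision, is the weaker side of the extraction step.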
RNA-TorsionBERT: leveraging language models for RNA 3D torsion angles prediction.
Bioinformatics (Oxford, England) Pub Date : 2024-12-26 DOI: 10.1093/bioinformatics/btaf004
Clément Bernard, Guillaume Postic, Sahar Ghannay, Fariza Tahi
Motivation: Predicting the 3D structure of RNA is an ongoing challenge that has yet to be completely addressed despite continuous advancements. RNA 3D structures rely on distances between residues and base interactions but also backbone torsional angles. Knowing the torsional angles for each residue could help reconstruct its global folding, which is what we tackle in this work. This paper presents a novel approach for directly predicting RNA torsional angles from raw sequence data. Our method draws inspiration from the successful application of language models in various domains and adapts them to RNA.
Results: We have developed a language-based model, RNA-TorsionBERT, incorporating better sequential interactions for predicting RNA torsional and pseudo-torsional angles from the sequence only. Through extensive benchmarking, we demonstrate that our method improves the prediction of torsional angles compared to state-of-the-art methods. In addition, by using our predictive model, we have inferred a torsion angle-dependent scoring function, called TB-MCQ, that replaces the true reference angles with our model prediction. We show that it accurately evaluates the quality of near-native predicted structures, in terms of RNA backbone torsion angle values. Our work demonstrates promising results, suggesting the potential utility of language models in advancing RNA 3D structure prediction.
Availability and implementation: Source code is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr/evryrna/RNA-TorsionBERT.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758789/pdf/
Citations: 0