Bioinformatics advances最新文献

筛选
英文 中文
Changes in total charge on spike protein of SARS-CoV-2 in emerging lineages. SARS-CoV-2 新血统尖峰蛋白总电荷的变化。
Bioinformatics advances Pub Date : 2024-04-08 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae053
Anže Božič, Rudolf Podgornik
{"title":"Changes in total charge on spike protein of SARS-CoV-2 in emerging lineages.","authors":"Anže Božič, Rudolf Podgornik","doi":"10.1093/bioadv/vbae053","DOIUrl":"https://doi.org/10.1093/bioadv/vbae053","url":null,"abstract":"<p><strong>Motivation: </strong>Charged amino acid residues on the spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have been shown to influence its binding to different cell surface receptors, its non-specific electrostatic interactions with the environment, and its structural stability and conformation. It is therefore important to obtain a good understanding of amino acid mutations that affect the total charge on the spike protein which have arisen across different SARS-CoV-2 lineages during the course of the virus' evolution.</p><p><strong>Results: </strong>We analyse the change in the number of ionizable amino acids and the corresponding total charge on the spike proteins of almost 2200 SARS-CoV-2 lineages that have emerged over the span of the pandemic. Our results show that the previously observed trend toward an increase in the positive charge on the spike protein of SARS-CoV-2 variants of concern has essentially stopped with the emergence of the early omicron variants. Furthermore, recently emerged lineages show a greater diversity in terms of their composition of ionizable amino acids. We also demonstrate that the patterns of change in the number of ionizable amino acids on the spike protein are characteristic of related lineages within the broader clade division of the SARS-CoV-2 phylogenetic tree. Due to the ubiquity of electrostatic interactions in the biological environment, our findings are relevant for a broad range of studies dealing with the structural stability of SARS-CoV-2 and its interactions with the environment.</p><p><strong>Availability and implementation: </strong>The data underlying the article are available in the Supplementary material.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae053"},"PeriodicalIF":0.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11031363/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140874999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
WAS IT A MATch I SAW? Approximate palindromes lead to overstated false match rates in benchmarks using reversed sequences. 我看到的是匹配吗?在使用反向序列的基准测试中,近似回文导致高估的错误匹配率。
Bioinformatics advances Pub Date : 2024-04-08 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae052
George Glidden-Handgis, Travis J Wheeler
{"title":"WAS IT A MATch I SAW? Approximate palindromes lead to overstated false match rates in benchmarks using reversed sequences.","authors":"George Glidden-Handgis, Travis J Wheeler","doi":"10.1093/bioadv/vbae052","DOIUrl":"10.1093/bioadv/vbae052","url":null,"abstract":"<p><strong>Background: </strong>Software for labeling biological sequences typically produces a theory-based statistic for each match (the E-value) that indicates the likelihood of seeing that match's score by chance. E-values accurately predict false match rate for comparisons of random (shuffled) sequences, and thus provide a reasoned mechanism for setting score thresholds that enable high sensitivity with low expected false match rate. This threshold-setting strategy is challenged by real biological sequences, which contain regions of local repetition and low sequence complexity that cause excess matches between non-homologous sequences. Knowing this, tool developers often develop benchmarks that use realistic-seeming decoy sequences to explore empirical tradeoffs between sensitivity and false match rate. A recent trend has been to employ reversed biological sequences as realistic decoys, because these preserve the distribution of letters and the existence of local repeats, while disrupting the original sequence's functional properties. However, we and others have observed that sequences appear to produce high scoring alignments to their reversals with surprising frequency, leading to overstatement of false match risk that may negatively affect downstream analysis.</p><p><strong>Results: </strong>We demonstrate that an alignment between a sequence S and its (possibly mutated) reversal tends to produce higher scores than alignment between truly unrelated sequences, even when S is a shuffled string with no notable repetitive or low-complexity regions. This phenomenon is due to the unintuitive fact that (even randomly shuffled) sequences contain palindromes that are on average longer than the longest common substrings (LCS) shared between permuted variants of the same sequence. Though the expected palindrome length is only slightly larger than the expected LCS, the distribution of alignment scores involving reversed sequences is strongly right-shifted, leading to greatly increased frequency of high-scoring alignments to reversed sequences.</p><p><strong>Impact: </strong>Overestimates of false match risk can motivate unnecessarily high score thresholds, leading to potentially reduced true match sensitivity. Also, when tool sensitivity is only reported up to the score of the first matched decoy sequence, a large decoy set consisting of reversed sequences can obscure sensitivity differences between tools. As a result of these observations, we advise that reversed biological sequences be used as decoys only when care is taken to remove positive matches in the original (un-reversed) sequences, or when overstatement of false labeling is not a concern. Though the primary focus of the analysis is on sequence annotation, we also demonstrate that the prevalence of internal palindromes may lead to an overstatement of the rate of false labels in protein identification with mass spectrometry.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae052"},"PeriodicalIF":0.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11099658/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141066149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Single-cell type annotation with deep learning in 265 cell types for humans. 利用深度学习对人类 265 种细胞类型进行单细胞类型注释。
Bioinformatics advances Pub Date : 2024-04-08 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae054
Sherry Dong, Kaiwen Deng, Xiuzhen Huang
{"title":"Single-cell type annotation with deep learning in 265 cell types for humans.","authors":"Sherry Dong, Kaiwen Deng, Xiuzhen Huang","doi":"10.1093/bioadv/vbae054","DOIUrl":"https://doi.org/10.1093/bioadv/vbae054","url":null,"abstract":"<p><strong>Motivation: </strong>Annotating cell types is a challenging yet essential task in analyzing single-cell RNA sequencing data. However, due to the lack of a gold standard, it is difficult to evaluate the algorithms fairly and an overfitting algorithm may be favored in benchmarks. To address this challenge, we developed a deep learning-based single-cell type prediction tool that assigns the cell type to 265 different cell types for humans, based on data from approximately five million cells.</p><p><strong>Results: </strong>We achieved a median area under the ROC curve (AUC) of 0.93 when evaluated across datasets. We found that inconsistent labeling in the existing database generated by different labs contributed to the mistakes of the model. Therefore, we used cell ontology to correct the annotations and retrained the model, which resulted in 0.971 median AUC. Our study reveals a limiting factor of the accuracy one may achieve with the current database annotation and points to the solutions towards an algorithm-based correction of the gold standard for future automated cell annotation approaches.</p><p><strong>Availability and implementation: </strong>The code is available at: https://github.com/SherrySDong/Hierarchical-Correction-Improves-Automated-Single-cell-Type-Annotation. Data used in this study are listed in Supplementary Table S1 and are retrievable at the CZI database.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae054"},"PeriodicalIF":0.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11031354/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140869341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
ExplaineR: an R package to explain machine learning models. ExplaineR: 用于解释机器学习模型的 R 软件包。
Bioinformatics advances Pub Date : 2024-03-26 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae049
Ramtin Zargari Marandi
{"title":"ExplaineR: an R package to explain machine learning models.","authors":"Ramtin Zargari Marandi","doi":"10.1093/bioadv/vbae049","DOIUrl":"https://doi.org/10.1093/bioadv/vbae049","url":null,"abstract":"<p><strong>Summary: </strong>SHapley Additive exPlanations (SHAP) is a widely used method for model interpretation. However, its full potential often remains untapped due to the absence of dedicated software tools. In response, <i>ExplaineR</i>, an R package to facilitate interpretation of binary classification and regression models based on clustering functionality for SHAP analysis is introduced here. It additionally offers user-interactive elements in visualizations for evaluating model performance, fairness analysis, decision-curve analysis, and a diverse range of SHAP plots. It facilitates in-depth post-prediction analysis of models, enabling users to pinpoint potentially significant patterns in SHAP plots and subsequently trace them back to instances through SHAP clustering. This functionality is particularly valuable for identifying patient subgroups in clinical cohorts, thus enhancing its role as a robust profiling tool. <i>ExplaineR</i> empowers users to generate comprehensive reports on machine learning outcomes, ensuring consistent and thorough documentation of model performance and interpretations.</p><p><strong>Availability and implementation: </strong><i>ExplaineR</i> 1.0.0 is available on GitHub (https://persimune.github.io/explainer/) and CRAN (https://cran.r-project.org/web/packages/explainer/index.html).</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae049"},"PeriodicalIF":0.0,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10994716/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140859282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Text-mining-based feature selection for anticancer drug response prediction. 基于文本挖掘的抗癌药物反应预测特征选择。
Bioinformatics advances Pub Date : 2024-03-26 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae047
Grace Wu, Arvin Zaker, Amirhosein Ebrahimi, Shivanshi Tripathi, Arvind Singh Mer
{"title":"Text-mining-based feature selection for anticancer drug response prediction.","authors":"Grace Wu, Arvin Zaker, Amirhosein Ebrahimi, Shivanshi Tripathi, Arvind Singh Mer","doi":"10.1093/bioadv/vbae047","DOIUrl":"https://doi.org/10.1093/bioadv/vbae047","url":null,"abstract":"<p><strong>Motivation: </strong>Predicting anticancer treatment response from baseline genomic data is a critical obstacle in personalized medicine. Machine learning methods are commonly used for predicting drug response from gene expression data. In the process of constructing these machine learning models, one of the most significant challenges is identifying appropriate features among a massive number of genes.</p><p><strong>Results: </strong>In this study, we utilize features (genes) extracted using the text-mining of scientific literatures. Using two independent cancer pharmacogenomic datasets, we demonstrate that text-mining-based features outperform traditional feature selection techniques in machine learning tasks. In addition, our analysis reveals that text-mining feature-based machine learning models trained on <i>in vitro</i> data also perform well when predicting the response of <i>in vivo</i> cancer models. Our results demonstrate that text-mining-based feature selection is an easy to implement approach that is suitable for building machine learning models for anticancer drug response prediction.</p><p><strong>Availability and implementation: </strong>https://github.com/merlab/text_features.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae047"},"PeriodicalIF":0.0,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11009020/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140869478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CATD: a reproducible pipeline for selecting cell-type deconvolution methods across tissues. CATD:用于选择跨组织细胞类型解卷积方法的可重复管道。
Bioinformatics advances Pub Date : 2024-03-23 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae048
Anna Vathrakokoili Pournara, Zhichao Miao, Ozgur Yilimaz Beker, Nadja Nolte, Alvis Brazma, Irene Papatheodorou
{"title":"CATD: a reproducible pipeline for selecting cell-type deconvolution methods across tissues.","authors":"Anna Vathrakokoili Pournara, Zhichao Miao, Ozgur Yilimaz Beker, Nadja Nolte, Alvis Brazma, Irene Papatheodorou","doi":"10.1093/bioadv/vbae048","DOIUrl":"https://doi.org/10.1093/bioadv/vbae048","url":null,"abstract":"<p><strong>Motivation: </strong>Cell-type deconvolution methods aim to infer cell composition from bulk transcriptomic data. The proliferation of developed methods coupled with inconsistent results obtained in many cases, highlights the pressing need for guidance in the selection of appropriate methods. Additionally, the growing accessibility of single-cell RNA sequencing datasets, often accompanied by bulk expression from related samples enable the benchmark of existing methods.</p><p><strong>Results: </strong>In this study, we conduct a comprehensive assessment of 31 methods, utilizing single-cell RNA-sequencing data from diverse human and mouse tissues. Employing various simulation scenarios, we reveal the efficacy of regression-based deconvolution methods, highlighting their sensitivity to reference choices. We investigate the impact of bulk-reference differences, incorporating variables such as sample, study and technology. We provide validation using a gold standard dataset from mononuclear cells and suggest a consensus prediction of proportions when ground truth is not available. We validated the consensus method on data from the stomach and studied its spillover effect. Importantly, we propose the use of the critical assessment of transcriptomic deconvolution (CATD) pipeline which encompasses functionalities for generating references and pseudo-bulks and running implemented deconvolution methods. CATD streamlines simultaneous deconvolution of numerous bulk samples, providing a practical solution for speeding up the evaluation of newly developed methods.</p><p><strong>Availability and implementation: </strong>https://github.com/Papatheodorou-Group/CATD_snakemake.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae048"},"PeriodicalIF":0.0,"publicationDate":"2024-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11023940/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140866913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
dingo: a Python package for metabolic flux sampling. dingo:用于代谢通量采样的 Python 软件包。
Bioinformatics advances Pub Date : 2024-03-22 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae037
Apostolos Chalkis, Vissarion Fisikopoulos, Elias Tsigaridas, Haris Zafeiropoulos
{"title":"dingo: a Python package for metabolic flux sampling.","authors":"Apostolos Chalkis, Vissarion Fisikopoulos, Elias Tsigaridas, Haris Zafeiropoulos","doi":"10.1093/bioadv/vbae037","DOIUrl":"https://doi.org/10.1093/bioadv/vbae037","url":null,"abstract":"<p><p>We present dingo, a Python package that supports a variety of methods to sample from the flux space of metabolic models, based on state-of-the-art random walks and rounding methods. For uniform sampling, dingo's sampling methods provide significant speed-ups and outperform existing software. Indicatively, dingo can sample from the flux space of the largest metabolic model up to now (Recon3D) in less than a day using a personal computer, under several statistical guarantees; this computation is out of reach for other similar software. In addition, dingo supports common analysis methods, such as flux balance analysis and flux variability analysis, and visualization components. dingo contributes to the arsenal of tools in metabolic modelling by enabling flux sampling in high dimensions (in the order of thousands).</p><p><strong>Availability and implementation: </strong>The dingo Python library is available in GitHub at https://github.com/GeomScale/dingo and the data underlying this article are available in https://doi.org/10.5281/zenodo.10423335.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae037"},"PeriodicalIF":0.0,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10997433/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140871553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Data-driven information extraction and enrichment of molecular profiling data for cancer cell lines. 数据驱动的信息提取和癌症细胞系分子图谱数据的富集。
Bioinformatics advances Pub Date : 2024-03-16 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae045
Ellery Smith, Rahel Paloots, Dimitris Giagkos, Michael Baudis, Kurt Stockinger
{"title":"Data-driven information extraction and enrichment of molecular profiling data for cancer cell lines.","authors":"Ellery Smith, Rahel Paloots, Dimitris Giagkos, Michael Baudis, Kurt Stockinger","doi":"10.1093/bioadv/vbae045","DOIUrl":"10.1093/bioadv/vbae045","url":null,"abstract":"<p><strong>Motivation: </strong>With the proliferation of research means and computational methodologies, published biomedical literature is growing exponentially in numbers and volume. Cancer cell lines are frequently used models in biological and medical research that are currently applied for a wide range of purposes, from studies of cellular mechanisms to drug development, which has led to a wealth of related data and publications. Sifting through large quantities of text to gather relevant information on cell lines of interest is tedious and extremely slow when performed by humans. Hence, novel computational information extraction and correlation mechanisms are required to boost meaningful knowledge extraction.</p><p><strong>Results: </strong>In this work, we present the design, implementation, and application of a novel data extraction and exploration system. This system extracts deep semantic relations between textual entities from scientific literature to enrich existing structured clinical data concerning cancer cell lines. We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variants plots with ranked, related entities such as affected genes. Each relation is accompanied by literature-derived evidences, allowing for deep, yet rapid, literature search, using existing structured data as a springboard.</p><p><strong>Availability and implementation: </strong>Our system is publicly available on the web at https://cancercelllines.org.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae045"},"PeriodicalIF":0.0,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10978572/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140337879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
DeGenPrime provides robust primer design and optimization unlocking the biosphere. DeGenPrime 提供强大的引物设计和优化功能,为生物圈解锁。
Bioinformatics advances Pub Date : 2024-03-14 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae044
Bryan Fulghum, Sophie H Tanker, Richard Allen White
{"title":"DeGenPrime provides robust primer design and optimization unlocking the biosphere.","authors":"Bryan Fulghum, Sophie H Tanker, Richard Allen White","doi":"10.1093/bioadv/vbae044","DOIUrl":"https://doi.org/10.1093/bioadv/vbae044","url":null,"abstract":"<p><strong>Motivation: </strong>Polymerase chain reaction (PCR) is the world's most important molecular diagnostic with applications ranging from medicine to ecology. PCR can fail because of poor primer design. The nearest-neighbor thermodynamic properties, picking conserved regions, and filtration via penalty of oligonucleotides form the basis for good primer design.</p><p><strong>Results: </strong>DeGenPrime is a console-based high-quality PCR primer design tool that can utilize MSA formats and degenerate bases expanding the target range for a single primer set. Our software utilizes thermodynamic properties, filtration metrics, penalty scoring, and conserved region finding of any proposed primer. It has degeneracy, repeated <i>k</i>-mers, relative GC content, and temperature range filters. Minimal penalty scoring is included according to secondary structure self-dimerization metrics, GC clamping, tri- and tetra-loop hairpins, and internal repetition. We compared PrimerDesign-M, DegePrime, ConsensusPrimer, and DeGenPrime on acceptable primer yield. PrimerDesign-M, DegePrime, and ConsensusPrimer provided 0%, 11%, and 17% yield, respectively, for the alternative iron nitrogenase (<i>anfD</i>) gene target. DeGenPrime successfully identified quality primers within the conserved regions of the T4-like phage major capsid protein (<i>g23</i>), conserved regions of molybdenum-based nitrogenase (<i>nif</i>), and its alternatives vanadium (<i>vnf</i>) and iron (<i>anf</i>) nitrogenase. DeGenPrime provides a universal and scalable primer design tool for the entire tree of life.</p><p><strong>Availability and implementation: </strong>DeGenPrime is written in C++ and distributed under a BSD-3-Clause license. The source code for DeGenPrime is freely available on www.github.com/raw-lab/degenprime.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae044"},"PeriodicalIF":0.0,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11001487/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140862861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
CAFA-evaluator: a Python tool for benchmarking ontological classification methods. CAFA-evaluator:本体分类方法基准测试的 Python 工具。
Bioinformatics advances Pub Date : 2024-03-14 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae043
Damiano Piovesan, Davide Zago, Parnal Joshi, M Clara De Paolis Kaluza, Mahta Mehdiabadi, Rashika Ramola, Alexander Miguel Monzon, Walter Reade, Iddo Friedberg, Predrag Radivojac, Silvio C E Tosatto
{"title":"CAFA-evaluator: a Python tool for benchmarking ontological classification methods.","authors":"Damiano Piovesan, Davide Zago, Parnal Joshi, M Clara De Paolis Kaluza, Mahta Mehdiabadi, Rashika Ramola, Alexander Miguel Monzon, Walter Reade, Iddo Friedberg, Predrag Radivojac, Silvio C E Tosatto","doi":"10.1093/bioadv/vbae043","DOIUrl":"10.1093/bioadv/vbae043","url":null,"abstract":"<p><p>We present CAFA-evaluator, a powerful Python program designed to evaluate the performance of prediction methods on targets with hierarchical concept dependencies. It generalizes multi-label evaluation to modern ontologies where the prediction targets are drawn from a directed acyclic graph and achieves high efficiency by leveraging matrix computation and topological sorting. The program requirements include a small number of standard Python libraries, making CAFA-evaluator easy to maintain. The code replicates the Critical Assessment of protein Function Annotation (CAFA) benchmarking, which evaluates predictions of the consistent subgraphs in Gene Ontology. Owing to its reliability and accuracy, the organizers have selected CAFA-evaluator as the official CAFA evaluation software.</p><p><strong>Availability and implementation: </strong>https://pypi.org/project/cafaeval.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae043"},"PeriodicalIF":0.0,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10965419/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140308052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信