Shamini Hemandhar Kumar, Ines Tapken, Daniela Kuhn, Peter Claus, Klaus Jung
{"title":"bootGSEA: a bootstrap and rank aggregation pipeline for multi-study and multi-omics enrichment analyses","authors":"Shamini Hemandhar Kumar, Ines Tapken, Daniela Kuhn, Peter Claus, Klaus Jung","doi":"10.3389/fbinf.2024.1380928","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1380928","url":null,"abstract":"Introduction: Gene set enrichment analysis (GSEA) subsequent to differential expression analysis is a standard step in transcriptomics and proteomics data analysis. Although many tools for this step are available, the results are often difficult to reproduce because set annotations can change in the databases, that is, new features can be added or existing features can be removed. Finally, such changes in set compositions can have an impact on biological interpretation. Methods: We present bootGSEA, a novel computational pipeline, to study the robustness of GSEA. By repeating GSEA based on bootstrap samples, the variability and robustness of results can be studied. In our pipeline, not all genes or proteins are involved in the different bootstrap replicates of the analyses. Finally, we aggregate the ranks from the bootstrap replicates to obtain a score per gene set that shows whether it gains or loses evidence compared to the ranking of the standard GSEA. Rank aggregation is also used to combine GSEA results from different omics levels or from multiple independent studies at the same omics level. Results: By applying our approach to six independent cancer transcriptomics datasets, we showed that bootstrap GSEA can aid in the selection of more robust enriched gene sets. Additionally, we applied our approach to paired transcriptomics and proteomics data obtained from a mouse model of spinal muscular atrophy (SMA), a neurodegenerative and neurodevelopmental disease associated with multi-system involvement. 
After obtaining a robust ranking at both omics levels, both ranking lists were combined to aggregate the findings from the transcriptomics and proteomics results. Furthermore, we constructed the new R-package “bootGSEA,” which implements the proposed methods and provides graphical views of the findings. In the example datasets, bootstrap-based GSEA was able to identify gene or protein sets that were less robust when the set composition changed during bootstrap analysis. Discussion: The rank aggregation step was useful for combining bootstrap results and making them comparable to the original findings on the single-omics level or for combining findings from multiple different omics levels.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140747342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alban Obel Slabowska, Charles Pyke, Henning Hvid, Leon Eyrich Jessen, Simon Baumgart, Vivek Das
{"title":"A systematic evaluation of state-of-the-art deconvolution methods in spatial transcriptomics: insights from cardiovascular disease and chronic kidney disease","authors":"Alban Obel Slabowska, Charles Pyke, Henning Hvid, Leon Eyrich Jessen, Simon Baumgart, Vivek Das","doi":"10.3389/fbinf.2024.1352594","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1352594","url":null,"abstract":"A major challenge in sequencing-based spatial transcriptomics (ST) is resolution limitations. Tissue sections are divided into hundreds of thousands of spots, where each spot invariably contains a mixture of cell types. Methods have been developed to deconvolute the mixed transcriptional signal into its constituents. Although ST is becoming essential for drug discovery, especially in cardiometabolic diseases, to date, no deconvolution benchmark has been performed on these types of tissues and diseases. However, the three methods, Cell2location, RCTD, and spatialDWLS, have previously been shown to perform well in brain tissue and simulated data. Here, we compare these methods to assess the best performance when using human data from cardiovascular disease (CVD) and chronic kidney disease (CKD) from patients in different pathological states, evaluated using expert annotation. In this study, we found that all three methods performed comparably well in deconvoluting verifiable cell types, including smooth muscle cells and macrophages in vascular samples and podocytes in kidney samples. RCTD shows the best performance accuracy scores in CVD samples, while Cell2location, on average, achieved the highest performance across all test experiments. Although all three methods had similar accuracies, Cell2location needed less reference data to converge at the expense of higher computational intensity. Finally, we also report that RCTD has the fastest computational time and the simplest workflow, requiring fewer computational dependencies. 
In conclusion, we find that each method has particular advantages, and the optimal choice depends on the use case.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140376150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An interactive visualization tool for educational outreach in protein contact map overlap analysis","authors":"Kevan Baker, Nathaniel Hughes, Sutanu Bhattacharya","doi":"10.3389/fbinf.2024.1358550","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1358550","url":null,"abstract":"Recent advancements in contact map-based protein three-dimensional (3D) structure prediction have been driven by the evolution of deep learning algorithms. However, the gap in accessible software tools for novices in this domain remains a significant challenge. This study introduces GoFold, a novel, standalone graphical user interface (GUI) designed for beginners to perform contact map overlap (CMO) problems for better template selection. Unlike existing tools that cater more to research needs or assume foundational knowledge, GoFold offers an intuitive, user-friendly platform with comprehensive tutorials. It stands out in its ability to visually represent the CMO problem, allowing users to input proteins in various formats and explore the CMO problem. The educational value of GoFold is demonstrated through benchmarking against the state-of-the-art contact map overlap method, map_align, using two datasets: PSICOV and CAMEO. GoFold exhibits superior performance in terms of TM-score and Z-score metrics across diverse qualities of contact maps and target difficulties. Notably, GoFold runs efficiently on personal computers without any third-party dependencies, thereby making it accessible to the general public for promoting citizen science. 
The tool is freely available for download for macOS, Linux, and Windows.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140238864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ten common issues with reference sequence databases and how to mitigate them","authors":"Samuel D. Chorlton","doi":"10.3389/fbinf.2024.1278228","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1278228","url":null,"abstract":"Metagenomic sequencing has revolutionized our understanding of microbiology. While metagenomic tools and approaches have been extensively evaluated and benchmarked, far less attention has been given to the reference sequence database used in metagenomic classification. Issues with reference sequence databases are pervasive. Database contamination is the most recognized issue in the literature; however, it remains relatively unmitigated in most analyses. Other common issues with reference sequence databases include taxonomic errors, inappropriate inclusion and exclusion criteria, and sequence content errors. This review covers ten common issues with reference sequence databases and the potential downstream consequences of these issues. Mitigation measures are discussed for each issue, including bioinformatic tools and database curation strategies. Together, these strategies present a path towards more accurate, reproducible and translatable metagenomic sequencing.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140238827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kai O. Kreissner, Benjamin Faller, Ivan Talucci, Hans M. Maric
{"title":"MARTin—an open-source platform for microarray analysis","authors":"Kai O. Kreissner, Benjamin Faller, Ivan Talucci, Hans M. Maric","doi":"10.3389/fbinf.2024.1329062","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1329062","url":null,"abstract":"Background: Microarray technology has brought significant advancements to high-throughput analysis, particularly in the comprehensive study of biomolecular interactions involving proteins, peptides, and antibodies, as well as in the fields of gene expression and genotyping. With the ever-increasing volume and intricacy of microarray data, an accurate, reliable and reproducible analysis is essential. Furthermore, there is a high level of variation in the format of microarrays. This not only holds true between different sample types but is also due to differences in the hardware used during the production of the arrays, as well as the personal preferences of the individual users. Therefore, there is a need for transparent, broadly applicable and user-friendly image quantification techniques to extract meaningful information from these complex datasets, while also addressing the challenges posed by specific microarray and imager formats, which can flaw analysis and interpretation. Results: Here we introduce MicroArray Rastering Tool (MARTin), a versatile tool developed primarily for the analysis of protein and peptide microarrays. Our software provides state-of-the-art methodologies, offering researchers a comprehensive tool for microarray image quantification. MARTin is independent of the microarray platform used and supports various configurations including high-density formats and printed arrays with significant x and y offsets. This is made possible by granting the user the ability to freely customize parts of the application to their specific microarray format. Thanks to built-in features like adaptive filtering and autofit, measurements can be done very efficiently and are highly reproducible. 
Furthermore, our tool integrates metadata management and integrity check features, providing a straightforward quality control method, along with a ready-to-use interface for in-depth data analysis. This not only promotes good scientific practice in the field of microarray analysis but also enhances the ability to explore and examine the generated data. Conclusion: MARTin has been developed to empower its users with a reliable, efficient, and intuitive tool for peptidomic and proteomic array analysis, thereby facilitating data-driven discovery across disciplines. Our software is an open-source project freely available under the GNU Affero General Public License on GitHub.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139790675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A perspective on FAIR quality control in multiplexed imaging data processing","authors":"Wouter‐Michiel A.M. Vierdag, Sinem K. Saka","doi":"10.3389/fbinf.2024.1336257","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1336257","url":null,"abstract":"Multiplexed imaging approaches are increasingly being adopted for imaging of large tissue areas, yielding big imaging datasets both in terms of the number of samples and the size of image data per sample. The processing and analysis of these datasets is complex owing to frequent technical artifacts and heterogeneous profiles from a high number of stained targets. To streamline the analysis of multiplexed images, automated pipelines making use of state-of-the-art algorithms have been developed. In these pipelines, the output quality of one processing step is typically dependent on the output of the previous step, and errors from each step, even when they appear minor, can propagate and confound the results. Thus, rigorous quality control (QC) at each of these different steps of the image processing pipeline is of paramount importance both for the proper analysis and interpretation of the analysis results and for ensuring the reusability of the data. Ideally, QC should become an integral and easily retrievable part of the imaging datasets and the analysis process. Yet, limitations of the currently available frameworks make integration of interactive QC difficult for large multiplexed imaging data. 
Given the increasing size and complexity of multiplexed imaging datasets, we present the different challenges for integrating QC in image analysis pipelines as well as suggest possible solutions that build on top of recent advances in bioimage analysis.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139850479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MetaWin 3: open-source software for meta-analysis","authors":"Michael S. Rosenberg","doi":"10.3389/fbinf.2024.1305969","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1305969","url":null,"abstract":"The rise of research synthesis and systematic reviews over the last 25 years has been aided by a series of software packages providing simple and accessible graphical user interfaces (GUIs) that are intuitively easy to use by novice analysts and users. Development of many of these packages has been abandoned over time due to a variety of factors, leaving a gap in the software infrastructure available for meta-analysis. To fulfill the continued demand for a GUI-based meta-analytic system, we have now released MetaWin 3 as free, open-source, multi-platform software. MetaWin 3 is written in Python and developed from scratch relative to earlier versions. The codebase is available on GitHub, with pre-compiled executables for both Windows and macOS available from the MetaWin website. MetaWin includes standardized effect size calculations, exploratory and publication bias analyses, and allows for both simple and complex explanatory models of variation within a meta-analytic framework, including meta-regression, using traditional least-squares/moments estimation.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139791053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}