Lucas Pagano, Guillaume Thibault, Walid Bousselham, Jessica L Riesterer, Xubo Song, Joe W Gray
{"title":"Efficient semi-supervised semantic segmentation of electron microscopy cancer images with sparse annotations.","authors":"Lucas Pagano, Guillaume Thibault, Walid Bousselham, Jessica L Riesterer, Xubo Song, Joe W Gray","doi":"10.3389/fbinf.2023.1308707","DOIUrl":"10.3389/fbinf.2023.1308707","url":null,"abstract":"Electron microscopy (EM) enables imaging at a resolution of nanometers and can shed light on how cancer evolves to develop resistance to therapy. Acquiring these images has become a routine task. However, analyzing them is now a bottleneck, as manual structure identification is very time-consuming and can take up to several months for a single sample. Deep learning approaches offer a suitable solution to speed up the analysis. In this work, we present a study of several state-of-the-art deep learning models for the task of segmenting nuclei and nucleoli in volumes from tumor biopsies. We compared previous results obtained with the ResUNet architecture to the more recent UNet++, FracTALResNet, SenFormer, and CEECNet models. In addition, we explored the utilization of unlabeled images through semi-supervised learning with Cross Pseudo Supervision. We have trained and evaluated all of the models on sparse manual labels from three fully annotated in-house datasets that we have made available on demand, demonstrating improvements in terms of 3D Dice score. From the analysis of these results, we drew conclusions on the relative gains of using more complex models and semi-supervised learning, as well as the next steps for the mitigation of the manual segmentation bottleneck.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1308707"},"PeriodicalIF":2.8,"publicationDate":"2023-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10757843/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139076063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Archana Machireddy, Guillaume Thibault, Kevin G. Loftis, Kevin Stoltz, Cecilia Bueno, Hannah R. Smith, J. Riesterer, Joe W. Gray, Xubo Song
{"title":"Segmentation of cellular ultrastructures on sparsely labeled 3D electron microscopy images using deep learning","authors":"Archana Machireddy, Guillaume Thibault, Kevin G. Loftis, Kevin Stoltz, Cecilia Bueno, Hannah R. Smith, J. Riesterer, Joe W. Gray, Xubo Song","doi":"10.3389/fbinf.2023.1308708","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1308708","url":null,"abstract":"Focused ion beam-scanning electron microscopy (FIB-SEM) images can provide a detailed view of the cellular ultrastructure of tumor cells. A deeper understanding of their organization and interactions can shed light on cancer mechanisms and progression. However, the bottleneck in the analysis is the delineation of the cellular structures to enable quantitative measurements and analysis. We mitigated this limitation using deep learning to segment cells and subcellular ultrastructure in 3D FIB-SEM images of tumor biopsies obtained from patients with metastatic breast and pancreatic cancers. The ultrastructures, such as nuclei, nucleoli, mitochondria, endosomes, and lysosomes, are relatively better defined than their surroundings and can be segmented with high accuracy using a neural network trained with sparse manual labels. Cell segmentation, on the other hand, is much more challenging due to the lack of clear boundaries separating cells in the tissue. We adopted a multi-pronged approach combining detection, boundary propagation, and tracking for cell segmentation. Specifically, a neural network was employed to detect the intracellular space; optical flow was used to propagate cell boundaries across the z-stack from the nearest ground truth image in order to facilitate the separation of individual cells; finally, the filopodium-like protrusions were tracked to the main cells by calculating the intersection over union measure for all regions detected in consecutive images along z-stack and connecting regions with maximum overlap. 
The proposed cell segmentation methodology resulted in an average Dice score of 0.93. For nuclei, nucleoli, and mitochondria, the segmentation achieved Dice scores of 0.99, 0.98, and 0.86, respectively. The segmentation of FIB-SEM images will enable interpretative rendering and provide quantitative image features to be associated with relevant clinical variables.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"209 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138997115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jack M Craig, Grace L. Bamba, Jose Barba-Montoya, S. Hedges, Sudhir Kumar, Sankar Subramanian, Yuanning Li, Gagandeep Singh
{"title":"Completing a molecular timetree of apes and monkeys","authors":"Jack M Craig, Grace L. Bamba, Jose Barba-Montoya, S. Hedges, Sudhir Kumar, Sankar Subramanian, Yuanning Li, Gagandeep Singh","doi":"10.3389/fbinf.2023.1284744","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1284744","url":null,"abstract":"The primate infraorder Simiiformes, comprising Old and New World monkeys and apes, includes the most well-studied species on earth. Their most comprehensive molecular timetree, assembled from thousands of published studies, is found in the TimeTree database and contains 268 simiiform species. It is, however, missing 38 out of 306 named species in the NCBI taxonomy for which at least one molecular sequence exists in the NCBI GenBank. We developed a three-pronged approach to expanding the timetree of Simiiformes to contain 306 species. First, molecular divergence times were searched and found for 21 missing species in timetrees published across 15 studies. Second, untimed molecular phylogenies were searched and scaled to time using relaxed clocks to add four more species. Third, we reconstructed ten new timetrees from genetic data in GenBank, allowing us to incorporate 13 more species. Finally, we assembled the most comprehensive molecular timetree of Simiiformes containing all 306 species for which any molecular data exists. We compared the species divergence times with those previously imputed using statistical approaches in the absence of molecular data. The latter data-less imputed times were not significantly correlated with those derived from the molecular data. Also, using phylogenies containing imputed times produced different trends of evolutionary distinctiveness and speciation rates over time than those produced using the molecular timetree. 
These results demonstrate that more complete clade-specific timetrees can be produced by analyzing existing information, which we hope will encourage future efforts to fill in the missing taxa in the global timetree of life.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"122 16","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138999675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proteinortho6: pseudo-reciprocal best alignment heuristic for graph-based detection of (co-)orthologs","authors":"Paul Klemm, Peter F. Stadler, Marcus Lechner","doi":"10.3389/fbinf.2023.1322477","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1322477","url":null,"abstract":"Proteinortho is a widely used tool to predict (co)-orthologous groups of genes for any set of species. It finds application in comparative and functional genomics, phylogenomics, and evolutionary reconstructions. With a rapidly increasing number of available genomes, the demand for large-scale predictions is also growing. In this contribution, we evaluate and implement major algorithmic improvements that significantly enhance the speed of the analysis without reducing precision. Graph-based detection of (co-)orthologs is typically based on a reciprocal best alignment heuristic that requires an all vs. all comparison of proteins from all species under study. The initial identification of similar proteins is accelerated by introducing an alternative search tool along with a revised search strategy—the pseudo-reciprocal best alignment heuristic—that reduces the number of required sequence comparisons by one-half. The clustering algorithm was reworked to efficiently decompose very large clusters and accelerate processing. 
Proteinortho6 reduces the overall processing time by an order of magnitude compared to its predecessor while maintaining its small memory footprint and good predictive quality.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"173 S394","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139006198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jan Rothörl, M. Brems, Tim J. Stevens, Peter Virnau
{"title":"Reconstructing diploid 3D chromatin structures from single cell Hi-C data with a polymer-based approach","authors":"Jan Rothörl, M. Brems, Tim J. Stevens, Peter Virnau","doi":"10.3389/fbinf.2023.1284484","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1284484","url":null,"abstract":"Detailed understanding of the 3D structure of chromatin is a key ingredient to investigate a variety of processes inside the cell. Since direct methods to experimentally ascertain these structures lack the desired spatial fidelity, computational inference methods based on single cell Hi-C data have gained significant interest. Here, we develop a progressive simulation protocol to iteratively improve the resolution of predicted interphase structures by maximum-likelihood association of ambiguous Hi-C contacts using lower-resolution predictions. Compared to state-of-the-art methods, our procedure is not limited to haploid cell data and allows us to reach a resolution of up to 5,000 base pairs per bead. High resolution chromatin models grant access to a multitude of structural phenomena. Exemplarily, we verify the formation of chromosome territories and holes near aggregated chromocenters as well as the inversion of the CpG content for rod photoreceptor cells.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"149 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138981396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annette Lien, Leonardo Pestana Legori, Louis Kraft, Peter Wad Sackett, Gabriel Renaud
{"title":"Benchmarking software tools for trimming adapters and merging next-generation sequencing data for ancient DNA.","authors":"Annette Lien, Leonardo Pestana Legori, Louis Kraft, Peter Wad Sackett, Gabriel Renaud","doi":"10.3389/fbinf.2023.1260486","DOIUrl":"10.3389/fbinf.2023.1260486","url":null,"abstract":"Ancient DNA is highly degraded, resulting in very short sequences. Reads generated with modern high-throughput sequencing machines are generally longer than ancient DNA molecules, therefore the reads often contain some portion of the sequencing adaptors. It is crucial to remove those adaptors, as they can interfere with downstream analysis. Furthermore, overlapping portions when DNA has been read forward and backward (paired-end) can be merged to correct sequencing errors and improve read quality. Several tools have been developed for adapter trimming and read merging, however, no one has attempted to evaluate their accuracy and evaluate their potential impact on downstream analyses. Through the simulation of sequencing data, seven commonly used tools were analyzed in their ability to reconstruct ancient DNA sequences through read merging. The analyzed tools exhibit notable differences in their abilities to correct sequence errors and identify the correct read overlap, but the most substantial difference is observed in their ability to calculate quality scores for merged bases. Selecting the most appropriate tool for a given project depends on several factors, although some tools such as fastp have some shortcomings, whereas others like leeHom outperform the other tools in most aspects. While the choice of tool did not result in a measurable difference when analyzing population genetics using principal component analysis, it is important to note that downstream analyses that are sensitive to wrongly merged reads or that rely on quality scores can be significantly impacted by the choice of tool.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1260486"},"PeriodicalIF":0.0,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10733496/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138833352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Segura, Yana Rose, Chunxiao Bi, Jose M. Duarte, Stephen K. Burley, S. Bittrich
{"title":"RCSB Protein Data Bank: visualizing groups of experimentally determined PDB structures alongside computed structure models of proteins","authors":"J. Segura, Yana Rose, Chunxiao Bi, Jose M. Duarte, Stephen K. Burley, S. Bittrich","doi":"10.3389/fbinf.2023.1311287","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1311287","url":null,"abstract":"Recent advances in Artificial Intelligence and Machine Learning (e.g., AlphaFold, RosettaFold, and ESMFold) enable prediction of three-dimensional (3D) protein structures from amino acid sequences alone at accuracies comparable to lower-resolution experimental methods. These tools have been employed to predict structures across entire proteomes and the results of large-scale metagenomic sequence studies, yielding an exponential increase in available biomolecular 3D structural information. Given the enormous volume of this newly computed biostructure data, there is an urgent need for robust tools to manage, search, cluster, and visualize large collections of structures. Equally important is the capability to efficiently summarize and visualize metadata, biological/biochemical annotations, and structural features, particularly when working with vast numbers of protein structures of both experimental origin from the Protein Data Bank (PDB) and computationally-predicted models. Moreover, researchers require advanced visualization techniques that support interactive exploration of multiple sequences and structural alignments. This paper introduces a suite of tools provided on the RCSB PDB research-focused web portal RCSB.org, tailor-made for efficient management, search, organization, and visualization of this burgeoning corpus of 3D macromolecular structure data.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"6 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138603602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kevin K D Tan, Mark A Tsuchida, Jenu V Chacko, Niklas A Gahm, Kevin W Eliceiri
{"title":"Real-time open-source FLIM analysis.","authors":"Kevin K D Tan, Mark A Tsuchida, Jenu V Chacko, Niklas A Gahm, Kevin W Eliceiri","doi":"10.3389/fbinf.2023.1286983","DOIUrl":"10.3389/fbinf.2023.1286983","url":null,"abstract":"Fluorescence lifetime imaging microscopy (FLIM) provides valuable quantitative insights into fluorophores' chemical microenvironment. Due to long computation times and the lack of accessible, open-source real-time analysis toolkits, traditional analysis of FLIM data, particularly with the widely used time-correlated single-photon counting (TCSPC) approach, typically occurs after acquisition. As a result, uncertainties about the quality of FLIM data persist even after collection, frequently necessitating the extension of imaging sessions. Unfortunately, prolonged sessions not only risk missing important biological events but also cause photobleaching and photodamage. We present the first open-source program designed for real-time FLIM analysis during specimen scanning to address these challenges. Our approach combines acquisition with real-time computational and visualization capabilities, allowing us to assess FLIM data quality on the fly. Our open-source real-time FLIM viewer, integrated as a Napari plugin, displays phasor analysis and rapid lifetime determination (RLD) results computed from real-time data transmitted by acquisition software such as the open-source Micro-Manager-based OpenScan package. Our method facilitates early identification of FLIM signatures and data quality assessment by providing preliminary analysis during acquisition. This not only speeds up the imaging process, but it is especially useful when imaging sensitive live biological samples.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1286983"},"PeriodicalIF":0.0,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10720713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138813817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster analysis for localisation-based data sets: dos and don'ts when quantifying protein aggregates.","authors":"Luca Panconi, Dylan M Owen, Juliette Griffié","doi":"10.3389/fbinf.2023.1237551","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1237551","url":null,"abstract":"Many proteins display a non-random distribution on the cell surface. From dimers to nanoscale clusters to large, micron-scale aggregations, these distributions regulate protein-protein interactions and signalling. Although these distributions show organisation on length-scales below the resolution limit of conventional optical microscopy, single molecule localisation microscopy (SMLM) can map molecule locations with nanometre precision. The data from SMLM is not a conventional pixelated image and instead takes the form of a point-pattern: a list of the x, y coordinates of the localised molecules. To extract the biological insights that researchers require, cluster analysis is often performed on these data sets, quantifying such parameters as the size of clusters, the percentage of monomers and so on. Here, we provide some guidance on how SMLM clustering should best be performed.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1237551"},"PeriodicalIF":0.0,"publicationDate":"2023-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10704244/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138813816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giorgio Valentini, Dario Malchiodi, Jessica Gliozzo, Marco Mesiti, Mauricio Soto-Gomez, Alberto Cabri, Justin Reese, Elena Casiraghi, Peter N Robinson
{"title":"The promises of large language models for protein design and modeling.","authors":"Giorgio Valentini, Dario Malchiodi, Jessica Gliozzo, Marco Mesiti, Mauricio Soto-Gomez, Alberto Cabri, Justin Reese, Elena Casiraghi, Peter N Robinson","doi":"10.3389/fbinf.2023.1304099","DOIUrl":"10.3389/fbinf.2023.1304099","url":null,"abstract":"The recent breakthroughs of Large Language Models (LLMs) in the context of natural language processing have opened the way to significant advances in protein research. Indeed, the relationships between human natural language and the \"language of proteins\" invite the application and adaptation of LLMs to protein modelling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating and translating human languages, we anticipate analogous results with the language of proteins. Indeed, protein language models have already been trained to accurately predict protein properties and generate novel functionally characterized proteins, achieving state-of-the-art results. In this paper we discuss the promises and the open challenges raised by this novel and exciting research area, and we propose our perspective on how LLMs will affect protein modeling and design.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1304099"},"PeriodicalIF":2.8,"publicationDate":"2023-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10701588/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138813818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}