Marin Truchi, Caroline Lacoux, Cyprien Gille, Julien Fassy, Virginie Magnone, Rafael Lopes Goncalves, Cédric Girard-Riboulleau, Iris Manosalva-Pena, Marine Gautier-Isola, Kevin Lebrigand, Pascal Barbry, Salvatore Spicuglia, Georges Vassaux, Roger Rezzonico, Michel Barlaud, Bernard Mari
{"title":"Detecting subtle transcriptomic perturbations induced by lncRNAs knock-down in single-cell CRISPRi screening using a new sparse supervised autoencoder neural network.","authors":"Marin Truchi, Caroline Lacoux, Cyprien Gille, Julien Fassy, Virginie Magnone, Rafael Lopes Goncalves, Cédric Girard-Riboulleau, Iris Manosalva-Pena, Marine Gautier-Isola, Kevin Lebrigand, Pascal Barbry, Salvatore Spicuglia, Georges Vassaux, Roger Rezzonico, Michel Barlaud, Bernard Mari","doi":"10.3389/fbinf.2024.1340339","DOIUrl":"10.3389/fbinf.2024.1340339","url":null,"abstract":"<p><p>Single-cell CRISPR-based transcriptome screens are potent genetic tools for concomitantly assessing the expression profiles of cells targeted by a set of guide RNAs (gRNAs) and inferring target gene functions from the observed perturbations. However, due to various limitations, this approach lacks sensitivity in detecting weak perturbations and is reliable essentially only when studying master regulators such as transcription factors. To overcome the challenge of detecting subtle gRNA-induced transcriptomic perturbations and classifying the most responsive cells, we developed a new supervised autoencoder neural network method. Our sparse supervised autoencoder (SSAE) neural network selects both relevant features (genes) and actually perturbed cells. We applied this method to an in-house single-cell CRISPR-interference-based (CRISPRi) transcriptome screen (CROP-Seq) focusing on a subset of long non-coding RNAs (lncRNAs) regulated by hypoxia, a condition that promotes tumor aggressiveness and drug resistance, in the context of lung adenocarcinoma (LUAD). The CROP-seq library of validated gRNAs against a subset of lncRNAs and, as positive controls, HIF1A and HIF2A, the two main transcription factors of the hypoxic response, was transduced into A549 LUAD cells cultured in normoxia or exposed to hypoxic conditions for 3, 6 or 24 h. 
We first validated the SSAE approach on HIF1A and HIF2A by confirming the specific effect of their knock-down during the temporal switch of the hypoxic response. Next, the SSAE method was able to detect stable short hypoxia-dependent transcriptomic signatures induced by the knock-down of some lncRNA candidates, outperforming previously published machine learning approaches. This proof of concept demonstrates the relevance of the SSAE approach for deciphering weak perturbations in single-cell transcriptomic data readout as part of CRISPR-based screening.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1340339"},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10945021/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140159701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lieke Michielsen, Marcel J T Reinders, Ahmed Mahfouz
{"title":"Predicting cell population-specific gene expression from genomic sequence.","authors":"Lieke Michielsen, Marcel J T Reinders, Ahmed Mahfouz","doi":"10.3389/fbinf.2024.1347276","DOIUrl":"10.3389/fbinf.2024.1347276","url":null,"abstract":"<p><p>Most regulatory elements, especially enhancer sequences, are cell population-specific. One could even argue that a distinct set of regulatory elements is what defines a cell population. However, discovering which non-coding regions of the DNA are essential in which context, and as a result, which genes are expressed, is a difficult task. Some computational models tackle this problem by predicting gene expression directly from the genomic sequence. These models are currently limited to predicting bulk measurements and mainly make tissue-specific predictions. Here, we present a model that leverages single-cell RNA-sequencing data to predict gene expression. We show that cell population-specific models outperform tissue-specific models, especially when the expression profile of a cell population and the corresponding tissue are dissimilar. Further, we show that our model can prioritize GWAS variants and learn motifs of transcription factor binding sites. We envision that our model can be useful for delineating cell population-specific regulatory elements.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1347276"},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10944912/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140159702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Claudia Sala, Pietro Di Lena, Danielle Fernandes Durso, Italo Faria do Valle, Maria Giulia Bacalini, Daniele Dall'Olio, Claudio Franceschi, Gastone Castellani, Paolo Garagnani, Christine Nardini
{"title":"Where are we in the implementation of tissue-specific epigenetic clocks?","authors":"Claudia Sala, Pietro Di Lena, Danielle Fernandes Durso, Italo Faria do Valle, Maria Giulia Bacalini, Daniele Dall'Olio, Claudio Franceschi, Gastone Castellani, Paolo Garagnani, Christine Nardini","doi":"10.3389/fbinf.2024.1306244","DOIUrl":"10.3389/fbinf.2024.1306244","url":null,"abstract":"<p><p><b>Introduction:</b> DNA methylation clocks present advantageous characteristics with respect to the ambitious goal of identifying very early markers of disease, based on the concept that accelerated ageing is a reliable predictor in this sense. <b>Methods:</b> Such tools, being epigenomic-based, are expected to depend on sex and tissue specificities, and this work quantifies this dependency, as well as the dependency on the regression model and on the size of the training set. <b>Results:</b> Our quantitative results indicate that elastic-net penalization is the best-performing strategy, even more so when, unsurprisingly, the data set is larger; sex does not appear to affect clock performance, and tissue-specific clocks appear to perform better than generic blood clocks. Finally, when considering all trained clocks, we identified a subset of genes that, to the best of our knowledge, have not been reported before and might deserve further investigation: CPT1A, MMP15, SHROOM3, SLIT3, and SYNGR. 
<b>Conclusion:</b> These factual starting points can be useful for the future medical translation of clocks and in particular in the debate between multi-tissue clocks, generally trained on a large majority of blood samples, and tissue-specific clocks.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1306244"},"PeriodicalIF":0.0,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10944965/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140159892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rang Li, Sabrina Wilderotter, Madison Stoddard, Debra Van Egeren, Arijit Chakravarty, Diane Joseph-McCarthy
{"title":"Computational identification of antibody-binding epitopes from mimotope datasets.","authors":"Rang Li, Sabrina Wilderotter, Madison Stoddard, Debra Van Egeren, Arijit Chakravarty, Diane Joseph-McCarthy","doi":"10.3389/fbinf.2024.1295972","DOIUrl":"10.3389/fbinf.2024.1295972","url":null,"abstract":"<p><p><b>Introduction:</b> A fundamental challenge in computational vaccinology is that most B-cell epitopes are conformational and therefore hard to predict from sequence alone. Another significant challenge is that a great deal of the amino acid sequence of a viral surface protein might not in fact be antigenic. Thus, identifying the regions of a protein that are most promising for vaccine design based on the degree of surface exposure may not lead to a clinically relevant immune response. <b>Methods:</b> Linear peptides selected by phage display experiments that have high affinity to the monoclonal antibody of interest (\"mimotopes\") usually have similar physicochemical properties to the antigen epitope corresponding to that antibody. The sequences of these linear peptides can be used to find possible epitopes on the surface of the antigen structure or a homology model of the antigen in the absence of an antigen-antibody complex structure. <b>Results and Discussion:</b> Herein we describe two novel methods for mapping mimotopes to epitopes. The first is a novel algorithm named MimoTree that allows for gaps in the mimotopes and epitopes on the antigen. More specifically, a mimotope may have a gap that does not match to the epitope to allow it to adopt a conformation relevant for binding to an antibody, and residues may similarly be discontinuous in conformational epitopes. MimoTree is a fully automated epitope detection algorithm suitable for the identification of conformational as well as linear epitopes. 
The second is an ensemble approach, which combines the prediction results from MimoTree and two existing methods.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1295972"},"PeriodicalIF":0.0,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10920257/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140095259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Limits of experimental evidence in RNA secondary structure prediction.","authors":"Sarah von Löhneysen, Mario Mörl, Peter F Stadler","doi":"10.3389/fbinf.2024.1346779","DOIUrl":"10.3389/fbinf.2024.1346779","url":null,"abstract":"","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1346779"},"PeriodicalIF":2.8,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10918467/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140061443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanlin Zhang, Christopher J F Cameron, Mathieu Blanchette
{"title":"Posterior inference of Hi-C contact frequency through sampling.","authors":"Yanlin Zhang, Christopher J F Cameron, Mathieu Blanchette","doi":"10.3389/fbinf.2023.1285828","DOIUrl":"10.3389/fbinf.2023.1285828","url":null,"abstract":"<p><p>Hi-C is one of the most widely used approaches to study three-dimensional genome conformations. Contacts captured by a Hi-C experiment are represented in a contact frequency matrix. Due to the limited sequencing depth and other factors, Hi-C contact frequency matrices are only approximations of the true interaction frequencies and are further reported without any quantification of uncertainty. Hence, downstream analyses based on Hi-C contact maps (e.g., TAD and loop annotation) are themselves point estimations. Here, we present the Hi-C interaction frequency sampler (HiCSampler) that reliably infers the posterior distribution of the interaction frequency for a given Hi-C contact map by exploiting dependencies between neighboring loci. Posterior predictive checks demonstrate that HiCSampler can infer highly predictive chromosomal interaction frequency. 
Summary statistics calculated by HiCSampler provide a measurement of the uncertainty for Hi-C experiments, and samples inferred by HiCSampler are ready for use by most downstream analysis tools off the shelf and permit uncertainty measurements in these analyses without modifications.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1285828"},"PeriodicalIF":0.0,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10919286/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140061442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A L Swan, A Broadbent, P Singh Gaur, A Mishra, K Gurwitz, A Mithani, S L Morgan, G Malhotra, C Brooksbank
{"title":"Making bioinformatics training FAIR: the EMBL-EBI training portal.","authors":"A L Swan, A Broadbent, P Singh Gaur, A Mishra, K Gurwitz, A Mithani, S L Morgan, G Malhotra, C Brooksbank","doi":"10.3389/fbinf.2024.1347168","DOIUrl":"10.3389/fbinf.2024.1347168","url":null,"abstract":"<p><p>EMBL-EBI provides a broad range of training in data-driven life sciences. To improve awareness of and access to training course listings and to make digital learning materials findable and simple to use, the EMBL-EBI Training website, www.ebi.ac.uk/training, was redesigned and restructured. To provide a framework for the redesign of the website, the FAIR (findable, accessible, interoperable, reusable) principles were applied to both the listings of live training courses and the presentation of on-demand training content. Each of the FAIR principles guided decisions on the choice of technology used to develop the website, including the details provided about training and the way in which training was presented. Since its release, the openly accessible website has been accessed by an average of 58,492 users a month. Over 12,000 unique users have also created accounts since this functionality was added in March 2022, allowing them to track their learning and record completion of training. Development of the website was completed using the Agile Scrum project management methodology with a focus on user experience. Now that the website is live, this framework continues to be used for its maintenance and improvement, as feedback continues to be collected and further ways to make training FAIR are identified. 
Here, we describe the process of making EMBL-EBI's training FAIR through the development of a new website and our experience of implementing Agile Scrum.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1347168"},"PeriodicalIF":0.0,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10866141/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139736872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aimer Gutierrez-Diaz, Steve Hoffmann, Juan Carlos Gallego-Gómez, Clara Isabel Bermudez-Santana
{"title":"Systematic computational hunting for small RNAs derived from ncRNAs during dengue virus infection in endothelial HMEC-1 cells.","authors":"Aimer Gutierrez-Diaz, Steve Hoffmann, Juan Carlos Gallego-Gómez, Clara Isabel Bermudez-Santana","doi":"10.3389/fbinf.2024.1293412","DOIUrl":"10.3389/fbinf.2024.1293412","url":null,"abstract":"<p><p>In recent years, a population of small RNA fragments derived from non-coding RNAs (sfd-RNAs) has gained significant interest due to its functional and structural resemblance to miRNAs, adding another level of complexity to our comprehension of small-RNA-mediated gene regulation. Despite this, scientists need more tools to test the differential expression of sfd-RNAs since the current methods to detect miRNAs may not be directly applied to them. The primary reasons are the lack of accurate small RNA and ncRNA annotation, the multi-mapping read (MMR) placement, and the multicopy nature of ncRNAs in the human genome. To solve these issues, a methodology that allows the detection of differentially expressed sfd-RNAs, including canonical miRNAs, by using an integrated copy-number-corrected ncRNA annotation was implemented. This approach was coupled with sixteen different computational strategies composed of combinations of four aligners and four normalization methods to provide a rank-order of prediction for each differentially expressed sfd-RNA. By systematically addressing the three main problems, we could detect differentially expressed miRNAs and sfd-RNAs in dengue virus-infected human dermal microvascular endothelial cells. Although more biological evaluations are required, two molecular targets of the hsa-mir-103a and hsa-mir-494 (CDK5 and PI3/AKT) appear relevant for dengue virus (DENV) infections. 
Here, we performed a comprehensive annotation and differential expression analysis, which can be applied in other studies addressing the role of small fragment RNA populations derived from ncRNAs in virus infection.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 ","pages":"1293412"},"PeriodicalIF":0.0,"publicationDate":"2024-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10864640/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139736873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A breast cancer-specific combinational QSAR model development using machine learning and deep learning approaches.","authors":"Anush Karampuri, Shyam Perugu","doi":"10.3389/fbinf.2023.1328262","DOIUrl":"10.3389/fbinf.2023.1328262","url":null,"abstract":"<p><p>Breast cancer is the most prevalent and heterogeneous form of cancer affecting women worldwide. Depending on the extent of disease spread, various therapeutic strategies are in practice, such as surgery, chemotherapy, radiotherapy, and immunotherapy. Combinational therapy is another strategy that has proven to be effective in controlling cancer progression. It involves administering an anchor drug, a well-established primary therapeutic agent with known efficacy for specific targets, together with a library drug, a supplementary drug that enhances the efficacy of the anchor drug and broadens the therapeutic approach. Our work focused on harnessing regression-based machine learning (ML) and deep learning (DL) algorithms to develop a structure-activity relationship between the molecular descriptors of drug pairs and their combined biological activity through a QSAR (quantitative structure-activity relationship) model. Eleven widely used machine learning and deep learning algorithms were used to develop QSAR models. A total of 52 breast cancer cell lines, 25 anchor drugs, and 51 library drugs were considered in developing the QSAR model. Deep neural networks (DNNs) achieved an R<sup>2</sup> (coefficient of determination) of 0.94, with an RMSE (root mean square error) of 0.255, making them the most effective algorithm for developing a structure-activity relationship with strong generalization capabilities. 
In conclusion, applying combinational therapy alongside ML and DL techniques represents a promising approach to combating breast cancer.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1328262"},"PeriodicalIF":2.8,"publicationDate":"2024-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10822965/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139577087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BPAGS: a web application for bacteriocin prediction via feature evaluation using alternating decision tree, genetic algorithm, and linear support vector classifier","authors":"Suraiya Akhter, John H. Miller","doi":"10.3389/fbinf.2023.1284705","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1284705","url":null,"abstract":"The use of bacteriocins has emerged as a propitious strategy in the development of new drugs to combat antibiotic resistance, given their ability to kill bacteria with both broad and narrow natural spectra. Hence, a compelling requirement arises for a precise and efficient computational model that can accurately predict novel bacteriocins. Machine learning’s ability to learn patterns and features from bacteriocin sequences that are difficult to capture using sequence matching-based methods makes it a potentially superior choice for accurate prediction. A web application for predicting bacteriocin was created in this study, utilizing a machine learning approach. The feature sets employed in the application were chosen using alternating decision tree (ADTree), genetic algorithm (GA), and linear support vector classifier (linear SVC)-based feature evaluation methods. Initially, potential features were extracted from the physicochemical, structural, and sequence-profile attributes of both bacteriocin and non-bacteriocin protein sequences. We assessed the candidate features first using the Pearson correlation coefficient, followed by separate evaluations with ADTree, GA, and linear SVC to eliminate unnecessary features. Finally, we constructed random forest (RF), support vector machine (SVM), decision tree (DT), logistic regression (LR), k-nearest neighbors (KNN), and Gaussian naïve Bayes (GNB) models using reduced feature sets. We obtained the overall top performing model using SVM with ADTree-reduced features, achieving an accuracy of 99.11% and an AUC value of 0.9984 on the testing dataset. 
We also assessed the predictive capabilities of our best-performing models for each reduced feature set relative to our previously developed software solution, a sequence alignment-based tool, and a deep-learning approach. A web application, titled BPAGS (Bacteriocin Prediction based on ADTree, GA, and linear SVC), was developed to incorporate the predictive models built using ADTree, GA, and linear SVC-based feature sets. Currently, the web-based tool provides classification results with associated probability values and has options to add new samples in the training data to improve the predictive efficacy. BPAGS is freely accessible at https://shiny.tricities.wsu.edu/bacteriocin-prediction/.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"8 12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139439460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}