Frontiers in bioinformatics: latest literature

Optimization of clustering parameters for single-cell RNA analysis using intrinsic goodness metrics.
IF 2.8
Frontiers in bioinformatics Pub Date : 2025-06-11 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1562410
Nicolina Sciaraffa, Antonino Gagliano, Luigi Augugliaro, Claudia Coronnello
Abstract
Introduction: The accurate clustering of cell subpopulations is a crucial aspect of single-cell RNA sequencing. The ability to correctly subdivide cell subpopulations hinges on the efficacy of unsupervised clustering. Despite the advancements and numerous adaptations of clustering algorithms, the correct clustering of cells remains a challenging endeavor that is dependent on the data in question and on the parameters selected for the clustering process. In this context, the present study aimed to predict the accuracy of clustering methods when varying different parameters by exploiting the intrinsic goodness metrics.
Methods: This study utilized three datasets, each originating from a distinct anatomical district and with a ground truth cell annotation. Moreover, the investigation employed two clustering methods: the Leiden and the Deep Embedding for Single-cell Clustering (DESC) algorithm. Firstly, a robust linear mixed regression model has been implemented in order to analyze the impact of clustering parameters on the accuracy. Consequently, fifteen intrinsic measures have been calculated and used to train an ElasticNet regression model in both intra- and cross-dataset approaches to evaluate the possibility of predicting the clustering accuracy.
Results and discussion: The first-order interactions demonstrated that the use of the UMAP method for the generation of the neighborhood graph and an increase in resolution have a beneficial impact on accuracy. The impact of the resolution parameter is accentuated by the reduced number of nearest neighbors, resulting in sparser and more locally sensitive graphs, which better preserve fine-grained cellular relationships. Furthermore, it is advisable to test different numbers of principal components, given that this parameter is highly affected by data complexity. This procedure has enabled the effective prediction of clustering accuracy through the utilization of intrinsic metrics. The findings demonstrated that the within-cluster dispersion and the Banfield-Raftery index could be effectively used as proxies for accuracy, for an immediate comparison of different clustering parameter configurations.
Frontiers in bioinformatics, volume 5, 1562410. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12187673/pdf/
Citations: 0
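The clustering abstract above names within-cluster dispersion as a usable proxy for accuracy. The following is a minimal sketch of that metric, assuming the common definition (sum of squared Euclidean distances from each cell to its cluster centroid). The function name, the toy 2-D embedding, and the two labelings are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def within_cluster_dispersion(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Toy 2-D embedding (hypothetical values) with two candidate labelings.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # mixes distant points in each cluster

print(within_cluster_dispersion(points, good))
print(within_cluster_dispersion(points, bad))
```

The labeling that groups nearby points yields the lower dispersion. Because dispersion generally shrinks as the number of clusters grows, such comparisons are most informative between configurations that produce a similar number of clusters.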
TICTAC: target illumination clinical trial analytics with cheminformatics.
IF 2.8
Frontiers in bioinformatics Pub Date : 2025-06-09 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1579865
Jeremiah I Abok, Jeremy S Edwards, Jeremy J Yang
Abstract
Introduction: Identifying disease-target associations is a pivotal step in drug discovery, offering insights that guide the development and optimization of therapeutic interventions. Clinical trial data serves as a valuable source for inferring these associations. However, issues such as inconsistent data quality and limited interpretability pose significant challenges. To overcome these limitations, an integrated approach is required that consolidates evidence from diverse data sources to support the effective prioritization of biological targets for further research.
Methods: We developed a comprehensive data integration and visualization pipeline to infer and evaluate associations between diseases and known and potential drug targets. This pipeline integrates clinical trial data with standardized metadata, providing an analytical workflow that enables the exploration of diseases linked to specific drug targets as well as facilitating the discovery of drug targets associated with specific diseases. The pipeline employs robust aggregation techniques to consolidate multivariate evidence from multiple studies, leveraging harmonized datasets to ensure consistency and reliability. Disease-target associations are systematically ranked and filtered using a rational scoring framework that assigns confidence scores derived from aggregated statistical metrics.
Results: Our pipeline evaluates disease-target associations by linking protein-coding genes to diseases and incorporates a confidence assessment method based on aggregated evidence. Metrics such as meanRank scores are employed to prioritize associations, enabling researchers to focus on the most promising hypotheses. This systematic approach streamlines the identification and prioritization of biological targets, enhancing hypothesis generation and evidence-based decision-making.
Discussion: This innovative pipeline provides a scalable solution for hypothesis generation, scoring, and ranking in drug discovery. As an open-source tool, it is equipped with publicly available datasets and designed for ease of use by researchers. The platform empowers scientists to make data-driven decisions in the prioritization of biological targets, facilitating the discovery of novel therapeutic opportunities.
Frontiers in bioinformatics, volume 5, 1579865. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12183303/pdf/
Citations: 0
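The TICTAC abstract above mentions meanRank scores for prioritizing disease-target associations but does not define them. The sketch below shows one plausible reading, assuming meanRank is the average of an association's ranks across several evidence variables (rank 1 = strongest evidence). The function, the gene names, and the evidence variables `n_studies` and `n_drugs` are hypothetical and may differ from the pipeline's actual definition.

```python
def mean_rank(records, metrics):
    """Average each record's rank across metrics (rank 1 = largest value).

    Assumption: ties are broken by sort order; a real pipeline would
    likely need an explicit tie-handling rule.
    """
    ranks = {name: [] for name in records}
    for metric in metrics:
        ordered = sorted(records, key=lambda name: records[name][metric], reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    return {name: sum(r) / len(r) for name, r in ranks.items()}


# Hypothetical aggregated evidence for three targets of one disease.
evidence = {
    "GENE_A": {"n_studies": 12, "n_drugs": 4},
    "GENE_B": {"n_studies": 3, "n_drugs": 9},
    "GENE_C": {"n_studies": 1, "n_drugs": 1},
}

scores = mean_rank(evidence, ["n_studies", "n_drugs"])
# Lower meanRank = stronger aggregated evidence under this assumption.
for gene in sorted(scores, key=scores.get):
    print(gene, scores[gene])
```

Averaging ranks rather than raw values keeps variables on different scales from dominating the combined score, which is one reason rank aggregation is often used for multivariate evidence.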
DeepPredict: a state-of-the-art web server for protein secondary structure and relative solvent accessibility prediction.
IF 2.8
Frontiers in bioinformatics Pub Date : 2025-06-06 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1607402
Wafa Alanazi, Di Meng, Gianluca Pollastri
Abstract
DeepPredict is a freely accessible web server that integrates Porter6 and PaleAle6, two state-of-the-art deep learning models designed for protein secondary structure prediction (PSSP) and relative solvent accessibility (RSA) prediction, respectively. Built on an advanced deep learning framework, DeepPredict leverages pre-trained protein language models (PLMs), specifically ESM-2, to eliminate the need for multiple sequence alignments (MSAs), enabling rapid and accurate predictions. Compared to existing methods, DeepPredict performs better in both PSSP and RSA prediction tasks, delivering state-of-the-art performance. The server offers a user-friendly interface, catering to both computational biologists and experimental researchers. DeepPredict is available at https://pcrgwd.ucd.ie/wafa/ with comprehensive online documentation and downloadable example datasets.
Frontiers in bioinformatics, volume 5, 1607402. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12179536/pdf/
Citations: 0
Interactive visualization of large molecular systems with VTX: example with a minimal whole-cell model.
IF 2.8
Frontiers in bioinformatics Pub Date : 2025-06-06 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1588661
Maxime Maria, Valentin Guillaume, Simon Guionnière, Nicolas Dacquay, Cyprien Plateau-Holleville, Vincent Larroque, Jean Lardé, Yassine Naimi, Jean-Philip Piquemal, Guillaume Levieux, Nathalie Lagarde, Stéphane Mérillou, Matthieu Montes
Abstract
VTX is an open-source molecular visualization software designed to overcome the scaling limitations of existing real-time molecular visualization software when handling massive molecular datasets. VTX employs a meshless molecular graphics engine utilizing impostor-based techniques and adaptive level-of-detail (LOD) rendering. This approach significantly reduces memory usage and enables real-time visualization and manipulation of large molecular systems. Performance benchmarks against VMD, PyMOL, and ChimeraX using a 114-million-bead Martini minimal whole-cell model demonstrate VTX's efficiency, maintaining consistent frame rates even under interactive manipulation on standard computer hardware. VTX incorporates features such as Screen-Space Ambient Occlusion (SSAO) for enhanced depth perception and free-fly navigation for intuitive exploration of large molecular systems. VTX is open-source and free for non-commercial use. Binaries for Windows and Ubuntu Linux are available at http://vtx.drugdesign.fr. VTX source code is available at https://github.com/VTX-Molecular-Visualization.
Frontiers in bioinformatics, volume 5, 1588661. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12179134/pdf/
Citations: 0
Trail-blazing and keeping pace: building, retaining and expanding image analysis expertise.
IF 2.8
Frontiers in bioinformatics Pub Date : 2025-05-30 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1613866
David Kirchenbuechler, Mariana De Niz, Constadina Arvanitis
Abstract
Scientific studies are increasingly complex, involving quantification of many different experimental approaches and technologies. However, it is challenging for any individual scientist to build and retain sufficient expertise and competency in a large range of scientific tools. A deep expertise is critical for rigor and reproducibility; however, focused expertise can easily become a hindrance to inter-disciplinary science. This is particularly true with respect to microscopy and image analysis. Core facilities often bridge this gap, serving as an access point to expertise in cutting-edge technologies while facilitating collaboration. Our purpose with this perspective piece is to share our experience with other Microscopy Core Facility Directors and image analysts who are aiming to establish image analysis training as a service. We hope that this shared experience can help others optimize their service through our lessons learned, and avoid pitfalls we faced during our Core's timeline. In this paper we explore three elements that have been vital for the establishment and expansion of image analysis at the Center for Advanced Microscopy at Northwestern University. The first is a commitment to dedicated image analysis service. The second is establishing image analysis training programs for the local scientific community, which facilitates integration of analysis into microscopy workflows. The third is engagement with international organizations such as BINA. These organizations foster collaborations which ultimately result in the fruitful dissemination of novel tools across the community. These three elements are essential to maximize the potential of imaging-based scientific research and ultimately ensure equal access to image informatics.
Frontiers in bioinformatics, volume 5, 1613866. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12162573/pdf/
Citations: 0
In silico analysis of Triphala-derived polyphenols as inhibitors of TIR-TIR homodimerization in the inflammatory pathway.
IF 2.8
Frontiers in bioinformatics Pub Date : 2025-05-29 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1565700
Durgadevi Rajendran, Nalini Easwaran
Abstract
Downstream signaling of the nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) pathway is mediated by the adaptor protein myeloid differentiation primary response gene 88 (MyD88). The TIR domain present in MyD88 plays a pivotal role in regulating the expression of pro-inflammatory cytokines. Although synthetic drugs, including M20 and TJ-M2010-5, have been studied to mitigate the overexpression of MyD88, their prolonged usage is known to cause adverse side effects, highlighting the need for a safer, risk-free alternative. An Ayurvedic formulation named Triphala, which is rich in polyphenols and traditionally used to treat various ailments, was selected for this investigation. Although polyphenols are gaining attention as anti-inflammatory agents, their precise mode of action remains insufficiently understood. Previous studies have explored the anti-inflammatory properties of Triphala in a broad spectrum, but this study notably focuses on the interactions of Triphala-derived polyphenols with the TIR domain of the MyD88 adaptor protein in the NF-κB signaling pathway. This study employs computational docking and a molecular dynamics (MD) simulation to study the interaction and stability of the polyphenols with the target protein. The polyphenols were virtually docked to the TIR domain of MyD88 using AutoDock tools 1.5.7. Among them, the top three protein-polyphenol complexes with the highest binding affinities were selected and subjected to MD simulation for 200 ns to evaluate their interaction properties in detail. The findings of the MD simulation corroborated the docking results, showing that two complexes (protein-punicalagin and protein-chebulagic acid) demonstrated better interaction patterns. The MD trajectory revealed that polyphenol binding enhanced the stability of the target protein, as indicated by lower root-mean-square deviation (RMSD) (~0.25 nm), solvent accessible surface area (SASA) (~96.848-100.666 nm²), and stabilized radius of gyration (Rg) (~1.50-1.53 nm) values for punicalagin and chebulagic acid complexes compared to the reference complex. Our findings have supported the hypothesis that Triphala polyphenols may interact with the TIR domain of MyD88, thereby inhibiting the production of inflammatory cytokines. This study provides a combination of computational validation of specific molecular targets and mechanistic insights into the anti-inflammatory potential of Triphala-derived polyphenols.
Frontiers in bioinformatics, volume 5, 1565700. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12158960/pdf/
Citations: 0
Bridging the gap between hepatocellular carcinoma management guidelines and personalised medicine: a Bayesian network study.
IF 2.8
Frontiers in bioinformatics Pub Date : 2025-05-29 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1574797
Yi-Chun Wang, Daniel Bulte, Michael Brady
Abstract
Introduction: There are numerous treatment options available for patients with confirmed hepatocellular carcinoma (HCC). Guidelines such as Barcelona Clinic Liver Cancer (BCLC) support treatment decisions by way of a flow diagram that is organized around groups of patients. Though such guidelines continue to make a major contribution to standardization of treatment, in clinical reality, cases are often more nuanced than is captured in any flow diagram, even one as comprehensive as BCLC. A fundamental challenge for a clinician is to combine such a population-wide guideline with specific information about the individual patient. Bayesian networks (BNs) offer a way to "bridge this gap" and combine standardized care and precision medicine. They do this by enabling answers to detailed "what-if" questions from the clinician.
Methods: We use real-world data of HCC patients who received treatments between 2019 and 2020 to construct a BN to assess the potential treatment effect for cases that were not treated in compliance with BCLC.
Results: We report detailed scenarios for ten randomly selected cases and summarise the difference in survival time for each scenario. For each case, the counterfactual treatment scenarios are made based on whether or not the case is in compliance with BCLC guidelines, the type of treatment received and the waiting time to receive treatment.
Discussion: We consider two cases with similar clinical characteristics (but who received different treatments) and discuss whether treating them in compliance with the guidelines would have resulted in better outcomes than the actual clinical decision. We include a detailed discussion about the assumptions made in constructing the BN and we highlight why such a BN can serve as an AI-based clinical decision support system, particularly when there is need for further patient stratification.
Frontiers in bioinformatics, volume 5, 1574797. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12158914/pdf/
Citations: 0
Quality over quantity: how to get the best results when using docking for repurposing.
IF 2.8
Frontiers in bioinformatics Pub Date : 2025-05-26 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1536504
Lenin Domínguez-Ramírez, Maricruz Anaya-Ruiz, Paulina Cortés-Hernández
Abstract
Molecular docking is among the fastest and most readily available computational tools to explore protein-ligand interactions. However, little effort has been put into assessing the quality of its results. In this paper, we compared eight free-license docking programs to screen a drug library against the human target, phosphodiesterase 5A (PDE5A), to evaluate their ability to find its known ligand, sildenafil, and other ligands that became erectile dysfunction drugs because they inhibit this target. GNINA was superior at identifying the known target because it offers a convolutional neural network (CNN) score that ranks the quality of docking results. Using this CNN score improved the ranking of known positives. Receiver operating characteristic (ROC) analysis revealed that all docking suites lack specificity; that is, they often misidentify true negatives. Employing a CNN score cutoff before ranking by docking affinity raised specificity with a small loss in sensitivity. After the cutoff, datasets became smaller but of higher quality. We propose a heuristic to produce relevant docking results, which includes an overall evaluation of the target on docking performance through ROC and an improvement of candidate binder selection using a CNN score cutoff of 0.9.
Frontiers in bioinformatics, volume 5, 1536504. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12146287/pdf/
Citations: 0
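The docking abstract above proposes applying a CNN score cutoff of 0.9 before ranking candidates by docking affinity. A minimal sketch of that two-step selection follows, assuming each docking result is a dictionary with hypothetical keys `ligand`, `affinity` (kcal/mol, more negative is better) and `cnn_score`, and assuming the cutoff is inclusive. The values are made-up placeholders, not results from the paper.

```python
def select_candidates(results, cnn_cutoff=0.9):
    """Keep poses whose CNN score meets the cutoff, then rank by affinity."""
    kept = [r for r in results if r["cnn_score"] >= cnn_cutoff]
    # More negative affinity = stronger predicted binding, so sort ascending.
    return sorted(kept, key=lambda r: r["affinity"])


# Placeholder docking output (illustrative numbers only).
results = [
    {"ligand": "ligand_1", "affinity": -9.1, "cnn_score": 0.95},
    {"ligand": "ligand_2", "affinity": -10.4, "cnn_score": 0.42},
    {"ligand": "ligand_3", "affinity": -8.3, "cnn_score": 0.91},
    {"ligand": "ligand_4", "affinity": -9.8, "cnn_score": 0.97},
]

for hit in select_candidates(results):
    print(hit["ligand"], hit["affinity"], hit["cnn_score"])
```

Note that `ligand_2` has the best raw affinity but is discarded by the cutoff, which mirrors the trade-off described in the abstract: a smaller candidate list, with fewer poorly supported poses.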
CytoLNCpred-a computational method for predicting cytoplasm associated long non-coding RNAs in 15 cell-lines.
IF 2.8
Frontiers in bioinformatics Pub Date : 2025-05-26 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1585794
Shubham Choudhury, Naman Kumar Mehta, Gajendra P S Raghava
Abstract
The function of long non-coding RNA (lncRNA) is largely determined by its specific location within a cell. Previous methods have used noisy datasets, including mRNA transcripts in tools intended for lncRNAs, and excluded lncRNAs lacking significant differential localization between the cytoplasm and nucleus. In order to overcome these shortcomings, a method has been developed for predicting cytoplasm-associated lncRNAs in 15 human cell-lines, identifying which lncRNAs are more abundant in the cytoplasm compared to the nucleus. All models in this study were trained using five-fold cross validation and tested on a validation dataset. Initially, we developed machine learning and deep learning-based models using traditional features like composition and correlation. Using composition and correlation based features, machine learning algorithms achieved an average AUC of 0.7049 and 0.7089, respectively, for 15 cell-lines. Secondly, we developed machine learning-based models using embedding features obtained from the large language model DNABERT-2. The average AUC for all the cell-lines achieved by this approach was 0.665. Subsequently, we also fine-tuned DNABERT-2 on our training dataset and evaluated the fine-tuned DNABERT-2 model on the validation dataset. The fine-tuned DNABERT-2 model achieved an average AUC of 0.6336. Correlation-based features combined with ML algorithms outperform LLM-based models in the case of predicting differential lncRNA localization. These cell-line specific models as well as a web-based service are available to the public from our web server (https://webs.iiitd.edu.in/raghava/cytolncpred/).
Frontiers in bioinformatics, volume 5, 1585794. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12146324/pdf/
Citations: 0
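The CytoLNCpred abstract above refers to composition-based features without specifying them. One common form is k-mer composition, sketched here under that assumption: each sequence is turned into a fixed-length vector of normalized k-mer frequencies that a classifier can consume. The function name and the example sequence are hypothetical, and the paper's actual feature set may differ.

```python
from itertools import product


def kmer_composition(sequence, k=2):
    """Return normalized k-mer frequencies over the alphabet A, C, G, T."""
    sequence = sequence.upper().replace("U", "T")  # treat RNA and DNA alike
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


# Hypothetical transcript fragment; real inputs would be full lncRNA sequences.
features = kmer_composition("ACGUACGUNNACGU", k=2)
print(len(features))
print(features["AC"], features["CG"])
```

With k = 2 the vector always has 16 entries regardless of transcript length, which is what makes such features convenient inputs for standard machine learning models.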
MORE-RNAseq: a pipeline for quantifying retrotransposition-capable LINE1 expression based on RNA-seq data.
IF 2.8
Frontiers in bioinformatics Pub Date : 2025-05-22 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1575346
Yutaka Nakachi, Jianbin Du, Risa Watanabe, Yutaro Yanagida, Miki Bundo, Kazuya Iwamoto
Abstract
Retrotransposon long interspersed nuclear element-1 (LINE-1, L1) constitutes a large proportion of the mammalian genome. A fraction of L1s, which have no deleterious mutations in the structure, can amplify their copies via a process called retrotransposition (RT). RT affects genome stability and gene expression and is involved in the pathogenesis of many hereditary diseases. Measuring expression of RT-capable L1s (rc-L1s) among the hundreds of thousands of non-rc-L1s is an essential step to understand the impact of RT. We developed mobile element-originated read enrichment from RNA-seq data (MORE-RNAseq), a pipeline for calculating expression of rc-L1s using manually curated L1 references in humans and mice. MORE-RNAseq allows for quantification of expression levels of overall (sum of the expression of all rc-L1s) and individual rc-L1s with consideration of the genomic context. We applied MORE-RNAseq to publicly available RNA-seq data of human and mouse cancer cell lines from the studies that reported increased L1 expression. We found a significant increase in rc-L1 expression at the overall level in both inter- and intragenic contexts. We also identified differentially expressed rc-L1s at the locus level, which will be important candidates for downstream analysis. We also applied our method to young and aged human muscle RNA-seq data with no prior information about L1 expression, and found a significant increase of rc-L1 expression in the aged samples. Our method will contribute to understanding the role of rc-L1s in various physiological and pathophysiological conditions using standard RNA-seq data. All scripts are available at https://github.com/molbrain/MORE-RNAseq.
Frontiers in bioinformatics, volume 5, 1575346. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12138260/pdf/
Citations: 0