BMC Bioinformatics: Latest Articles

Utilization of a natural language processing-based approach to determine the composition of artifact residues.
IF 2.9 | Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2024-09-27 DOI: 10.1186/s12859-024-05888-2
Tung Tho Nguyen, Korey J Brownstein
Background: Determining the composition of artifact residues is a central problem in ancient residue metabolomics. The standard method compares the mass spectral features shared by an experimental artifact and an ancient artifact. While this method is simple and straightforward, we sought to increase the accuracy of predicting which plant species had been used in which artifacts.
Results: Here, we introduce an algorithm (new method) based on ideas from the field of natural language processing (NLP) to solve this problem. We tested our strategy on a set of modern clay pipes. To limit bias, we were not told which plant species had been smoked in which clay pipes. The results indicate that our new method performed 12.5% better than the standard method in predicting the plant species smoked in each artifact.
Conclusions: Using an NLP-based approach, we developed a robust algorithm for characterizing the composition of artifact residues. This work also discusses other applications of our algorithm in metabolomics, such as datasets with a limited number of replicates.
Volume 25, article 311. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437931/pdf/
Citations: 0
DiscovEpi: automated whole proteome MHC-I-epitope prediction and visualization.
IF 2.9 | Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2024-09-27 DOI: 10.1186/s12859-024-05931-2
C Mahncke, F Schmiedeke, S Simm, L Kaderali, B M Bröker, U Seifert, C Cammann
Background: Antigen presentation is a central step in initiating and shaping the adaptive immune response. To activate CD8+ T cells, pathogen-derived peptides bound to major histocompatibility complex (MHC) class I molecules are presented on the surface of antigen-presenting cells. CD8+ T cells that recognize these complexes with their T cell receptor are activated and ideally eliminate infected cells. Predicting putative peptides that bind to MHC class I (MHC-I) is crucial for understanding pathogen recognition in specific immune responses and for supporting drug and vaccine design. Reliable epitope prediction algorithms and databases are available; however, they primarily focus on predicting epitopes in single immunogenic proteins.
Results: We developed the tool DiscovEpi to establish an interface between whole proteomes and epitope prediction. The tool automatically identifies all potential MHC-I-binding peptides within a proteome and calculates the epitope density and average binding score for every protein, a protein-centric approach. DiscovEpi connects automated extraction of multiple sequences from the UniProt database, selected by organism and cell compartment, with subsequent epitope prediction via NetMHCpan. Furthermore, it allows proteins to be ranked by their predicted immunogenicity and different proteomes to be compared. Applying the tool, we predict a higher immunogenic potential for the membrane-associated proteins of SARS-CoV-2 than for those of influenza A, based on the two metrics presented, epitope density and binding score. This could be confirmed visually by comparing the epitope maps of the influenza A strain and SARS-CoV-2.
Conclusion: Automated prediction across whole proteomes, followed by sequence-level visualization of the locations of putative epitopes, facilitates the search for putative immunogenic proteins or protein regions and supports the study of adaptive immune responses and vaccine design.
Volume 25, article 310. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11438315/pdf/
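The two per-protein metrics named above can be illustrated with a minimal Python sketch. It is not DiscovEpi's implementation: it assumes, as a hypothetical definition, that epitope density is the number of predicted binders per 100 residues, and that each peptide carries a NetMHCpan-style percentile rank (lower means stronger binding), with the mean rank of the binders standing in for an average binding score. The 2.0 cutoff, the function name, and the toy proteome are placeholders chosen for illustration.

```python
# Hypothetical sketch: per-protein epitope metrics and protein ranking.
# Not DiscovEpi's code; the metric definitions below are assumptions.

def epitope_metrics(protein_length, predictions, rank_cutoff=2.0, per_residues=100):
    """Summarize per-peptide predictions for one protein.

    predictions: list of (peptide, percentile_rank) pairs; lower rank = stronger binder.
    Returns (binders per `per_residues` residues, mean rank of binders or None).
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    # Keep only peptides at or below the (assumed) binder cutoff.
    binder_ranks = [rank for _, rank in predictions if rank <= rank_cutoff]
    density = len(binder_ranks) * per_residues / protein_length
    mean_rank = sum(binder_ranks) / len(binder_ranks) if binder_ranks else None
    return density, mean_rank


# Toy placeholder proteome: name -> (length, [(peptide, rank), ...]).
proteome = {
    "protA": (200, [("pep1", 0.3), ("pep2", 1.5), ("pep3", 40.0)]),
    "protB": (400, [("pep4", 0.8)]),
}

# Rank proteins by epitope density, highest first.
ranked = sorted(proteome, key=lambda name: epitope_metrics(*proteome[name])[0], reverse=True)
print(ranked)  # ['protA', 'protB']
```

Normalizing by length is what lets a short protein with a few binders outrank a long protein with the same count; a real pipeline would take the ranks from NetMHCpan output rather than a hand-written dictionary.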
Citations: 0
SpeciateIT and vSpeciateDB: novel, fast, and accurate per sequence 16S rRNA gene taxonomic classification of vaginal microbiota.
IF 2.9 | Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2024-09-27 DOI: 10.1186/s12859-024-05930-3
Johanna B Holm, Pawel Gajer, Jacques Ravel
Background: Clustering sequences into operational taxonomic units (OTUs) and denoising are mainstream stopgaps for taxonomically classifying large numbers of 16S rRNA gene sequences. Environment-specific reference databases generally yield optimal taxonomic assignment.
Results: We developed SpeciateIT, a novel taxonomic classification tool that rapidly and accurately classifies individual amplicon sequences (https://github.com/Ravel-Laboratory/speciateIT). We also present vSpeciateDB, a custom reference database for the taxonomic classification of 16S rRNA gene amplicon sequences from vaginal microbiota. We show that SpeciateIT requires minimal computational resources relative to other algorithms and, when combined with vSpeciateDB, affords accurate species-level classification in an environment-specific manner.
Conclusions: We describe two resources of new and practical importance. The novel classification algorithm, SpeciateIT, is based on 7th-order Markov chain models and allows fast and accurate per-sequence taxonomic assignment (as little as 10 min for 10⁷ sequences). vSpeciateDB, a meticulously tailored reference database, is a vital and pragmatic contribution: as an environment-specific database, it provides higher species-level resolution than its universal counterparts.
Volume 25, article 313. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437924/pdf/
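To make "k-th order Markov chain models" concrete, here is a generic Python sketch of per-taxon Markov chain classification: each taxon's model estimates P(next base | previous k bases) from its reference sequences with add-one (Laplace) smoothing, and a query is assigned to the taxon with the highest log-likelihood. This is not SpeciateIT's code; the class name, the smoothing scheme, and the toy training sequences are assumptions, and k is set to 2 in the example only so the tiny toy data are meaningful (the abstract's 7th-order models would use k=7).

```python
import math
from collections import defaultdict


class MarkovClassifier:
    """Per-taxon k-th order Markov chain models over DNA (generic sketch)."""

    def __init__(self, k=7, alphabet="ACGT", pseudocount=1.0):
        self.k = k
        self.alphabet = alphabet
        self.pseudocount = pseudocount
        # counts[taxon][context][next_base] -> number of observed transitions
        self.counts = {}

    def train(self, taxon, sequences):
        table = self.counts.setdefault(taxon, defaultdict(lambda: defaultdict(int)))
        for seq in sequences:
            for i in range(self.k, len(seq)):
                table[seq[i - self.k:i]][seq[i]] += 1

    def log_likelihood(self, taxon, seq):
        table = self.counts[taxon]
        n_symbols = len(self.alphabet)
        total = 0.0
        for i in range(self.k, len(seq)):
            context, nxt = seq[i - self.k:i], seq[i]
            row = table.get(context, {})  # unseen context -> empty row (no insertion)
            n_context = sum(row.values())
            # Laplace-smoothed transition probability P(next | context)
            p = (row.get(nxt, 0) + self.pseudocount) / (n_context + self.pseudocount * n_symbols)
            total += math.log(p)
        return total

    def classify(self, seq):
        # Pick the taxon whose model gives the query the highest log-likelihood.
        return max(self.counts, key=lambda taxon: self.log_likelihood(taxon, seq))


clf = MarkovClassifier(k=2)
clf.train("taxon_A", ["ACACACACACAC"])
clf.train("taxon_B", ["GTGTGTGTGTGT"])
print(clf.classify("ACACAC"))  # taxon_A
print(clf.classify("GTGTGT"))  # taxon_B
```

With k=7 there are 4⁷ = 16,384 possible contexts per taxon, so training and scoring stay linear in sequence length; queries no longer than k contribute no transitions and would need separate handling.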
Citations: 0
Turbocharging protein binding site prediction with geometric attention, inter-resolution transfer learning, and homology-based augmentation.
IF 2.9 | Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2024-09-20 DOI: 10.1186/s12859-024-05923-2
Daeseok Lee, Wonjun Hwang, Jeunghyun Byun, Bonggun Shin
Background: Locating small-molecule binding sites in target proteins, at either pocket or residue resolution, is critical in many drug-discovery scenarios. Because such binding sites are not always easy to find with conventional methods, various deep learning methods for predicting binding sites from protein structures have been developed in recent years. Existing deep learning-based methods have several limitations, including (1) the inefficiency of CNN-only architectures, (2) loss of information due to excessive post-processing, and (3) under-utilization of available data sources.
Methods: We present a new model architecture and training method that resolve these problems. First, by layering geometric self-attention units on top of residue-level 3D CNN outputs, our model overcomes the problems of CNN-only architectures. Second, by using residues and pockets, rather than voxels, as the fundamental units of computation, our method reduces the information lost in post-processing. Lastly, by employing inter-resolution transfer learning and homology-based augmentation, our method makes substantially fuller use of the available data sources.
Results: The proposed method significantly outperformed all state-of-the-art baselines at both resolutions, pocket and residue. An ablation study demonstrated that the proposed architecture, transfer learning, and homology-based augmentation are each indispensable for achieving optimal performance. We further examined our model's performance in a case study of human serum albumin, in which our model identified multiple binding sites of the protein better than existing methods.
Conclusions: We believe our contribution to the literature is twofold. First, we introduce a novel computational method for binding site prediction with practical applications, supported by its strong performance across diverse benchmarks and case studies. Second, the innovative aspects of our method, namely the design of the model architecture, inter-resolution transfer learning, and homology-based augmentation, would serve as useful components for future work.
Volume 25, article 306. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11416008/pdf/
Citations: 0
Leveraging gene correlations in single cell transcriptomic data
IF 3 | Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2024-09-18 DOI: 10.1186/s12859-024-05926-z
Kai Silkwood, Emmanuel Dollinger, Joshua Gervin, Scott Atwood, Qing Nie, Arthur D. Lander
Many approaches have been developed to overcome technical noise in single cell RNA sequencing (scRNAseq). As researchers dig deeper into data, looking for rare cell types, subtleties of cell states, and details of gene regulatory networks, there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data in which the ground truth about biological variation is unknown (i.e., usually). We approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization, a step that skews distributions, particularly for sparse data, and calculate p values associated with key statistics. We develop an improved method for selecting features for cell clustering and for identifying gene–gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes and highlight potentially novel cell biological relationships. A statistically grounded approach to identifying gene–gene correlations can yield new insights into functionally relevant gene regulatory networks.
Citations: 0
Taxanorm: a novel taxa-specific normalization approach for microbiome data
IF 3 | Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2024-09-16 DOI: 10.1186/s12859-024-05918-z
Ziyue Wang, Dillon Lloyd, Shanshan Zhao, Alison Motsinger-Reif
In high-throughput sequencing studies, sequencing depth, which quantifies the total number of reads, varies across samples. Unequal sequencing depth can obscure true biological signals of interest and prevent direct comparisons between samples. To remove variability due to differential sequencing depth, taxa counts are usually normalized before downstream analysis. However, most existing normalization methods scale counts using size factors that are sample-specific but not taxon-specific, which can result in over- or under-correction for some taxa. We developed TaxaNorm, a novel normalization method based on a zero-inflated negative binomial model. This method assumes that the effects of sequencing depth on the mean and dispersion vary across taxa. Incorporating the zero-inflation component better captures the nature of microbiome data. We also propose two corresponding diagnostic tests for validating the varying sequencing depth effect. We find that TaxaNorm performs comparably to existing methods in most simulation scenarios for downstream analysis and reaches higher power in some cases. In particular, it balances power and false discovery control well. When applied to a real dataset, TaxaNorm shows improved performance in correcting technical bias. TaxaNorm corrects for both sample- and taxon-specific bias by introducing an appropriate regression framework for microbiome data, which aids data interpretation and visualization. The ‘TaxaNorm’ R package is freely available through the CRAN repository at https://CRAN.R-project.org/package=TaxaNorm and the source code can be downloaded from https://github.com/wangziyue57/TaxaNorm.
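For contrast with TaxaNorm, the following minimal Python sketch shows the kind of sample-specific scaling the abstract describes as the usual practice: one size factor per sample (here, an assumed choice of median depth divided by sample depth), applied identically to every taxon in that sample. It does not implement TaxaNorm's zero-inflated negative binomial model and is unrelated to the TaxaNorm R package's API; the function name and the toy counts are hypothetical.

```python
from statistics import median


def size_factor_normalize(counts):
    """Scale every sample to the median sequencing depth.

    counts: dict mapping sample name -> list of per-taxon read counts.
    Each taxon in a sample is rescaled by the SAME factor
    (median depth / sample depth), i.e. the factor is sample-specific only.
    """
    depths = {sample: sum(row) for sample, row in counts.items()}
    if any(depth == 0 for depth in depths.values()):
        raise ValueError("every sample needs at least one read")
    target = median(depths.values())
    return {
        sample: [x * target / depths[sample] for x in row]
        for sample, row in counts.items()
    }


# Toy data: s2 is the same community as s1 sequenced twice as deeply.
counts = {"s1": [10, 90], "s2": [20, 180]}
print(size_factor_normalize(counts))  # {'s1': [15.0, 135.0], 's2': [15.0, 135.0]}
```

Because both taxa in a sample share one factor, a taxon whose counts respond to depth differently from the rest of the community would be over- or under-corrected; that is the gap that TaxaNorm's taxon-specific depth effects are meant to address.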
Citations: 0
Mining impactful discoveries from the biomedical literature
IF 3 | Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2024-09-16 DOI: 10.1186/s12859-024-05881-9
Erwan Moreau, Orla Hardiman, Mark Heverin, Declan O’Sullivan
Literature-based discovery (LBD) aims to help researchers identify relations between concepts that are worthy of further investigation by text-mining the biomedical literature. While the LBD literature is rich and the field is considered mature, standard practice in evaluating LBD methods is methodologically poor and has not progressed on par with the domain. The lack of a properly designed, decent-sized benchmark dataset hinders the progress of the field and its development into applications usable by biomedical experts. This work presents a method for mining past discoveries from the biomedical literature. It leverages the impact made by a discovery, using descriptive statistics to detect surges in the prevalence of a relation across time. The validity of the method is tested against a baseline representing the state-of-the-art “time-sliced” method. This method allows the collection of a large number of time-stamped discoveries. These can be used for LBD evaluation, alleviating the long-standing issue of inadequate evaluation. The method might also pave the way for more fine-grained LBD methods, which could exploit the diversity of these past discoveries to train supervised models. Finally, the dataset (or a future version of it inspired by our method) could be used as a methodological tool for systematic reviews. To this end, we provide an online exploration tool, available at https://brainmend.adaptcentre.ie/.
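The abstract does not say which descriptive statistics are used, so the following Python sketch is only a hypothetical illustration of surge detection: given yearly counts of articles mentioning a concept pair (assumed to cover consecutive years), it flags the first year whose count exceeds the mean plus z standard deviations of the preceding window. The function name and the `window`, `z`, and `min_count` values are assumptions, not the authors' method, and the counts are toy values.

```python
from statistics import mean, pstdev


def first_surge_year(yearly_counts, window=3, z=2.0, min_count=5):
    """Return the first year whose count exceeds mean + z * std of the
    preceding `window` years and is at least `min_count`, else None.

    yearly_counts: dict year -> number of articles mentioning the relation.
    """
    years = sorted(yearly_counts)
    for i in range(window, len(years)):
        history = [yearly_counts[y] for y in years[i - window:i]]
        current = yearly_counts[years[i]]
        threshold = mean(history) + z * pstdev(history)
        # Require both a jump relative to recent history and a minimum volume.
        if current >= min_count and current > threshold:
            return years[i]
    return None


toy_counts = {2000: 1, 2001: 0, 2002: 1, 2003: 1, 2004: 2, 2005: 12, 2006: 20}
print(first_surge_year(toy_counts))  # 2005
```

The `min_count` floor keeps tiny absolute changes (for example, 0 to 1 mention) from being flagged as surges when the preceding window has near-zero variance.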
Citations: 0
Distinguishing word identity and sequence context in DNA language models
IF 3 | Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2024-09-13 DOI: 10.1186/s12859-024-05869-5
Melissa Sanabria, Jonas Hirsch, Anna R. Poetsch
Transformer-based large language models (LLMs) are well suited to biological sequence data because such data are analogous to natural language. Complex relationships can be learned because a concept of "words" can be generated through tokenization. When trained with masked token prediction, the models learn both token sequence identity and larger sequence context. We developed a methodology to interrogate model learning, which is relevant both to the model's interpretability and to evaluating its potential for specific tasks. We used DNABERT, a DNA language model trained on the human genome with overlapping k-mers as tokens. To gain insight into the model's learning, we interrogated how the model makes predictions, extracted token embeddings, and defined a fine-tuning benchmarking task to predict the next tokens of different sizes without overlaps. This task evaluates foundation models without interrogating specific genome biology; it does not depend on tokenization strategies, vocabulary size, the dictionary, or the number of training parameters. Lastly, there is no leakage of information from token identity into the prediction task, which makes it particularly useful for evaluating the learning of sequence context. We discovered that the model with overlapping k-mers struggles to learn larger sequence context. Instead, the learned embeddings largely represent token sequence. Still, good performance is achieved on genome-biology-inspired fine-tuning tasks. Models with overlapping tokens may be used for tasks where larger sequence context is less relevant but the token sequence directly represents the desired learning features. This emphasizes the need to interrogate knowledge representation in biological LLMs.
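The difference between overlapping and non-overlapping k-mer tokenization, which underlies the information-leakage argument above, fits in a few lines of Python. This is an illustrative sketch, not DNABERT's tokenizer, and the function names are hypothetical. With stride-1 (overlapping) k-mers, each token shares k-1 bases with its neighbor, so predicting the next token largely reduces to guessing one new base.

```python
def overlapping_kmers(seq, k):
    """Stride-1 k-mers: every position starts a token (DNABERT-style overlap)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def non_overlapping_kmers(seq, k):
    """Stride-k k-mers; a trailing fragment shorter than k is dropped."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


seq = "ACGTACGTTA"
over = overlapping_kmers(seq, 3)
non_over = non_overlapping_kmers(seq, 3)
print(over)      # ['ACG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT', 'GTT', 'TTA']
print(non_over)  # ['ACG', 'TAC', 'GTT']

# Adjacent overlapping tokens share k-1 bases: the suffix of one is the prefix of the next.
print(all(a[1:] == b[:-1] for a, b in zip(over, over[1:])))  # True
```

The same check on `non_over` returns False, which is why a next-token task over non-overlapping tokens cannot be solved by copying bases from the current token.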
Citations: 0
scBubbletree: computational approach for visualization of single cell RNA-seq data
IF 3 | Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2024-09-13 DOI: 10.1186/s12859-024-05927-y
Simo Kitanovski, Yingying Cao, Dimitris Ttoouli, Farnoush Farahpour, Jun Wang, Daniel Hoffmann
Visualization approaches transform high-dimensional data from single cell RNA sequencing (scRNA-seq) experiments into two-dimensional plots that are used to analyze cell relationships and to report biological insights. Yet many standard approaches generate visuals that suffer from overplotting and a lack of quantitative information, and that distort global and local properties of biological patterns relative to the original high-dimensional space. We present scBubbletree, a new, scalable method for visualizing scRNA-seq data. The method identifies clusters of cells with similar transcriptomes and visualizes these clusters as "bubbles" at the tips of dendrograms (bubble trees), which correspond to quantitative summaries of cluster properties and relationships. scBubbletree stacks bubble trees with further cluster-associated information in a visually accessible way, thus facilitating quantitative assessment and biological interpretation of scRNA-seq data. We demonstrate this with large scRNA-seq data sets, including one with over 1.2 million cells. To facilitate coherent quantification and visualization of scRNA-seq data, we developed the R package scBubbletree, which is freely available as part of the Bioconductor repository at https://bioconductor.org/packages/scBubbletree/.
Citations: 0
A novel approach to the analysis of Overall Survival (OS) as response with Progression-Free Interval (PFI) as condition based on the RNA-seq expression data in The Cancer Genome Atlas (TCGA)
IF 3 | Zone 3 (Biology)
BMC Bioinformatics Pub Date : 2024-09-13 DOI: 10.1186/s12859-024-05897-1
Bo Lin, Kaipeng Wang, Yuan Yuan, Yueguo Wang, Qingyuan Liu, Yulan Wang, Jian Sun, Wenwen Wang, Huanli Wang, Shusheng Zhou, Kui Jin, Mengping Zhang, Yinglei Lai
Overall Survival (OS) and Progression-Free Interval (PFI) have both been collected as survival times in The Cancer Genome Atlas (TCGA). It is of biomedical interest to account for their dependence in pathway detection and survival prediction. We aimed to develop novel methods that integrate PFI as a condition, based on parametric survival models, for identifying pathways associated with OS and for predicting OS. Within the framework of conditional probability, we developed a family of frailty-based parametric models for this purpose, with an exponential or Weibull distribution as the baseline. We also considered two classes of existing methods that treat PFI as a covariate. We evaluated the performance of the three approaches by analyzing RNA-seq expression data from TCGA for lung squamous cell carcinoma and lung adenocarcinoma (LUNG), brain lower grade glioma and glioblastoma multiforme (GBMLGG), and skin cutaneous melanoma (SKCM). We focused on fourteen general cancer-related pathways. Ten-fold cross-validation was used to evaluate predictive accuracy. For LUNG, the p53 signaling and cell cycle pathways were detected by all approaches, and the three approaches that consider PFI showed significantly better predictive performance than approaches that do not. For GBMLGG, ten pathways (e.g., Wnt signaling, JAK-STAT signaling, ECM-receptor interaction) were detected by all approaches, and the three approaches that consider PFI showed better predictive performance than approaches that do not. For SKCM, the p53 signaling pathway was detected only by our Weibull-baseline model, and again the three approaches that consider PFI showed significantly better predictive performance than approaches that do not. Our study indicates that PFI should be incorporated into the survival analysis of OS. Moreover, PFI is a survival-type time, and improved results can be achieved with our conditional-probability-based approach.
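The exponential and Weibull baselines mentioned above, and the basic conditional-probability identity P(T > t | T > s) = S(t) / S(s), can be written out in Python. This is a generic textbook sketch using a scale/shape parameterization chosen here as an assumption; it does not reproduce the authors' frailty-based models, which condition OS on PFI rather than on the same survival time, and the function names are hypothetical.

```python
import math


def weibull_survival(t, scale, shape):
    """Weibull survival S(t) = exp(-(t / scale) ** shape); shape == 1 is exponential."""
    if t < 0 or scale <= 0 or shape <= 0:
        raise ValueError("require t >= 0, scale > 0, shape > 0")
    return math.exp(-((t / scale) ** shape))


def conditional_survival(t, s, scale, shape):
    """P(T > t | T > s) = S(t) / S(s), valid for t >= s."""
    if t < s:
        raise ValueError("require t >= s")
    return weibull_survival(t, scale, shape) / weibull_survival(s, scale, shape)


# Exponential baseline (shape = 1) is memoryless: surviving 2 more units after
# time 3 is as likely as surviving 2 units from time 0.
print(conditional_survival(5.0, 3.0, scale=10.0, shape=1.0))
print(weibull_survival(2.0, scale=10.0, shape=1.0))

# With shape > 1 the hazard rises over time, so the conditional probability of
# 2 more units after time 3 is lower than the unconditional S(2) of the same model.
print(conditional_survival(5.0, 3.0, scale=10.0, shape=2.0))
print(weibull_survival(2.0, scale=10.0, shape=2.0))
```

The shape parameter is what distinguishes the two baselines: the exponential assumes a constant hazard, while the Weibull lets it increase (shape > 1) or decrease (shape < 1) over time.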
Citations: 0
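Contact: info@booksci.cn. Book学术 provides a free academic resource search service that helps scholars in China and abroad search Chinese- and English-language literature, and aims to offer the most convenient, high-quality service experience. Copyright © 2023 布克学术. All rights reserved.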