Briefings in bioinformatics: latest articles

MethPriorGCN: a deep learning tool for inferring DNA methylation prior knowledge and guiding personalized medicine.
IF 6.8, Zone 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf131
Jie Ni, Bin Li, Shumei Miao, Xinting Zhang, Donghui Yan, Shengqi Jing, Shan Lu, Zhuoying Xie, Xin Zhang, Yun Liu
Abstract: DNA methylation plays a crucial role in human disease pathogenesis. Substantial experimental evidence from clinical and biological studies has confirmed numerous methylation-disease associations, which provide valuable prior knowledge for advancing precision medicine through biomarker discovery and disease subtyping. To systematically mine reliable methylation prior knowledge from known DNA methylation-disease associations and develop robust computational methods for precision medicine applications, we propose MethPriorGCN. By integrating layer attention mechanisms and feature weighting mechanisms, MethPriorGCN not only identified reliable methylation digital biomarkers but also achieved superior disease subtype classification accuracy.
Volume 26, issue 2. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934576/pdf/
Citations: 0
$\mathcal{S}$able: bridging the gap in protein structure understanding with an empowering and versatile pre-training paradigm.
IF 6.8, Zone 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf120
Jiashan Li, Xi Chen, He Huang, Mingliang Zeng, Jingcheng Yu, Xinqi Gong, Qiwei Ye
Abstract: Protein pre-training has emerged as a transformative approach for solving diverse biological tasks. While many contemporary methods focus on sequence-based language models, recent findings highlight that protein sequences alone are insufficient to capture the extensive information inherent in protein structures. Recognizing the crucial role of protein structure in defining function and interactions, we introduce $\mathcal{S}$able, a versatile pre-training model designed to comprehensively understand protein structures. $\mathcal{S}$able incorporates a novel structural encoding mechanism that enhances inter-atomic information exchange and spatial awareness, combined with robust pre-training strategies and lightweight decoders optimized for specific downstream tasks. This approach enables $\mathcal{S}$able to consistently outperform existing methods in tasks such as generation, classification, and regression, demonstrating its superior capability in protein structure representation. The code and models can be accessed via GitHub repository at https://github.com/baaihealth/Sable.
Volume 26, issue 2.
Citations: 0
HSSPPI: hierarchical and spatial-sequential modeling for PPIs prediction.
IF 6.8, Zone 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf079
Yuguang Li, Zhen Tian, Xiaofei Nan, Shoutao Zhang, Qinglei Zhou, Shuai Lu
Abstract:
Motivation: Protein-protein interactions play a fundamental role in biological systems. Accurate detection of protein-protein interaction sites (PPIs) remains a challenge, and methods of PPIs prediction based on biological experiments are expensive. Recently, many computation-based methods have been developed and have made great progress. However, current computational methods focus on only one form of protein, using only protein spatial conformation or primary sequence, and the protein's natural hierarchical structure is ignored.
Results: In this study, we propose a novel network architecture, HSSPPI, through hierarchical and spatial-sequential modeling of protein for PPIs prediction. In this network, we represent protein as a hierarchical graph, in which a node in the protein is a residue (residue-level graph) and a node in the residue is an atom (atom-level graph). Moreover, we design a spatial-sequential block for capturing complex interaction relationships from spatial and sequential forms of protein. We evaluate HSSPPI on public benchmark datasets and the prediction results outperform the comparative models. This indicates the effectiveness of hierarchical protein modeling and also illustrates that HSSPPI has a strong feature extraction ability by considering spatial and sequential information simultaneously.
Availability and implementation: The code of HSSPPI is available at https://github.com/biolushuai/Hierarchical-Spatial-Sequential-Modeling-of-Protein.
Volume 26, issue 2. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11879409/pdf/
Citations: 0
Learning directed acyclic graphs for ligands and receptors based on spatially resolved transcriptomic data of ovarian cancer.
IF 6.8, Zone 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf085
Shrabanti Chowdhury, Sammy Ferri-Borgogno, Peng Yang, Wenyi Wang, Jie Peng, Samuel C Mok, Pei Wang
Abstract: To unravel the mechanism of immune activation and suppression within tumors, a critical step is to identify transcriptional signals governing cell-cell communication between tumor and immune/stromal cells in the tumor microenvironment. Central to this communication are interactions between secreted ligands and cell-surface receptors, creating a highly connected signaling network among cells. Recent advancements in in situ-omics profiling, particularly spatial transcriptomic (ST) technology, provide unique opportunities to directly characterize ligand-receptor signaling networks that power cell-cell communication. In this paper, we propose a novel statistical method, LRnetST, to characterize the ligand-receptor interaction networks between adjacent tumor and immune/stroma cells based on ST data. LRnetST utilizes a directed acyclic graph model with a novel approach to handle the zero-inflated distributions of ST data. It also leverages existing ligand-receptor regulation databases as prior information, and employs a bootstrap aggregation strategy to achieve robust network estimation. Application of LRnetST to ST data of high-grade serous ovarian tumor samples revealed both common and distinct ligand-receptor regulations across different tumors. Some of these interactions were validated through both a MERFISH dataset and a CosMx SMI dataset of independent ovarian tumor samples. These results cast light on biological processes relating to the communication between tumor and immune/stromal cells in ovarian tumors. An open-source R package of LRnetST is available on GitHub at https://github.com/jie108/LRnetST.
Volume 26, issue 2. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11891659/pdf/
Citations: 0
TopoQA: a topological deep learning-based approach for protein complex structure interface quality assessment.
IF 6.8, Zone 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf083
Bingqing Han, Yipeng Zhang, Longlong Li, Xinqi Gong, Kelin Xia
Abstract: Even with the significant advances of AlphaFold-Multimer (AF-Multimer) and AlphaFold3 (AF3) in protein complex structure prediction, their accuracy is still not comparable with monomer structure prediction. Efficient and effective quality assessment (QA) or estimation of model accuracy models that can evaluate the quality of the predicted protein complexes without knowing their native structures are of key importance for protein structure generation and model selection. In this paper, we leverage persistent homology (PH) to capture the atomic-level topological information around residues and design a topological deep learning-based QA method, TopoQA, to assess the accuracy of protein complex interfaces. We integrate PH from topological data analysis into graph neural networks (GNNs) to characterize complex higher-order structures that GNNs might overlook, enhancing the learning of the relationship between the topological structure of complex interfaces and quality scores. Our TopoQA model is extensively validated based on the two most widely used benchmark datasets, Docking Benchmark5.5 AF2 (DBM55-AF2) and Heterodimer-AF2 (HAF2), along with our newly constructed ABAG-AF3 dataset to facilitate comparisons with AF3. For all three datasets, TopoQA outperforms AF-Multimer-based AF2Rank and shows an advantage over AF3 in nearly half of the targets. In particular, in the DBM55-AF2 dataset, a ranking loss 73.6% lower than that of AF-Multimer-based AF2Rank is obtained. Further, other than AF-Multimer and AF3, we have also extensively compared with nearly all the state-of-the-art models (as far as we know) and found that TopoQA achieves the highest Top 10 Hit-rate on the DBM55-AF2 dataset and the lowest ranking loss on the HAF2 dataset. Ablation experiments show that our topological features significantly improve the model's performance. At the same time, our method also provides a new paradigm for protein structure representation learning.
Volume 26, issue 2. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11891663/pdf/
Citations: 0
DockEM: an enhanced method for atomic-scale protein-ligand docking refinement leveraging low-to-medium resolution cryo-EM density maps.
IF 6.8, Zone 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf091
Jing Zou, Wenyi Zhang, Jun Hu, Xiaogen Zhou, Biao Zhang
Abstract: Protein-ligand docking plays a pivotal role in virtual drug screening, and recent advancements in cryo-electron microscopy (cryo-EM) technology have significantly accelerated the progress of structure-based drug discovery. However, the majority of cryo-EM density maps are of medium to low resolution (3-10 Å), which presents challenges in effectively integrating cryo-EM data into molecular docking workflows. In this study, we present an updated protein-ligand docking method, DockEM, which leverages local cryo-EM density maps and physical energy refinement to precisely dock ligands into specific protein binding sites. Tested on a dataset of 121 protein-ligand compounds, our results demonstrate that DockEM outperforms other advanced docking methods. The strength of DockEM lies in its ability to incorporate cryo-EM density map information, effectively leveraging the structural information of ligands embedded within these maps. This advancement enhances the use of cryo-EM density maps in virtual drug screening, offering a more reliable framework for drug discovery.
Volume 26, issue 2. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11891657/pdf/
Citations: 0
REDInet: a temporal convolutional network-based classifier for A-to-I RNA editing detection harnessing million known events.
IF 6.8, Zone 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf107
Adriano Fonzino, Pietro Luca Mazzacuva, Adam Handen, Domenico Alessandro Silvestris, Annette Arnold, Riccardo Pecori, Graziano Pesole, Ernesto Picardi
Abstract: A-to-I ribonucleic acid (RNA) editing detection is still a challenging task. Current bioinformatics tools rely on empirical filters and whole genome sequencing or whole exome sequencing data to remove background noise, sequencing errors, and artifacts. Sometimes they make use of cumbersome and time-consuming computational procedures. Here, we present REDInet, a temporal convolutional network-based deep learning algorithm, to profile RNA editing in human RNA sequencing (RNAseq) data. It has been trained on REDIportal RNA editing sites, the largest collection of human A-to-I changes from >8000 RNAseq data of the genotype-tissue expression project. REDInet can classify editing events with high accuracy harnessing RNAseq nucleotide frequencies of 101-base windows without the need for coupled genomic data.
Volume 26, issue 2. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11924403/pdf/
Citations: 0
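Editor's illustration for the REDInet entry: the abstract mentions nucleotide frequencies over 101-base windows as classifier input. The sketch below shows one plausible way to build such a window from per-position base counts. It is not REDInet's actual preprocessing; the function name `window_frequencies`, the A/C/G/T column order, and the zero-padding of uncovered positions are all assumptions made for illustration.

```python
BASES = "ACGT"  # assumed column order, not taken from the paper


def window_frequencies(counts, center, width=101):
    """Return `width` rows of per-base frequencies centred on `center`.

    counts: list of dicts mapping a base to its read count at each position.
    Positions outside the list, or with zero depth, become all-zero rows
    (an illustrative padding choice).
    """
    half = width // 2
    rows = []
    for pos in range(center - half, center + half + 1):
        depth = 0
        if 0 <= pos < len(counts):
            depth = sum(counts[pos].get(base, 0) for base in BASES)
        if depth > 0:
            rows.append([counts[pos].get(base, 0) / depth for base in BASES])
        else:
            rows.append([0.0] * len(BASES))
    return rows


# Three covered positions with 8 A reads and 2 G reads each.
counts = [{"A": 8, "G": 2}] * 3
rows = window_frequencies(counts, center=1)
print(len(rows), rows[50])
```

A site with a noticeable G fraction at an A reference position is the kind of signal an A-to-I classifier would look at, since inosine is read as G by sequencers.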
EMcnv: enhancing CNV detection performance through ensemble strategies with heterogeneous meta-graph neural networks.
IF 6.8, Zone 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf135
Xuwen Wang, Zhili Chang, Yuqian Liu, Shenjie Wang, Xiaoyan Zhu, Yang Shao, Jiayin Wang
Abstract: Copy number variation (CNV) is a crucial biomarker for many complex traits and diseases. Although numerous CNV detection tools are available, no single method consistently achieves optimal performance across diverse sequencing samples, as each tool has distinct advantages and limitations. Therefore, integrating the strengths of these tools to improve CNV detection accuracy is both a promising strategy and a significant challenge. To address this, we propose EMcnv, a novel deep ensemble framework based on meta-learning. EMcnv combines multiple CNV detection strategies through a three-step approach: (i) leveraging meta-learning and meta-path heterogeneous graphs, employing Relational Graph Convolutional Networks as a specific model within the Heterogeneous Graph Neural Networks framework to develop a probabilistic weight meta-model that ensembles various CNV detection strategies; (ii) assigning probabilistic weights to calls from different CNV detection tools and aggregating them into weighted CNV regions (CNVRs); (iii) refining copy number variations based on weighted CNVRs. We conducted comprehensive experiments on both simulated and real sequencing data using benchmark datasets. The results demonstrate that EMcnv significantly outperforms popular existing methods, underscoring its superiority and importance in CNV detection. To support further research, the source code is available for academic use at https://github.com/Sherwin-xjtu/EMcnv.
Volume 26, issue 2.
Citations: 0
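Editor's illustration for the EMcnv entry: step (ii) of the abstract describes weighting calls from different tools and aggregating them into weighted CNV regions. The sketch below shows a generic interval-merging version of that idea. It is not EMcnv's algorithm: the per-tool weights are given by hand rather than learned by a meta-model, and defining a region's weight as the sum of the weights of the tools supporting it is an assumption. The names `aggregate_cnvrs`, `toolA`, and `toolB` are hypothetical.

```python
def aggregate_cnvrs(calls, tool_weights):
    """Merge overlapping calls into regions and attach a weight to each.

    calls: iterable of (chrom, start, end, tool) tuples.
    tool_weights: dict mapping a tool name to its weight (assumed given).
    """
    regions = []
    for chrom, start, end, tool in sorted(calls):
        last = regions[-1] if regions else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            # Overlaps the current region: extend it and record the tool.
            last["end"] = max(last["end"], end)
            last["tools"].add(tool)
        else:
            regions.append(
                {"chrom": chrom, "start": start, "end": end, "tools": {tool}}
            )
    for region in regions:
        # Assumed scoring rule: sum of weights of the distinct supporting tools.
        region["weight"] = sum(tool_weights[t] for t in region["tools"])
    return regions


calls = [
    ("chr1", 100, 200, "toolA"),
    ("chr1", 150, 300, "toolB"),
    ("chr1", 500, 600, "toolA"),
]
for region in aggregate_cnvrs(calls, {"toolA": 0.5, "toolB": 0.25}):
    print(region["chrom"], region["start"], region["end"], region["weight"])
```

A downstream refinement step, step (iii) in the abstract, could then keep or adjust regions according to their weight; that part is not sketched here.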
DS-MVP: identifying disease-specific pathogenicity of missense variants by pre-training representation.
IF 6.8, Zone 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf119
Qiufeng Chen, Lijun Quan, Lexin Cao, Bei Zhang, Zhijun Zhang, Liangchen Peng, Junkai Wang, Yelu Jiang, Liangpeng Nie, Geng Li, Tingfang Wu, Qiang Lyu
Abstract: Accurately predicting the pathogenicity of missense variants is crucial for improving disease diagnosis and advancing clinical research. However, existing computational methods primarily focus on general pathogenicity predictions, overlooking assessments of disease-specific conditions. In this study, we propose DS-MVP, a method capable of predicting disease-specific pathogenicity of missense variants in human genomes. DS-MVP first leverages a deep learning model pre-trained on a large general pathogenicity dataset to learn rich representation of missense variants. It then fine-tunes these representations with an XGBoost model on smaller datasets for specific diseases. We evaluated the learned representation by testing it on multiple binary pathogenicity datasets and gene-level statistics, demonstrating that DS-MVP outperforms existing state-of-the-art methods, such as MetaRNN and AlphaMissense. Additionally, DS-MVP excels in multi-label and multi-class classification, effectively classifying disease-specific pathogenic missense variants based on disease conditions. It further enhances predictions by fine-tuning the pre-trained model on disease-specific datasets. Finally, we analyzed the contributions of the pre-trained model and various feature types, with gene description corpus features from large language model and genetic feature fusion contributing the most. These results underscore that DS-MVP represents a broader perspective on pathogenicity prediction and holds potential as an effective tool for disease diagnosis.
Volume 26, issue 2. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11932084/pdf/
Citations: 0
A comparison of random forest variable selection methods for regression modeling of continuous outcomes.
IF 6.8, Zone 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf096
Nathaniel S O'Connell, Byron C Jaeger, Garrett S Bullock, Jaime Lynn Speiser
Abstract: Random forest (RF) regression is a popular machine learning method to develop prediction models for continuous outcomes. Variable selection, also known as feature selection or reduction, involves selecting a subset of predictor variables for modeling. Potential benefits of variable selection are methodologic (i.e. improving prediction accuracy and computational efficiency) and practical (i.e. reducing the burden of data collection and improving efficiency). Several variable selection methods leveraging RFs have been proposed, but there is limited evidence to guide decisions on which methods may be preferable for different types of datasets with continuous outcomes. Using 59 publicly available datasets in a benchmarking study, we evaluated the implementation of 13 RF variable selection methods. Performance of variable selection was measured via out-of-sample R² of a RF that used the variables selected for each method. Simplicity of variable selection was measured via the percent reduction in the number of variables selected out of the number of variables available. Efficiency was measured via computational time required to complete the variable selection. Based on our benchmarking study, variable selection methods implemented in the Boruta and aorsf R packages selected the best subset of variables for axis-based RF models, whereas methods implemented in the aorsf R package selected the best subset of variables for oblique RF models. A significant contribution of this study is the ability to assess different variable selection methods in the setting of RF regression for continuous outcomes to identify preferable methods using an open science approach.
Volume 26, issue 2. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11891652/pdf/
Citations: 0
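Editor's illustration for the random forest entry: the abstract names two of its evaluation measures, out-of-sample R² and the percent reduction in the number of variables. The sketch below computes both in plain Python. The exact formulas used in the paper are not given in the abstract, so this assumes the usual definition of R² (one minus residual sum of squares over total sum of squares, with the mean taken over the held-out outcomes) and reads "percent reduction" as the share of available variables that were dropped. The function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """R^2 on held-out data: 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def percent_reduction(n_selected, n_available):
    """Percentage of the available variables that selection removed."""
    return 100.0 * (n_available - n_selected) / n_available


# Held-out outcomes and predictions from a model fit on selected variables.
held_out = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 3.0]
print(r_squared(held_out, predicted))
print(percent_reduction(n_selected=5, n_available=20))
```

Under these definitions a method that keeps 5 of 20 variables scores a 75% reduction, and predicting the held-out mean for every sample gives an R² of 0.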