NAR Genomics and Bioinformatics: latest articles

SProtFP: a machine learning-based method for functional classification of small ORFs in prokaryotes.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2025-01-07 eCollection Date: 2025-03-01 DOI: 10.1093/nargab/lqae186
Akshay Khanduja, Debasisa Mohanty
Abstract: Small proteins (≤100 amino acids) play important roles across all life forms, ranging from unicellular bacteria to higher organisms. In this study, we have developed SProtFP which is a machine learning-based method for functional annotation of prokaryotic small proteins into selected functional categories. SProtFP uses independent artificial neural networks (ANNs) trained using a combination of physicochemical descriptors for classifying small proteins into antitoxin type 2, bacteriocin, DNA-binding, metal-binding, ribosomal protein, RNA-binding, type 1 toxin and type 2 toxin proteins. We have also trained a model for identification of small open reading frame (smORF)-encoded antimicrobial peptides (AMPs). Comprehensive benchmarking of SProtFP revealed an average area under the receiver operator curve (ROC-AUC) of 0.92 during 10-fold cross-validation and an ROC-AUC of 0.94 and 0.93 on held-out balanced and imbalanced test sets. Utilizing our method to annotate bacterial isolates from the human gut microbiome, we could identify thousands of remote homologs of known small protein families and assign putative functions to uncharacterized proteins. This highlights the utility of SProtFP for large-scale functional annotation of microbiome datasets, especially in cases where sequence homology is low. SProtFP is freely available at http://www.nii.ac.in/sprotfp.html and can be combined with genome annotation tools such as ProsmORF-pred to uncover the functional repertoire of novel small proteins in bacteria.
Volume 7, issue 1, article lqae186. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11704790/pdf/
Citations: 0
Specifying cellular context of transcription factor regulons for exploring context-specific gene regulation programs.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2025-01-07 eCollection Date: 2025-03-01 DOI: 10.1093/nargab/lqae178
Mariia Minaeva, Júlia Domingo, Philipp Rentzsch, Tuuli Lappalainen
Abstract: Understanding the role of transcription and transcription factors (TFs) in cellular identity and disease, such as cancer, is essential. However, comprehensive data resources for cell line-specific TF-to-target gene annotations are currently limited. To address this, we employed a straightforward method to define regulons that capture the cell-specific aspects of TF binding and transcript expression levels. By integrating cellular transcriptome and TF binding data, we generated regulons for 40 common cell lines comprising both proximal and distal cell line-specific regulatory events. Through systematic benchmarking involving TF knockout experiments, we demonstrated performance on par with state-of-the-art methods, with our method being easily applicable to other cell types of interest. We present case studies using three cancer single-cell datasets to showcase the utility of these cell-type-specific regulons in exploring transcriptional dysregulation. In summary, this study provides a valuable pipeline and a resource for systematically exploring cell line-specific transcriptional regulations, emphasizing the utility of network analysis in deciphering disease mechanisms.
Volume 7, issue 1, article lqae178. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11704787/pdf/
Citations: 0
vClean: assessing virus sequence contamination in viral genomes.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2025-01-07 eCollection Date: 2025-03-01 DOI: 10.1093/nargab/lqae185
Ryota Wagatsuma, Yohei Nishikawa, Masahito Hosokawa, Haruko Takeyama
Abstract: Recent advancements in viral metagenomics and single-virus genomics have improved our ability to obtain the draft genomes of environmental viruses. However, these methods can introduce virus sequence contaminations into viral genomes when short, fragmented partial sequences are present in the assembled contigs. These contaminations can lead to incorrect analyses; however, practical detection tools are lacking. In this study, we introduce vClean, a novel automated tool that detects contaminations in viral genomes. By applying machine learning to the nucleotide sequence features and gene patterns of the input viral genome, vClean could identify contaminations. Specifically, for tailed double-stranded DNA phages, we attempted accurate predictions by defining single-copy-like genes and counting their duplications. We evaluated the performance of vClean using simulated datasets derived from complete reference genomes, achieving a binary accuracy of 0.932. When vClean was applied to 4693 genomes of medium or higher quality derived from public ocean metagenomic data, 1604 genomes (34.2%) were identified as contaminated. We also demonstrated that vClean can detect contamination in single-virus genome data obtained from river water. vClean provides a new benchmark for quality control of environmental viral genomes and has the potential to become an essential tool for environmental viral genome analysis.
Volume 7, issue 1, article lqae185. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11704788/pdf/
Citations: 0
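The vClean abstract above mentions defining single-copy-like genes and counting their duplications as a contamination signal. A minimal Python sketch of that counting step alone might look as follows. It is not vClean's code: the function name and the gene family names are hypothetical, and the real tool combines this kind of signal with machine learning over nucleotide sequence features.

```python
from collections import Counter


def duplicated_single_copy_genes(gene_hits, single_copy_like):
    """Return {family: copies} for single-copy-like families seen more than once.

    gene_hits: gene family labels assigned to the genes of one genome.
    single_copy_like: families expected to occur once per genome.
    Both inputs are assumptions of this sketch, not vClean's data model.
    """
    counts = Counter(fam for fam in gene_hits if fam in single_copy_like)
    return {fam: n for fam, n in counts.items() if n > 1}


# Hypothetical family names, chosen only for illustration.
single_copy_like = {"terminase_large", "portal", "major_capsid"}
genome_hits = ["terminase_large", "portal", "tail_fiber", "portal", "major_capsid"]

dups = duplicated_single_copy_genes(genome_hits, single_copy_like)
print(dups)  # {'portal': 2}
```

A non-empty result suggests that sequence from a second virus may be present in the genome; how such counts are weighted against other features is not described in the abstract.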
ProPr54 web server: predicting σ54 promoters and regulon with a hybrid convolutional and recurrent deep neural network.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2025-01-07 eCollection Date: 2025-03-01 DOI: 10.1093/nargab/lqae188
Tristan Achterberg, Anne de Jong
Abstract: σ54 serves as an unconventional sigma factor with a distinct mechanism of transcription initiation, which depends on the involvement of a transcription activator. This unique sigma factor σ54 is indispensable for orchestrating the transcription of genes crucial to nitrogen regulation, flagella biosynthesis, motility, chemotaxis and various other essential cellular processes. Currently, no comprehensive tools are available to determine σ54 promoters and regulon in bacterial genomes. Here, we report a σ54 promoter prediction method ProPr54, based on a convolutional neural network trained on a set of 446 validated σ54 binding sites derived from 33 bacterial species. Model performance was tested and compared with respect to bacterial intergenic regions, demonstrating robust applicability. ProPr54 exhibits high performance when tested on various bacterial species, highly surpassing other available σ54 regulon identification methods. Furthermore, analysis on bacterial genomes, which have no experimentally validated σ54 binding sites, demonstrates the generalization of the model. ProPr54 is the first reliable in silico method for predicting σ54 binding sites, making it a valuable tool to support experimental studies on σ54. In conclusion, ProPr54 offers a reliable, broadly applicable tool for predicting σ54 promoters and regulon genes in bacterial genome sequences. A web server is freely accessible at http://propr54.molgenrug.nl.
Volume 7, issue 1, article lqae188. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11704786/pdf/
Citations: 0
Long-read structural and epigenetic profiling of a kidney tumor-matched sample with nanopore sequencing and optical genome mapping.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2025-01-07 eCollection Date: 2025-03-01 DOI: 10.1093/nargab/lqae190
Sapir Margalit, Zuzana Tulpová, Tahir Detinis Zur, Yael Michaeli, Jasline Deek, Gil Nifker, Rita Haldar, Yehudit Gnatek, Dorit Omer, Benjamin Dekel, Hagit Baris Feldman, Assaf Grunwald, Yuval Ebenstein
Abstract: Carcinogenesis often involves significant alterations in the cancer genome, marked by large structural variants (SVs) and copy number variations (CNVs) that are difficult to capture with short-read sequencing. Traditionally, cytogenetic techniques are applied to detect such aberrations, but they are limited in resolution and do not cover features smaller than several hundred kilobases. Optical genome mapping (OGM) and nanopore sequencing [Oxford Nanopore Technologies (ONT)] bridge this resolution gap and offer enhanced performance for cytogenetic applications. Additionally, both methods can capture epigenetic information as they profile native, individual DNA molecules. We compared the effectiveness of the two methods in characterizing the structural, copy number and epigenetic landscape of a clear cell renal cell carcinoma tumor. Both methods provided comparable results for basic karyotyping and CNVs, but differed in their ability to detect SVs of different sizes and types. ONT outperformed OGM in detecting small SVs, while OGM excelled in detecting larger SVs, including translocations. Differences were also observed among various ONT SV callers. Additionally, both methods provided insights into the tumor's methylome and hydroxymethylome. While ONT was superior in methylation calling, hydroxymethylation reports can be further optimized. Our findings underscore the importance of carefully selecting the most appropriate platform based on specific research questions.
Volume 7, issue 1, article lqae190. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11704781/pdf/
Citations: 0
Phenotype prediction in plants is improved by integrating large-scale transcriptomic datasets.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-27 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae184
Zefeng Wu, Yali Sun, Xiaoqiang Zhao, Zigang Liu, Wenqi Zhou, Yining Niu
Abstract: Research on the dynamic expression of genes in plants is important for understanding different biological processes. We used the large amounts of transcriptomic data from various plant sample sources that are publicly available to investigate whether the expression levels of a subset of highly variable genes (HVGs) can be used to accurately identify the phenotypes of plants. Using maize (Zea mays L.) as an example, we built machine learning (ML) models to predict phenotypes using a gene expression dataset of 21 612 bulk RNA sequencing samples. We showed that the ML models achieved excellent prediction accuracy using only the HVGs to identify different phenotypes, including tissue types, developmental stages, cultivars and stress conditions. By ML models, several important functional genes were found to be associated with different phenotypes. We performed a similar analysis in rice (Oryza sativa L.) and found that the ML models could be generalized across species. However, the models trained from maize did not perform well in rice, probably because of the expression divergence of the conserved HVGs between the two species. Overall, our results provide an ML framework for phenotype prediction using gene expression profiles, which may contribute to precision management of crops in agricultural practices.
Volume 6, issue 4, article lqae184. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11672113/pdf/
Citations: 0
SARS-CoV-2 Illumina GeNome Assembly Line (SIGNAL), a Snakemake workflow for rapid and bulk analysis of Illumina sequencing of SARS-CoV-2 genomes.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae176
Jalees A Nasir, Finlay Maguire, Kendrick M Smith, Emily M Panousis, Sheridan J C Baker, Patryk Aftanas, Amogelang R Raphenya, Brian P Alcock, Hassaan Maan, Natalie C Knox, Arinjay Banerjee, Karen Mossman, Bo Wang, Jared T Simpson, Robert A Kozak, Samira Mubareka, Andrew G McArthur
Abstract: The incorporation of sequencing technologies in frontline and public health healthcare settings was vital in developing virus surveillance programs during the Coronavirus Disease 2019 (COVID-19) pandemic caused by transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, increased data acquisition poses challenges for both rapid and accurate analyses. To overcome these hurdles, we developed the SARS-CoV-2 Illumina GeNome Assembly Line (SIGNAL) for quick bulk analyses of Illumina short-read sequencing data. SIGNAL is a Snakemake workflow that seamlessly manages parallel tasks to process large volumes of sequencing data. A series of outputs are generated, including consensus genomes, variant calls, lineage assessments and identified variants of concern (VOCs). Compared to other existing SARS-CoV-2 sequencing workflows, SIGNAL is one of the fastest-performing analysis tools while maintaining high accuracy. The source code is publicly available (github.com/jaleezyy/covid-19-signal) and is optimized to run on various systems, with software compatibility and resource management all handled within the workflow. Overall, SIGNAL illustrated its capacity for high-volume analyses through several contributions to publicly funded government public health surveillance programs and can be a valuable tool for continuing SARS-CoV-2 Illumina sequencing efforts and will inform the development of similar strategies for rapid viral sequence assessment.
Volume 6, issue 4, article lqae176. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655287/pdf/
Citations: 0
Exploring transcription modalities from bimodal, single-cell RNA sequencing data.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae179
Enikő Regényi, Mir-Farzin Mashreghi, Christof Schütte, Vikram Sunkara
Abstract: There is a growing interest in generating bimodal, single-cell RNA sequencing (RNA-seq) data for studying biological pathways. These data are predominantly utilized in understanding phenotypic trajectories using RNA velocities; however, the shape information encoded in the two-dimensional resolution of such data is not yet exploited. In this paper, we present an elliptical parametrization of two-dimensional RNA-seq data, from which we derived statistics that reveal four different modalities. These modalities can be interpreted as manifestations of the changes in the rates of splicing, transcription or degradation. We performed our analysis on a cell cycle and a colorectal cancer dataset. In both datasets, we found genes that are not picked up by differential gene expression analysis (DGEA), and are consequently unnoticed, yet visibly delineate phenotypes. This indicates that, in addition to DGEA, searching for genes that exhibit the discovered modalities could aid recovering genes that set phenotypes apart. For communities studying biomarkers and cellular phenotyping, the modalities present in bimodal RNA-seq data broaden the search space of genes, and furthermore, allow for incorporating cellular RNA processing into regulatory analyses.
Volume 6, issue 4, article lqae179. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655292/pdf/
Citations: 0
Mining single-cell data for cell type-disease associations.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae180
Kevin G Chen, Kathryn O Farley, Timo Lassmann
Abstract: A robust understanding of the cellular mechanisms underlying diseases sets the foundation for the effective design of drugs and other interventions. The wealth of existing single-cell atlases offers the opportunity to uncover high-resolution information on expression patterns across various cell types and time points. To better understand the associations between cell types and diseases, we leveraged previously developed tools to construct a standardized analysis pipeline and systematically explored associations across four single-cell datasets, spanning a range of tissue types, cell types and developmental time periods. We utilized a set of existing tools to identify co-expression modules and temporal patterns per cell type and then investigated these modules for known disease and phenotype enrichments. Our pipeline reveals known and novel putative cell type-disease associations across all investigated datasets. In addition, we found that automatically discovered gene co-expression modules and temporal clusters are enriched for drug targets, suggesting that our analysis could be used to identify novel therapeutic targets.
Volume 6, issue 4, article lqae180. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655289/pdf/
Citations: 0
PyNetCor: a high-performance Python package for large-scale correlation analysis.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-12-18 eCollection Date: 2024-12-01 DOI: 10.1093/nargab/lqae177
Shibin Long, Yan Xia, Lifeng Liang, Ying Yang, Hailiang Xie, Xiaokai Wang
Abstract: The development of multi-omics technologies has generated an abundance of biological datasets, providing valuable resources for investigating potential relationships within complex biological systems. However, most correlation analysis tools face computational challenges when dealing with these high-dimensional datasets containing millions of features. Here, we introduce pyNetCor, a fast and scalable tool for constructing correlation networks on large-scale and high-dimensional data. PyNetCor features optimized algorithms for both full correlation coefficient matrix computation and top-k correlation search, outperforming other tools in the field in terms of runtime and memory consumption. It utilizes a linear interpolation strategy to rapidly estimate P-values and achieve false discovery rate control, demonstrating a speedup of over 110 times compared to existing methods. Overall, pyNetCor supports large-scale correlation analysis, a crucial foundational step for various bioinformatics workflows, and can be easily integrated into downstream applications to accelerate the process of extracting biological insights from data.
Volume 6, issue 4, article lqae177. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11655297/pdf/
Citations: 0
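The PyNetCor abstract above refers to top-k correlation search. To make the term concrete, here is a brute-force sketch in plain Python that ranks every feature pair by absolute Pearson correlation and keeps the k strongest. It does not use or imitate pyNetCor's API: the function names are hypothetical, the approach is quadratic in the number of features, and the optimized algorithms the abstract describes are not shown.

```python
import heapq
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_k_correlations(features, k):
    """Return the k feature pairs with the largest |r| as (|r|, name_a, name_b).

    features: dict mapping a feature name to its list of values
    (a hypothetical input layout chosen for this sketch).
    """
    scored = (
        (abs(pearson(features[a], features[b])), a, b)
        for a, b in combinations(sorted(features), 2)
    )
    # heapq.nlargest keeps only k candidates in memory at a time.
    return heapq.nlargest(k, scored)


# Toy data: g2 is an exact multiple of g1, so that pair should rank first.
features = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, 3.0, 2.0, 4.0],
}
best = top_k_correlations(features, k=1)
print(best[0][1], best[0][2])  # g1 g2
```

Keeping only the k best pairs avoids storing the full correlation matrix, which is the memory concern the abstract raises for datasets with millions of features.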