Briefings in Bioinformatics: latest articles

Seq2Topt: a sequence-based deep learning predictor of enzyme optimal temperature.
IF 6.8, Region 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf114
Sizhe Qiu, Bozhen Hu, Jing Zhao, Weiren Xu, Aidong Yang
{"title":"Seq2Topt: a sequence-based deep learning predictor of enzyme optimal temperature.","authors":"Sizhe Qiu, Bozhen Hu, Jing Zhao, Weiren Xu, Aidong Yang","doi":"10.1093/bib/bbaf114","DOIUrl":"10.1093/bib/bbaf114","url":null,"abstract":"<p><p>An accurate deep learning predictor is needed for enzyme optimal temperature (${T}_{opt}$), which quantitatively describes how temperature affects the enzyme catalytic activity. In comparison with existing models, a new model developed in this study, Seq2Topt, reached a superior accuracy on ${T}_{opt}$ prediction just using protein sequences (RMSE = 12.26°C and R2 = 0.57), and could capture key protein regions for enzyme ${T}_{opt}$ with multi-head attention on residues. Through case studies on thermophilic enzyme selection and predicting enzyme ${T}_{opt}$ shifts caused by point mutations, Seq2Topt was demonstrated as a promising computational tool for enzyme mining and in-silico enzyme design. Additionally, accurate deep learning predictors of enzyme optimal pH (Seq2pHopt, RMSE = 0.88 and R2 = 0.42) and melting temperature (Seq2Tm, RMSE = 7.57 °C and R2 = 0.64) were developed based on the model architecture of Seq2Topt, suggesting that the development of Seq2Topt could potentially give rise to a useful prediction platform of enzymes.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11904407/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143623604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
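The Seq2Topt abstract above reports accuracy as RMSE and R2. As a reminder of what those two numbers measure, here is a minimal Python sketch. It assumes R2 means the coefficient of determination (the abstract does not say which definition was used), and the temperature values are made-up toy data, not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical optimal temperatures in degrees Celsius (toy data only).
measured = [30.0, 45.0, 60.0, 75.0]
predicted = [35.0, 40.0, 65.0, 70.0]

print("RMSE:", rmse(measured, predicted))
print("R2:", r2_score(measured, predicted))
```

RMSE is in the same unit as the target (°C here), while R2 is unitless, which is why the abstract quotes both.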
Evaluating long-read assemblers to assemble several aphididae genomes.
IF 6.8, Region 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf105
Nicolaas F V Burger, Vittorio F Nicolis, Anna-Maria Botha
{"title":"Evaluating long-read assemblers to assemble several aphididae genomes.","authors":"Nicolaas F V Burger, Vittorio F Nicolis, Anna-Maria Botha","doi":"10.1093/bib/bbaf105","DOIUrl":"10.1093/bib/bbaf105","url":null,"abstract":"<p><p>Aphids are a speciose family of the Hemiptera comprising >5500 species. They have adapted to feed off multiple plant species and occur on every continent on Earth. Although economically devastating, very few aphid genomes have been sequenced and assembled, and those that have suffer from low contiguity due to repeat-rich and AT-rich genomes. With third-generation sequencing becoming more affordable and approaching the quality levels of second-generation sequencing, the ability to produce more contiguous aphid genome assemblies is becoming a reality. With a growing list of long-read assemblers becoming available, the choice of which assembly tool to use becomes more complicated. In this study, six recently released long-read assemblers (Canu, Flye, Hifiasm, Mecat2, Raven, and Wtdbg2) were evaluated on several quality and contiguity metrics after assembling four populations (or biotypes) of the same species (Russian wheat aphid, Diuraphis noxia) and two unrelated aphid species that have publicly available long-read sequences. Not all assemblers fared equally well across the different read sets, but, overall, the Hifiasm and Canu assemblers performed the best. Merging of the best assemblies for each read set was also performed using quickmerge, where, in some cases, it resulted in superior assemblies and, in others, introduced more errors. Ab initio gene calling between assemblies of the same read set also showed less similarity than expected. 
Overall, the quality control pipeline followed during the assembly resulted in chromosome-level assemblies with minimal structural or quality artefacts.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11904405/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143623664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
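The assembler comparison above refers to "contiguity metrics" without naming them in the abstract. N50 is one widely used contiguity metric, so the sketch below shows how it is computed; whether the study used N50 specifically is an assumption, and the contig lengths are invented for illustration.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in base pairs (toy data only).
toy_assembly = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("N50:", n50(toy_assembly))
```

A higher N50 for the same total assembly size indicates fewer, longer contigs, but it says nothing about correctness, which is why such studies pair it with quality metrics.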
Pragmatic soft-decision data readout of encoded large DNA.
IF 6.8, Region 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf102
Qi Ge, Rui Qin, Shuang Liu, Quan Guo, Changcai Han, Weigang Chen
{"title":"Pragmatic soft-decision data readout of encoded large DNA.","authors":"Qi Ge, Rui Qin, Shuang Liu, Quan Guo, Changcai Han, Weigang Chen","doi":"10.1093/bib/bbaf102","DOIUrl":"10.1093/bib/bbaf102","url":null,"abstract":"<p><p>The encoded large DNA can be cloned and stored in vivo, capable of write-once and stable replication for multiple retrievals, offering potential in economic data archiving. Nanopore sequencing is advantageous in data access of large DNA due to its rapidity and long-read sequencing capability. However, the data readout is commonly limited by insertion and deletion (indel) errors and sequence assembly complexity. Here, a pragmatic soft-decision data readout is presented, achieving assembly-free sequence reconstruction, indel error correction, and ultra-low coverage data readout. Specifically, the watermark is cleverly embedded within large DNA fragments, allowing for the direct localization of raw reads via watermark alignment to avoid complex read assembly. A soft-decision forward-backward algorithm is proposed, which can identify indel errors and provide probability information to the error correction code, enabling error-free data recovery. Additionally, a minimum state transition is maintained, and a read segmentation is incorporated to achieve fast information reading. The readout assays for two circular plasmids (~51 kb) with different coding rates were demonstrated and achieved error-free recovery directly from noisy reads (error rate ~1%) at coverage of 1-4×. Simulations conducted on large-scale datasets across various error rates further confirm the scalability of the method and its robust performance under extreme conditions. 
This readout method enables nearly single-molecule recovery of large DNA, particularly suitable for rapid readout of DNA storage.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11911122/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143647406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
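The DNA-storage abstract above mentions a soft-decision forward-backward algorithm that passes probability information to an error correction code. The authors' indel-aware state model is not described in the abstract, so the sketch below shows only the textbook forward-backward recursion on a generic hidden Markov model, which is the general technique that produces such per-position probabilities. All matrices and the function name are illustrative assumptions.

```python
def forward_backward(init, trans, emit, observations):
    """Posterior state probabilities for a generic discrete HMM.

    init[s]      : probability of starting in state s
    trans[r][s]  : probability of moving from state r to state s
    emit[s][o]   : probability that state s emits symbol o
    Returns one list of posteriors per observation (soft decisions).
    No rescaling is done, so this is only suitable for short sequences.
    """
    n_states = len(init)
    n_obs = len(observations)

    # Forward pass: joint probability of the prefix and the current state.
    fwd = [[0.0] * n_states for _ in range(n_obs)]
    for s in range(n_states):
        fwd[0][s] = init[s] * emit[s][observations[0]]
    for t in range(1, n_obs):
        for s in range(n_states):
            incoming = sum(fwd[t - 1][r] * trans[r][s] for r in range(n_states))
            fwd[t][s] = emit[s][observations[t]] * incoming

    # Backward pass: probability of the suffix given the current state.
    bwd = [[1.0] * n_states for _ in range(n_obs)]
    for t in range(n_obs - 2, -1, -1):
        for s in range(n_states):
            bwd[t][s] = sum(
                trans[s][r] * emit[r][observations[t + 1]] * bwd[t + 1][r]
                for r in range(n_states)
            )

    likelihood = sum(fwd[n_obs - 1])
    return [
        [fwd[t][s] * bwd[t][s] / likelihood for s in range(n_states)]
        for t in range(n_obs)
    ]


# Toy two-state model with noisy emissions (values are illustrative).
posteriors = forward_backward(
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    emit=[[0.8, 0.2], [0.2, 0.8]],
    observations=[0, 0, 1],
)
for row in posteriors:
    print(row)
```

The output is "soft" in the sense that each position gets a probability per state instead of a single hard call, which is the kind of input a soft-decision decoder can exploit.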
Elementary methods provide more replicable results in microbial differential abundance analysis.
IF 6.8, Region 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf130
Juho Pelto, Kari Auranen, Janne V Kujala, Leo Lahti
{"title":"Elementary methods provide more replicable results in microbial differential abundance analysis.","authors":"Juho Pelto, Kari Auranen, Janne V Kujala, Leo Lahti","doi":"10.1093/bib/bbaf130","DOIUrl":"10.1093/bib/bbaf130","url":null,"abstract":"<p><p>Differential abundance analysis (DAA) is a key component of microbiome studies. Although dozens of methods exist, there is currently no consensus on the preferred methods. While the correctness of results in DAA is an ambiguous concept and cannot be fully evaluated without setting the ground truth and employing simulated data, we argue that a well-performing method should be effective in producing highly reproducible results. We compared the performance of 14 DAA methods by employing datasets from 53 taxonomic profiling studies based on 16S rRNA gene or shotgun metagenomic sequencing. For each method, we examined how the results replicated between random partitions of each dataset and between datasets from separate studies. While certain methods showed good consistency, some widely used methods were observed to produce a substantial number of conflicting findings. Overall, when considering consistency together with sensitivity, the best performance was attained by analyzing relative abundances with a nonparametric method (Wilcoxon test or ordinal regression model) or linear regression/t-test. 
Moreover, a comparable performance was obtained by analyzing presence/absence of taxa with logistic regression.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11937625/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143708593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
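The differential abundance abstract above finds that analyzing relative abundances with an elementary test such as the Wilcoxon test performs well. A minimal sketch of that idea follows: counts are converted to relative abundances, then one taxon is compared between two groups with the Wilcoxon rank-sum (Mann-Whitney U) statistic. The p-value uses a plain normal approximation without tie or continuity correction, which is a simplification chosen here for brevity, and the count data are invented.

```python
import math


def relative_abundance(counts):
    """Convert one sample's raw taxon counts to proportions (total-sum scaling)."""
    total = sum(counts)
    return [c / total for c in counts]


def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs where x wins, ties counted as half."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def two_sided_p_normal(u, n1, n2):
    """Two-sided p-value from a normal approximation (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: each inner list is one sample, each position one taxon.
cases = [[10, 90], [20, 80], [15, 85]]
controls = [[40, 60], [50, 50], [45, 55]]

# Relative abundance of the first taxon in every sample.
case_values = [relative_abundance(sample)[0] for sample in cases]
control_values = [relative_abundance(sample)[0] for sample in controls]

u_stat = mann_whitney_u(case_values, control_values)
print("U:", u_stat)
print("p (normal approx.):", two_sided_p_normal(u_stat, len(cases), len(controls)))
```

For real analyses an exact or tie-corrected implementation from a statistics library would be preferable; the point here is only how little machinery the "elementary" approach needs.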
Data imbalance in drug response prediction: multi-objective optimization approach in deep learning setting.
IF 6.8, Region 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf134
Oleksandr Narykov, Yitan Zhu, Thomas Brettin, Yvonne A Evrard, Alexander Partin, Fangfang Xia, Maulik Shukla, Priyanka Vasanthakumari, James H Doroshow, Rick L Stevens
{"title":"Data imbalance in drug response prediction: multi-objective optimization approach in deep learning setting.","authors":"Oleksandr Narykov, Yitan Zhu, Thomas Brettin, Yvonne A Evrard, Alexander Partin, Fangfang Xia, Maulik Shukla, Priyanka Vasanthakumari, James H Doroshow, Rick L Stevens","doi":"10.1093/bib/bbaf134","DOIUrl":"10.1093/bib/bbaf134","url":null,"abstract":"<p><p>Drug response prediction (DRP) methods tackle the complex task of associating the effectiveness of small molecules with the specific genetic makeup of the patient. Anti-cancer DRP is a particularly challenging task requiring costly experiments as underlying pathogenic mechanisms are broad and associated with multiple genomic pathways. The scientific community has exerted significant efforts to generate public drug screening datasets, giving a path to various machine learning models that attempt to reason over the complex data space of small compounds and biological characteristics of tumors. However, the data depth is still lacking compared to application domains like computer vision or natural language processing, limiting current learning capabilities. To combat this issue and improve the generalizability of the DRP models, we are exploring strategies that explicitly address the imbalance in the DRP datasets. We reframe the problem as a multi-objective optimization across multiple drugs to maximize deep learning model performance. We implement this approach by constructing a Multi-Objective Optimization Regularized by Loss Entropy loss function and plugging it into a deep learning model. 
We demonstrate the utility of proposed drug discovery methods and make suggestions for further potential application of the work to achieve desirable outcomes in the healthcare field.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11966611/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143771441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SMAC: identifying DNA N6-methyladenine (6mA) at the single-molecule level using SMRT CCS data.
IF 6.8, Region 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf153
Haicheng Li, Junhua Niu, Yalan Sheng, Yifan Liu, Shan Gao
{"title":"SMAC: identifying DNA N6-methyladenine (6mA) at the single-molecule level using SMRT CCS data.","authors":"Haicheng Li, Junhua Niu, Yalan Sheng, Yifan Liu, Shan Gao","doi":"10.1093/bib/bbaf153","DOIUrl":"https://doi.org/10.1093/bib/bbaf153","url":null,"abstract":"<p><p>DNA modifications, such as N6-methyladenine (6mA), play important roles in various processes in eukaryotes. Single-molecule, real-time (SMRT) sequencing enables the direct detection of DNA modifications without requiring special sample preparation. However, most SMRT-based studies of 6mA rely on ensemble-level consensus by combining multiple reads covering the same genomic position, which misses the single-molecule heterogeneity. While recent methods have aimed at single-molecule level detection of 6mA, limitations in sequencing platforms, resolution, accuracy, and usability restrict their application in comprehensive epigenetic studies. Here, we present SMAC (single-molecule 6mA analysis of CCS reads), a novel framework for accurately detecting 6mA at the single-molecule level using SMRT circular consensus sequencing (CCS) data from the Sequel II system. It is an automated method that streamlines the entire workflow by packaging both existing software and built-in scripts, with user-defined parameters to allow easy adaptation for various studies. By utilizing the statistical distribution characteristics of enzyme kinetic indicators on single DNA molecules rather than a fixed cutoff, SMAC significantly improves 6mA detection accuracy at the single-nucleotide and single-molecule levels. It simplifies analysis by providing comprehensive information, including quality control, statistical analysis, and site visualization, directly from raw sequencing data. 
SMAC is a powerful new tool that enables de novo detection of 6mA and empowers investigation of its functions in modulating physiological processes.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11980416/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143964121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TaxSEA: rapid interpretation of microbiome alterations using taxon set enrichment analysis and public databases.
IF 6.8, Region 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf173
Cong M Pham, Timothy J Rankin, Timothy P Stinear, Calum J Walsh, Feargal J Ryan
{"title":"TaxSEA: rapid interpretation of microbiome alterations using taxon set enrichment analysis and public databases.","authors":"Cong M Pham, Timothy J Rankin, Timothy P Stinear, Calum J Walsh, Feargal J Ryan","doi":"10.1093/bib/bbaf173","DOIUrl":"https://doi.org/10.1093/bib/bbaf173","url":null,"abstract":"<p><p>Microbial communities are essential regulators of ecosystem function, with their composition commonly assessed through DNA sequencing. Most current tools focus on detecting changes among individual taxa (e.g. species or genera); however, in other omics fields, such as transcriptomics, enrichment analyses like gene set enrichment analysis are commonly used to uncover patterns not seen with individual features. Here, we introduce TaxSEA, a taxon set enrichment analysis tool available as an R package, a web portal (https://shiny.taxsea.app), and a Python package. TaxSEA integrates taxon sets from five public microbiota databases (BugSigDB, MiMeDB, GutMGene, mBodyMap, and GMRepoV2) while also allowing users to incorporate custom sets such as taxonomic groupings. In silico assessments show TaxSEA is accurate across a range of set sizes. When applied to differential abundance analysis output from inflammatory bowel disease and type 2 diabetes metagenomic data, TaxSEA can rapidly identify changes in functional groups corresponding to known associations. We also show that TaxSEA is robust to the choice of differential abundance analysis package. 
In summary, TaxSEA enables researchers to efficiently contextualize their findings within the broader microbiome literature, facilitating rapid interpretation, and advancing understanding of microbiome-host and environmental interactions.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12009713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143964680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
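The TaxSEA abstract above describes taxon set enrichment analysis but does not state which statistical test the tool applies. To illustrate the general idea of set enrichment, the sketch below swaps in a simple hypergeometric over-representation test; this is a stand-in technique, not TaxSEA's documented method, and the function name and numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected taxa are drawn at random.

    n_universe : number of taxa tested in total
    n_in_set   : size of the taxon set (e.g. a database signature)
    n_selected : number of taxa called differentially abundant
    n_overlap  : how many selected taxa belong to the set
    """
    max_overlap = min(n_in_set, n_selected)
    total_draws = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total_draws


# Hypothetical example: 200 taxa tested, a set of 20 taxa,
# 30 taxa called significant, 8 of which are in the set.
print("enrichment p:", hypergeom_enrichment_p(200, 20, 30, 8))
```

Over-representation tests need a significance cutoff to define the selected taxa, whereas rank-based enrichment methods use the full ranked list; the sketch only conveys why grouping taxa into sets can reveal signal that single-taxon tests miss.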
GEMDiff: a diffusion workflow bridges between normal and tumor gene expression states: a breast cancer case study.
IF 6.8, Region 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf093
Xusheng Ai, Melissa C Smith, F Alex Feltus
{"title":"GEMDiff: a diffusion workflow bridges between normal and tumor gene expression states: a breast cancer case study.","authors":"Xusheng Ai, Melissa C Smith, F Alex Feltus","doi":"10.1093/bib/bbaf093","DOIUrl":"10.1093/bib/bbaf093","url":null,"abstract":"<p><p>Breast cancer remains a significant global health challenge due to its complexity, which arises from multiple genetic and epigenetic mutations that originate in normal breast tissue. Traditional machine learning models often fall short in addressing the intricate gene interactions that complicate drug design and treatment strategies. In contrast, our study introduces GEMDiff, a novel computational workflow leveraging a diffusion model to bridge the gene expression states between normal and tumor conditions. GEMDiff augments RNAseq data and simulates perturbation transformations between normal and tumor gene states, enhancing biomarker identification. GEMDiff can handle large-scale gene expression data without succumbing to the scalability and stability issues that plague other generative models. By avoiding the need for task-specific hyper-parameter tuning and specific loss functions, GEMDiff can be generalized across various tasks, making it a robust tool for gene expression analysis. The model's ability to augment RNA-seq data and simulate gene perturbations provides a valuable tool for researchers. This capability can be used to generate synthetic data for training other machine learning models, thereby addressing the issue of limited biological data and enhancing the performance of predictive models. The effectiveness of GEMDiff is demonstrated through a case study using breast mRNA gene expression data, identifying 307 core genes involved in the transition from a breast tumor to a normal gene expression state. 
GEMDiff is open source and available at https://github.com/xai990/GEMDiff.git under the MIT license.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11894803/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143603123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Inferring kinase-phosphosite regulation from phosphoproteome-enriched cancer multi-omics datasets.
IF 6.8, Region 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf143
Haoyang Cheng, Zhuoran Liang, Yijin Wu, Jiamin Hu, Bijin Cao, Zekun Liu, Bo Liu, Han Cheng, Ze-Xian Liu
{"title":"Inferring kinase-phosphosite regulation from phosphoproteome-enriched cancer multi-omics datasets.","authors":"Haoyang Cheng, Zhuoran Liang, Yijin Wu, Jiamin Hu, Bijin Cao, Zekun Liu, Bo Liu, Han Cheng, Ze-Xian Liu","doi":"10.1093/bib/bbaf143","DOIUrl":"https://doi.org/10.1093/bib/bbaf143","url":null,"abstract":"<p><p>Phosphorylation in eukaryotic cells plays a key role in regulating cell signaling and disease progression. Despite the ability to detect thousands of phosphosites in a single experiment using high-throughput technologies, the kinases responsible for regulating these sites are largely unidentified. To solve this, we collected the quantitative data at the transcriptional, protein, and phosphorylation levels of 10 159 samples from 23 tumor datasets and 15 adjacent normal tissue datasets. Our analysis aimed to uncover the potential impact and linkage of kinase-phosphosite (KPS) pairs through experimental evidence in publications and commonly used prediction tools. We discovered that both experimentally validated and tool-predicted KPS pairs were enriched in groups where there is a significant correlation between kinase expression/phosphorylation level and the phosphorylation level of phosphosite. This suggested that a quantitative correlation could infer the KPS interconnections. Furthermore, the Spearman's correlation coefficients for these pairs were notably higher in tumor samples, indicating that these regulatory interactions are particularly pronounced in tumors. Consequently, building on the KPS correlations of different datasets as predictive features, we have developed an innovative approach that employed an oversampling method combined with an XGBoost algorithm (SMOTE-XGBoost) to predict potential kinase-specific phosphorylation sites in proteins. Moreover, the computed correlations and predictions of kinase-phosphosite interconnections were integrated into the eKPI database (https://ekpi.omicsbio.info/). 
In summary, our study could provide helpful information and facilitate further research on the regulatory relationship between kinases and phosphosites.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143802513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
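The kinase-phosphosite abstract above uses Spearman's correlation between a kinase's level and a phosphosite's phosphorylation level as the basic signal. A minimal sketch of that statistic follows: values are converted to average ranks (so ties are handled), and Pearson's correlation is taken over the ranks. The abundance values are invented and the helper names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical kinase abundance and phosphosite intensity across five samples.
kinase_level = [1.2, 2.5, 3.1, 4.8, 5.0]
site_level = [0.3, 0.9, 0.8, 2.2, 3.5]
print("Spearman rho:", spearman(kinase_level, site_level))
```

Because it works on ranks, the statistic captures any monotonic relationship and is less sensitive to outliers than Pearson's correlation on raw intensities.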
deepTAD: an approach for identifying topologically associated domains based on convolutional neural network and transformer model.
IF 6.8, Region 2, Biology
Briefings in bioinformatics Pub Date : 2025-03-04 DOI: 10.1093/bib/bbaf127
Xiaoyan Wang, Junwei Luo, Lili Wu, Huimin Luo, Fei Guo
{"title":"deepTAD: an approach for identifying topologically associated domains based on convolutional neural network and transformer model.","authors":"Xiaoyan Wang, Junwei Luo, Lili Wu, Huimin Luo, Fei Guo","doi":"10.1093/bib/bbaf127","DOIUrl":"10.1093/bib/bbaf127","url":null,"abstract":"<p><strong>Motivation: </strong>Topologically associated domains (TADs) play a key role in the 3D organization and function of genomes, and accurate detection of TADs is essential for revealing the relationship between genomic structure and function. Most current methods are developed to extract features from the Hi-C interaction matrix to identify TADs. However, due to complexities in Hi-C contact matrices, it is difficult to directly extract features associated with TADs, which prevents current methods from identifying accurate TADs.</p><p><strong>Results: </strong>In this paper, a novel method is proposed, deepTAD, which is developed based on a convolutional neural network (CNN) and transformer model. First, based on the Hi-C contact matrix, deepTAD utilizes CNN to directly extract features associated with TAD boundaries. Next, deepTAD takes advantage of the transformer model to analyze the variation features around TAD boundaries and determines the TAD boundaries. Then, deepTAD uses the Wilcoxon rank-sum test to further identify false-positive boundaries. Finally, deepTAD computes cosine similarity among identified TAD boundaries and assembles TAD boundaries to obtain hierarchical TADs. The experimental results show that TAD boundaries identified by deepTAD have a significant enrichment of biological features, including structural proteins, histone modifications, and transcription start site loci. Additionally, when evaluating the completeness and accuracy of identified TADs, deepTAD has a good performance compared with other methods. 
The source code of deepTAD is available at https://github.com/xiaoyan-wang99/deepTAD.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934553/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143699512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
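The deepTAD abstract above says the final step computes cosine similarity among identified TAD boundaries before assembling them into hierarchical TADs. The abstract does not specify which vectors are compared, so the sketch below simply shows the cosine similarity calculation itself on two hypothetical boundary feature vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical feature vectors for two candidate boundaries (toy data only).
boundary_a = [0.2, 0.8, 0.1, 0.5]
boundary_b = [0.3, 0.7, 0.0, 0.6]
print("cosine similarity:", cosine_similarity(boundary_a, boundary_b))
```

Cosine similarity ignores vector magnitude, so two boundaries with the same contact pattern but different overall signal strength still score close to 1.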