Identification of DNase I hypersensitive sites in the human genome by multiple sequence descriptors

Impact factor 4.2 · CAS zone 3 (Biology) · JCR Q1, Biochemical Research Methods

Methods · Pub date: 2024-07-02 · DOI: 10.1016/j.ymeth.2024.06.012

Yan-Ting Jin , Yang Tan , Zhong-Hua Gan , Yu-Duo Hao , Tian-Yu Wang , Hao Lin , Bo Tang

Methods, vol. 229, pp. 125-132 (2024)


Abstract

DNase I hypersensitive sites (DHSs) are chromatin regions that are highly sensitive to cleavage by the DNase I enzyme. Studying DHSs is crucial for understanding complex transcriptional regulation mechanisms and for localizing cis-regulatory elements (CREs). Numerous studies have indicated that disease-related loci are often enriched in DHS regions, underscoring the importance of identifying DHSs. Although wet-lab experiments can identify DHSs, they are often labor-intensive, so there is a strong need for computational methods. In this study, we used experimental data to construct a benchmark dataset. Seven feature extraction methods were employed to capture information about human DHSs, and the F-score was applied to filter the features. After comparing the prediction performance of various classification algorithms through five-fold cross-validation, we selected random forest to build the final model. The model achieved an overall prediction accuracy of 0.859 and an AUC of 0.837. We hope that this model can assist researchers working on DNase I hypersensitivity in identifying these sites.
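The pipeline summarized above (sequence descriptors, F-score filtering, five-fold cross-validation) can be sketched in plain Python under several assumptions. The abstract does not name the seven descriptors, so k-mer composition stands in as one representative sequence descriptor. The F-score is assumed to be the common Chen-Lin two-class variant; the paper may use another definition (for example the ANOVA F-statistic). All function names here (`kmer_composition`, `f_score`, `top_features`, `stratified_kfold`) are hypothetical and not taken from the authors' code.

```python
import random
from collections import Counter
from itertools import product


def kmer_composition(seq, k=2):
    """Normalized k-mer frequency vector over ACGT (4**k features).

    Assumed descriptor for illustration; k-mers containing other
    characters (e.g. N) are ignored.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def f_score(X, y):
    """Chen-Lin F-score per feature column (assumed variant).

    y holds 1 (DHS) or 0 (non-DHS); each class needs at least two samples.
    Score = between-class spread of means / sum of within-class variances.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        col_all = [row[j] for row in X]
        col_pos = [row[j] for row in pos]
        col_neg = [row[j] for row in neg]
        m_all = sum(col_all) / len(col_all)
        m_pos = sum(col_pos) / len(col_pos)
        m_neg = sum(col_neg) / len(col_neg)
        between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
        within = (sum((v - m_pos) ** 2 for v in col_pos) / (len(col_pos) - 1)
                  + sum((v - m_neg) ** 2 for v in col_neg) / (len(col_neg) - 1))
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_features(scores, n):
    """Indices of the n highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n]


def stratified_kfold(y, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Each class is shuffled and dealt round-robin into the k folds, so every
    fold keeps roughly the original class ratio.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

In use, each sequence would be encoded (for example `X = [kmer_composition(s, k=2) for s in seqs]`), features ranked with `f_score(X, labels)`, and the columns returned by `top_features` kept. Those filtered features would then be passed to a classifier inside each fold from `stratified_kfold`; a random forest such as scikit-learn's `RandomForestClassifier` is one option, though the authors' hyperparameters and number of retained features are not stated here.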

Source journal: Methods (Biology / Biochemical Research Methods)
CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks

Journal description: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.