Identification of DNase I hypersensitive sites in the human genome by multiple sequence descriptors

IF 4.2 | CAS Zone 3, Biology | JCR Q1, Biochemical Research Methods
Yan-Ting Jin, Yang Tan, Zhong-Hua Gan, Yu-Duo Hao, Tian-Yu Wang, Hao Lin, Bo Tang
Methods, volume 229, pages 125-132. Published 2024-07-02. Journal article.
DOI: 10.1016/j.ymeth.2024.06.012
URL: https://www.sciencedirect.com/science/article/pii/S1046202324001622
Citations: 0

Abstract

DNase I hypersensitive sites (DHSs) are chromatin regions that are highly sensitive to cleavage by the DNase I enzyme. Studying DHSs is crucial for understanding complex transcriptional regulation mechanisms and for locating cis-regulatory elements (CREs). Numerous studies have indicated that disease-related loci are often enriched in DHS regions, underscoring the importance of identifying DHSs. Although wet-lab experiments exist for DHS identification, they are often labor-intensive, so there is a strong need for computational methods. In this study, we used experimental data to construct a benchmark dataset. Seven feature extraction methods were employed to capture information about human DHSs, and the F-score was applied to filter the features. After comparing the prediction performance of various classification algorithms through five-fold cross-validation, random forest was selected to build the final model. The model achieved an overall prediction accuracy of 0.859 with an AUC of 0.837. We hope that this model can assist researchers conducting DNase research in identifying these sites.
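
The feature extraction and F-score filtering steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the seven descriptors or give the exact F-score formula, so everything here is an assumption for illustration: k-mer composition stands in for one sequence descriptor, and the F-score is taken to be the common two-class definition (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances). The function names kmer_frequencies, f_score, and rank_features are hypothetical and are not taken from the paper.

```python
from itertools import product
from statistics import mean, variance


def kmer_frequencies(seq, k=2):
    """Normalized k-mer frequencies over ACGT, in a fixed lexicographic order.

    Assumption: k-mer composition is used only as an example descriptor.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def f_score(pos_values, neg_values):
    """F-score of one feature (assumed two-class definition).

    Numerator: squared distances of the class means from the overall mean.
    Denominator: sum of the sample variances (n - 1) of the two classes.
    Each class needs at least two values.
    """
    overall = mean(pos_values + neg_values)
    numerator = (mean(pos_values) - overall) ** 2 + (mean(neg_values) - overall) ** 2
    denominator = variance(pos_values) + variance(neg_values)
    return numerator / denominator if denominator > 0 else 0.0


def rank_features(pos_matrix, neg_matrix):
    """Return (feature indices sorted by descending F-score, list of scores)."""
    n_features = len(pos_matrix[0])
    scores = [
        f_score([row[j] for row in pos_matrix], [row[j] for row in neg_matrix])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores
```

In a workflow like the one the abstract outlines, the top-ranked features would then be passed to a classifier such as a random forest and evaluated with five-fold cross-validation; that step is omitted here to keep the sketch free of third-party dependencies.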

Source journal: Methods (Biology: Biochemical Research Methods)
CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks
Journal description: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.