{"title":"通过机器学习对 ATAC-Seq 和 RNA-Seq 进行整合分析,确定了乳腺癌内在亚型的 10 个特征基因。","authors":"Jeong-Woon Park, Je-Keun Rhee","doi":"10.3390/biology13100799","DOIUrl":null,"url":null,"abstract":"<p><p>Breast cancer is a heterogeneous disease composed of various biologically distinct subtypes, each characterized by unique molecular features. Its formation and progression involve a complex, multistep process that includes the accumulation of numerous genetic and epigenetic alterations. Although integrating RNA-seq transcriptome data with ATAC-seq epigenetic information provides a more comprehensive understanding of gene regulation and its impact across different conditions, no classification model has yet been developed for breast cancer intrinsic subtypes based on such integrative analyses. In this study, we employed machine learning algorithms to predict intrinsic subtypes through the integrative analysis of ATAC-seq and RNA-seq data. We identified 10 signature genes (<i>CDH3</i>, <i>ERBB2</i>, <i>TYMS</i>, <i>GREB1</i>, <i>OSR1</i>, <i>MYBL2</i>, <i>FAM83D</i>, <i>ESR1</i>, <i>FOXC1</i>, and <i>NAT1</i>) using recursive feature elimination with cross-validation (RFECV) and a support vector machine (SVM) based on SHAP (SHapley Additive exPlanations) feature importance. Furthermore, we found that these genes were primarily associated with immune responses, hormone signaling, cancer progression, and cellular proliferation.</p>","PeriodicalId":48624,"journal":{"name":"Biology-Basel","volume":"13 10","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11505269/pdf/","citationCount":"0","resultStr":"{\"title\":\"Integrative Analysis of ATAC-Seq and RNA-Seq through Machine Learning Identifies 10 Signature Genes for Breast Cancer Intrinsic Subtypes.\",\"authors\":\"Jeong-Woon Park, Je-Keun Rhee\",\"doi\":\"10.3390/biology13100799\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Breast cancer is a heterogeneous disease composed of various biologically distinct subtypes, each characterized by unique molecular features. Its formation and progression involve a complex, multistep process that includes the accumulation of numerous genetic and epigenetic alterations. Although integrating RNA-seq transcriptome data with ATAC-seq epigenetic information provides a more comprehensive understanding of gene regulation and its impact across different conditions, no classification model has yet been developed for breast cancer intrinsic subtypes based on such integrative analyses. In this study, we employed machine learning algorithms to predict intrinsic subtypes through the integrative analysis of ATAC-seq and RNA-seq data. We identified 10 signature genes (<i>CDH3</i>, <i>ERBB2</i>, <i>TYMS</i>, <i>GREB1</i>, <i>OSR1</i>, <i>MYBL2</i>, <i>FAM83D</i>, <i>ESR1</i>, <i>FOXC1</i>, and <i>NAT1</i>) using recursive feature elimination with cross-validation (RFECV) and a support vector machine (SVM) based on SHAP (SHapley Additive exPlanations) feature importance. Furthermore, we found that these genes were primarily associated with immune responses, hormone signaling, cancer progression, and cellular proliferation.</p>\",\"PeriodicalId\":48624,\"journal\":{\"name\":\"Biology-Basel\",\"volume\":\"13 10\",\"pages\":\"\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2024-10-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11505269/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biology-Basel\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.3390/biology13100799\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biology-Basel","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.3390/biology13100799","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOLOGY","Score":null,"Total":0}
Integrative Analysis of ATAC-Seq and RNA-Seq through Machine Learning Identifies 10 Signature Genes for Breast Cancer Intrinsic Subtypes.
Breast cancer is a heterogeneous disease composed of various biologically distinct subtypes, each characterized by unique molecular features. Its formation and progression involve a complex, multistep process that includes the accumulation of numerous genetic and epigenetic alterations. Although integrating RNA-seq transcriptome data with ATAC-seq epigenetic information provides a more comprehensive understanding of gene regulation and its impact across different conditions, no classification model has yet been developed for breast cancer intrinsic subtypes based on such integrative analyses. In this study, we employed machine learning algorithms to predict intrinsic subtypes through the integrative analysis of ATAC-seq and RNA-seq data. We identified 10 signature genes (CDH3, ERBB2, TYMS, GREB1, OSR1, MYBL2, FAM83D, ESR1, FOXC1, and NAT1) using recursive feature elimination with cross-validation (RFECV) and a support vector machine (SVM) based on SHAP (SHapley Additive exPlanations) feature importance. Furthermore, we found that these genes were primarily associated with immune responses, hormone signaling, cancer progression, and cellular proliferation.
期刊介绍:
Biology (ISSN 2079-7737) is an international, peer-reviewed, quick-refereeing open access journal of Biological Science published by MDPI online. It publishes reviews, research papers and communications in all areas of biology and at the interface of related disciplines. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files regarding the full details of the experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.