Identifying Gene Predictors of Chemicals Linked With Breast Cancer: A Machine Learning Analysis of MCF7 Cellular Transcriptomic Screening Data.
Lauren E Koval, Richard Judson, Julia E Rager
Environmental and Molecular Mutagenesis, published 2025-09-11. DOI: 10.1002/em.70034
Breast cancer is the most prevalent cancer in women and has been linked to exposure to environmental chemicals. However, many chemicals have not been evaluated for relationships with this outcome. In this study, we analyzed RNA sequencing data from human breast cancer-derived MCF7 cells exposed to hundreds of individual chemicals. These chemicals were binned into three categories: (1) chemicals with known associations with breast cancer (BCs); (2) chemicals lacking a relationship with breast cancer (NBCs); and (3) chemicals that remain understudied for breast cancer risk (UCs). Machine learning models were trained to discriminate between BCs and NBCs based on transcriptomic and physicochemical property data. The best model achieved a balanced accuracy of 80% and was then applied to the UCs. A total of 170 genes were found to contribute to model performance, including Claspin (CLSPN), Runt-related Transcription Factor 2 (RUNX2), and Ubinuclein 2 (UBN2). Pathway enrichment analysis of these genes highlighted pathways relevant to inflammation, ferroptosis signaling, and cell proliferation. Additionally, 97 UCs were predicted to be more analogous to BCs, including select biocides and dyes. To ground the results in human population data, expression profiles of the 170 genes were assessed in tumor samples from The Cancer Genome Atlas, revealing overlap between cancer-relevant alterations in human tumors and chemical-induced alterations in vitro. Collectively, this study helps address a gap in understanding which chemicals may warrant further characterization of breast cancer risk, by using high-throughput transcriptomic screening data to prioritize chemicals and their underlying mechanisms.
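The abstract reports model performance as balanced accuracy. As a minimal sketch (not the authors' code), balanced accuracy is the mean of per-class recall, which for two classes such as BC and NBC equals (sensitivity + specificity) / 2. Unlike plain accuracy, it is not inflated when a model simply favors the more common class. The function names and toy labels below are hypothetical and are not data from the study.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean per-class recall; for two classes this is (sensitivity + specificity) / 2."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    recalls = []
    for cls in sorted(set(y_true)):
        # Positions of samples whose true label is this class
        idx = [i for i, t in enumerate(y_true) if t == cls]
        # Recall for this class: fraction of its samples predicted correctly
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def plain_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of all samples predicted correctly, ignoring class balance."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: 8 "BC" and 2 "NBC" chemicals.
y_true = ["BC"] * 8 + ["NBC"] * 2
# The model gets every BC right but misclassifies one of the two NBCs.
y_pred = ["BC"] * 8 + ["BC", "NBC"]

print(plain_accuracy(y_true, y_pred))     # 0.9
print(balanced_accuracy(y_true, y_pred))  # recall BC = 1.0, NBC = 0.5 -> 0.75
```

In this toy example, plain accuracy is 0.9 because the majority class is always right, while balanced accuracy drops to 0.75 because half of the minority-class chemicals are misclassified.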
About the journal:
Environmental and Molecular Mutagenesis publishes original research manuscripts, reviews and commentaries on topics related to six general areas, with an emphasis on subject matter most suited for the readership of EMM as outlined below. The journal is intended for investigators in fields such as molecular biology, biochemistry, microbiology, genetics and epigenetics, genomics and epigenomics, cancer research, neurobiology, heritable mutation, radiation biology, toxicology, and molecular & environmental epidemiology.