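Identifying Gene Predictors of Chemicals Linked With Breast Cancer: A Machine Learning Analysis of MCF7 Cellular Transcriptomic Screening Data.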

Impact factor 2.3 · Zone 4 (Medicine) · Q3 ENVIRONMENTAL SCIENCES
Lauren E Koval, Richard Judson, Julia E Rager
Environmental and Molecular Mutagenesis · Journal Article · Published 2025-09-11 · DOI: 10.1002/em.70034 (https://doi.org/10.1002/em.70034)
Citations: 0

Abstract


Breast cancer is the most prevalent cancer in women and has been linked to exposure to environmental chemicals. However, many chemicals have not been evaluated for relationships with this outcome. In this study, we analyzed RNA sequencing data from human breast cancer-derived MCF7 cells exposed to hundreds of individual chemicals. These chemicals were binned into three categories: (1) chemicals with known associations to breast cancer (BCs); (2) chemicals lacking a relationship to breast cancer (NBCs); and (3) chemicals that remain understudied for breast cancer risk (UCs). Machine learning models were trained to discriminate between BCs and NBCs based on transcriptomic and physicochemical property data. The best model yielded a balanced accuracy of 80% and was applied to the UCs. A total of 170 genes were found to contribute to model performance, including Claspin (CLSPN), Runt-related Transcription Factor 2 (RUNX2), and Ubinuclein 2 (UBN2). Pathways enriched among these genes were relevant to inflammation, ferroptosis signaling, and cell proliferation. Additionally, 97 UCs were predicted to be more analogous to BCs, including select biocides and dyes. To ground results in human population data, expression profiles for the 170 genes were assessed in tumor samples from The Cancer Genome Atlas, revealing overlap between cancer-relevant alterations in human tumors and chemical-induced alterations in vitro. Collectively, by prioritizing chemicals and underlying mechanisms using high-throughput transcriptomic screening data, this study helps address a gap in understanding which chemicals may warrant further characterization of breast cancer risk.

Source journal
CiteScore: 5.40
Self-citation rate: 10.70%
Articles published: 52
Review time: 12-24 weeks
Journal description: Environmental and Molecular Mutagenesis publishes original research manuscripts, reviews and commentaries on topics related to six general areas, with an emphasis on subject matter most suited for the readership of EMM as outlined below. The journal is intended for investigators in fields such as molecular biology, biochemistry, microbiology, genetics and epigenetics, genomics and epigenomics, cancer research, neurobiology, heritable mutation, radiation biology, toxicology, and molecular & environmental epidemiology.