Pilot study: Initial investigation suggests differences in EMT-associated gene expression in breast tumor regions
Kylie L. King, Hamed Abdollahi, Zoe Dinkel, Alannah Akins, Homayoun Valafar, Heather Dunn
Computational and Structural Biotechnology Journal, vol. 27, pp. 548-555, 2025. DOI: 10.1016/j.csbj.2025.01.027
https://www.sciencedirect.com/science/article/pii/S2001037025000273
Abstract
Triple-negative breast cancer (TNBC) is the most aggressive breast cancer subtype and disproportionately affects African American women. The development of breast cancer is highly associated with interactions between tumor cells and the extracellular matrix (ECM), and recent research suggests that cellular components of the ECM vary between racial groups. This pilot study aimed to evaluate gene expression in TNBC samples from patients who identified as African American or Caucasian, using traditional statistical methods and emerging machine learning (ML) approaches. ML enables the analysis of complex datasets and the extraction of useful information from small datasets. We selected four regions of interest from tumor biopsy samples and used laser microdissection to extract tissue for gene expression characterization via RT-qPCR. Both parametric and non-parametric statistical analyses identified genes differentially expressed between the two ethnic groups. Out of 40 genes analyzed, 4 were differentially expressed in the edge of tumor (ET) region and 8 in the ECM adjacent to the tumor (ECMT) region. In addition to the statistical approach, ML was used to generate decision trees (DT) for a broader analysis of gene expression and ethnicity. Our DT models achieved 83.33% accuracy and identified the most significant genes, including CD29 and EGF from the ET region and SNAI1 and CHD2 from the ECMT region. All significant genes were analyzed for pathway enrichment using the MSigDB and Gene Ontology databases; the most notable enriched pathways were epithelial-to-mesenchymal transition and cell motility. This pilot study highlights key genes of interest that are differentially expressed in African American and Caucasian TNBC samples.
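The abstract does not say which statistical tests or which decision tree algorithm were used, so the following is only a minimal sketch of the general idea: a non-parametric two-group comparison per gene, followed by a very shallow decision tree evaluated on held-out samples. Everything in it is an assumption made for illustration: the gene names `GENE_A` and `GENE_B`, the group labels, the expression values (synthetic, not from the study), the choice of the Mann-Whitney U statistic, the depth-1 tree, and leave-one-out validation.

```python
# Illustrative sketch only: synthetic values and hypothetical gene names,
# not the study's data, genes, or analysis pipeline.
from itertools import product


def mann_whitney_u(a, b):
    """U statistic for sample a versus sample b (each tie counts 0.5)."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(a, b))


def fit_stump(rows, labels):
    """Depth-1 decision tree: pick the (gene, threshold, orientation)
    with the fewest training errors. Assumes exactly two classes."""
    classes = sorted(set(labels))
    best = None
    for gene in sorted(rows[0]):
        values = sorted({r[gene] for r in rows})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for above, below in (classes, classes[::-1]):
                errors = sum((above if r[gene] > thr else below) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, gene, thr, above, below)
    return best[1:]


def predict(stump, row):
    gene, thr, above, below = stump
    return above if row[gene] > thr else below


def loo_accuracy(rows, labels):
    """Leave-one-out accuracy: refit on all other samples, test on the one held out."""
    hits = 0
    for i in range(len(rows)):
        stump = fit_stump(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict(stump, rows[i]) == labels[i]
    return hits / len(rows)


# Synthetic relative-expression values for two hypothetical genes.
rows = [
    {"GENE_A": 1.2, "GENE_B": 3.1}, {"GENE_A": 1.5, "GENE_B": 2.7},
    {"GENE_A": 1.1, "GENE_B": 3.4}, {"GENE_A": 1.4, "GENE_B": 2.9},
    {"GENE_A": 2.6, "GENE_B": 3.0}, {"GENE_A": 2.9, "GENE_B": 3.3},
    {"GENE_A": 2.4, "GENE_B": 2.8}, {"GENE_A": 3.1, "GENE_B": 3.2},
]
labels = ["group_1"] * 4 + ["group_2"] * 4

for gene in ("GENE_A", "GENE_B"):
    g1 = [r[gene] for r, y in zip(rows, labels) if y == "group_1"]
    g2 = [r[gene] for r, y in zip(rows, labels) if y == "group_2"]
    print(gene, "U(group_2 vs group_1) =", mann_whitney_u(g2, g1))

print("stump:", fit_stump(rows, labels))
print("leave-one-out accuracy:", loo_accuracy(rows, labels))
```

With samples this few, held-out evaluation matters more than training accuracy, which is why the sketch scores the tree by leave-one-out rather than on the data it was fitted to. In practice one would likely use established implementations (for example a library Mann-Whitney test with p-values and multiple-testing correction, and a library decision tree) rather than these hand-rolled helpers.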
About the journal:
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a prerequisite for publication in the journal. Specific areas of interest include, but are not limited to:
Structure and function of proteins, nucleic acids and other macromolecules
Structure and function of multi-component complexes
Protein folding, processing and degradation
Enzymology
Computational and structural studies of plant systems
Microbial Informatics
Genomics
Proteomics
Metabolomics
Algorithms and Hypothesis in Bioinformatics
Mathematical and Theoretical Biology
Computational Chemistry and Drug Discovery
Microscopy and Molecular Imaging
Nanotechnology
Systems and Synthetic Biology