Interpreting Lung Cancer Health Disparity between African American Males and European American Males.

Masrur Sobhan, Md Mezbahul Islam, Ananda Mohan Mondal
Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2024, pp. 7141-7143. Published 2024-12-01.
DOI: https://doi.org/10.1109/bibm62325.2024.10822014
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11753458/pdf/

Abstract

Lung cancer remains a leading cause of cancer-related deaths, with notable disparities in incidence and outcomes across racial and gender groups. This study addresses these disparities by developing a computational framework that uses explainable artificial intelligence (XAI) to identify both patient-specific and cohort-specific biomarker genes in lung cancer. Specifically, we focus on two lung cancer subtypes, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), and examine two race- and sex-specific cohorts: African American males (AAMs) and European American males (EAMs). We structure the classification tasks by disease condition rather than by racial label, which avoids race-specific class imbalance. We constructed four classification tasks, one three-class problem (LUAD-LUSC-HEALTHY) and three two-class problems (LUAD-LUSC, LUAD-HEALTHY, LUSC-HEALTHY), to interpret the disease behavior of patients in terms of genes and pathways. This design allows a LUAD or LUSC patient to be analyzed via multiple classifications, yielding robust disparity information for every patient. This preliminary work reports disparity information for LUAD only. Using transcriptome data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) projects, we processed samples for the LUAD, LUSC, and HEALTHY cohorts. For classification, we applied machine learning models including a convolutional neural network (CNN), logistic regression (LR), a naïve Bayesian classifier (NB), a support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). SHapley Additive exPlanations (SHAP)-based interpretation of the best-performing classification model uncovered cohort-specific genes and pathways related to health disparities between the LUAD-AAM and LUAD-EAM cohorts.
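
The task construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the sample records, the helper names `build_tasks` and `tasks_for_sample`, and the tuple layout are all hypothetical, chosen only to show how labeling by disease condition (with cohort kept as metadata, not as a class label) places one patient in several classification tasks.

```python
from collections import Counter

# The four tasks named in the abstract. Each task is defined by the disease
# conditions used as class labels; race is never a class label.
TASKS = {
    "LUAD-LUSC-HEALTHY": ("LUAD", "LUSC", "HEALTHY"),
    "LUAD-LUSC": ("LUAD", "LUSC"),
    "LUAD-HEALTHY": ("LUAD", "HEALTHY"),
    "LUSC-HEALTHY": ("LUSC", "HEALTHY"),
}

# Hypothetical toy samples (not real data): (sample_id, condition, cohort).
SAMPLES = [
    ("p1", "LUAD", "AAM"),
    ("p2", "LUAD", "EAM"),
    ("p3", "LUSC", "AAM"),
    ("p4", "LUSC", "EAM"),
    ("p5", "HEALTHY", "AAM"),
    ("p6", "HEALTHY", "EAM"),
]


def build_tasks(samples, tasks=TASKS):
    """Return {task_name: [samples whose condition is a class in that task]}."""
    return {
        name: [s for s in samples if s[1] in classes]
        for name, classes in tasks.items()
    }


def tasks_for_sample(sample_id, task_sets):
    """List the tasks in which a given sample takes part."""
    return [
        name
        for name, rows in task_sets.items()
        if any(row[0] == sample_id for row in rows)
    ]


task_sets = build_tasks(SAMPLES)

# Class counts per task, then the tasks that include LUAD patient p1.
for name, rows in task_sets.items():
    print(name, dict(Counter(condition for _, condition, _ in rows)))
print(tasks_for_sample("p1", task_sets))
```

In this toy setup a LUAD patient takes part in three of the four tasks (the three-class task, LUAD-LUSC, and LUAD-HEALTHY), which is one reading of "analyzed via multiple classifications". The cohort field is carried along so that per-patient explanations could later be grouped into LUAD-AAM and LUAD-EAM; how the authors perform that grouping is not stated in the abstract.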
