Smoking Classification Using Novel Plasma Cytokines by Implementing Machine Learning and Statistical Methods

Seema Singh Saharan, Pankaj Nagar, Kate Townsend Creasy, Eveline O Stock, James Feng, Mary J Malloy, John P Kane
Proceedings. International Conference on Computational Science and Computational Intelligence, vol. 2023, pp. 686-694. Published 2023-12-01 (Epub 2024-07-19). Journal Article.
DOI: https://doi.org/10.1109/csci62032.2023.00118
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500790/pdf/
Citations: 0

Abstract

Smoking is a major cause of premature and preventable death. Tobacco exposure has a detrimental effect on many organs and contributes to multiple diseases, including chronic obstructive pulmonary disease (COPD), cardiovascular disease, cancer, and diabetes. Cytokines are inflammatory biomarkers that are mechanistically associated with smoking. Machine learning algorithms allow quantitative assessment of the contributions of individual cytokines to tobacco-related diseases, and mapping cytokines to disease can facilitate and direct treatment modalities. We applied the k-Nearest Neighbor (k-NN) and Random Forest machine learning algorithms to 63 plasma cytokines to classify smoking status. To improve performance, k-fold cross-validation and hyperparameter tuning were employed. The separability achieved by the models was evaluated using the Area Under the Receiver Operating Characteristic curve (AUROC). The cytokines that contributed most to the classification were identified and are presented. The difference between the AUROC scores of k-NN and Random Forest was tested for statistical significance using the two-sample independent t-test. The k-NN algorithm achieved reasonably good classification performance, with an AUROC of 0.87 and a 95% CI of (0.823, 0.917). Random Forest exceeded the performance of k-NN, with a perfect AUROC of 1 and a 95% CI of (1, 1). Among the ten most prominent cytokines contributing to the classification, those common to both algorithms are LIF, IL22, G-CSF/CSF-3, and TRAIL. The AUROC scores for k-NN and Random Forest are significantly different (p-value = 5.105e-16). The discovery of biomarkers such as cytokines, and their transfer from molecular investigation to clinical practice, can facilitate precision medicine-based therapeutic interventions.
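The evaluation steps named in the abstract (k-NN scoring, k-fold cross-validation, tuning of k, AUROC, and a two-sample independent t statistic) can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the data are synthetic Gaussian features standing in for cytokine levels, the function names (`auroc`, `knn_scores`, `cv_auroc`, `pooled_t_statistic`, `make_synthetic`), the stratified fold assignment, and the grid of k values are all assumptions made for illustration, Random Forest is omitted, and only the t statistic (not a p-value) is computed.

```python
import math
import random
from statistics import mean, variance


def auroc(labels, scores):
    """AUROC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_scores(train_X, train_y, test_X, k):
    """Score each test row by the fraction of positives among its k nearest training rows."""
    scores = []
    for x in test_X:
        nearest = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))[:k]
        scores.append(sum(train_y[i] for i in nearest) / k)
    return scores


def cv_auroc(X, y, k_neighbors, n_folds=5, seed=0):
    """Per-fold AUROC from stratified k-fold cross-validation (assumed scheme)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        members = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % n_folds].append(i)  # deal each class round-robin
    results = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        scores = knn_scores([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in held_out], k_neighbors)
        results.append(auroc([y[i] for i in held_out], scores))
    return results


def pooled_t_statistic(a, b):
    """Two-sample independent t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if sp2 == 0:  # both samples constant: statistic is 0 or unbounded
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(sp2 * (1 / na + 1 / nb))


def make_synthetic(n_per_class=40, n_features=5, shift=1.0, seed=1):
    """Hypothetical stand-in data: class 1 features are shifted by `shift`."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


# Tune k over a small assumed grid by mean cross-validated AUROC.
X, y = make_synthetic()
fold_scores = {k: cv_auroc(X, y, k) for k in (1, 5, 15)}
best_k = max(fold_scores, key=lambda k: mean(fold_scores[k]))
for k, values in fold_scores.items():
    print(f"k={k:2d}  mean AUROC={mean(values):.3f}")
print("best k:", best_k)
print("t statistic, k=1 vs k=15:", pooled_t_statistic(fold_scores[1], fold_scores[15]))
```

In practice a library such as scikit-learn or SciPy would normally supply these pieces; the hand-written versions here only make each step explicit.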
