Seema Singh Saharan, Pankaj Nagar, Kate Townsend Creasy, Eveline O Stock, James Feng, Mary J Malloy, John P Kane
Proceedings. International Conference on Computational Science and Computational Intelligence, vol. 2023, pp. 686-694. Published 2023-12-01 (Epub 2024/7/19). DOI: https://doi.org/10.1109/csci62032.2023.00118. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500790/pdf/
Citations: 0
Smoking Classification Using Novel Plasma Cytokines by implementing Machine Learning and Statistical Methods.
Smoking is a major cause of premature and preventable death. Tobacco exposure has a detrimental effect on many organs and contributes to multiple diseases, including chronic obstructive pulmonary disease (COPD), cardiovascular disease, cancer, and diabetes. Cytokines are inflammatory biomarkers that are mechanistically associated with smoking. Machine learning algorithms allow for the quantitative assessment of the contributions of individual cytokines to tobacco-related diseases. The mapping of cytokines to disease can facilitate and direct treatment modalities. By applying the k-Nearest Neighbor (k-NN) and Random Forest machine learning algorithms to 63 plasma cytokines, we have demonstrated the classification of smoking. To ensure optimal results, performance improvement techniques such as k-fold cross-validation and hyperparameter tuning are employed. The separability achieved by the models is evaluated using the Area Under the Receiver Operating Characteristic curve (AUROC) metric. The most significant cytokines that enabled the classification are identified and presented. The statistical significance of the difference between the AUROC scores of k-NN and Random Forest has been ascertained using the two-sample independent t-test. The k-NN algorithm achieved reasonably good classification performance, with an AUROC of 0.87 and a 95% CI of (0.823, 0.917). Random Forest exceeded the k-NN algorithm's performance, with a perfect AUROC score of 1 and a 95% CI of (1, 1). Among the ten most prominent cytokines that contributed to the classification, those common to both algorithms are LIF, IL22, G-CSF/CSF-3, and TRAIL. AUROC scores for k-NN and Random Forest are significantly different (p-value = 5.105e-16). The discovery and transference of biomarkers such as cytokines from the platform of molecular investigation to clinical practice can facilitate precision medicine-based therapeutic interventions.
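The evaluation steps named in the abstract (AUROC scoring, a 95% CI, and a two-sample independent t-test on the AUROC scores of the two models) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say how the confidence interval was computed or how many folds were used, so the normal-approximation interval, the pooled-variance t statistic, the helper names (auroc, mean_ci, two_sample_t), and every number below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Labels are 1 (e.g. smoker) or 0 (non-smoker); tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ci(values, level=0.95):
    """Normal-approximation confidence interval for the mean of the values.

    Assumption: the paper may have used a different interval construction.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / sqrt(len(values))
    centre = mean(values)
    return centre - half_width, centre + half_width


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom.

    Undefined (division by zero) if both samples have zero variance.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2
                  + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical toy data, NOT taken from the paper.
labels = [1, 1, 0, 1, 0, 0]                # true classes for six samples
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]    # a model's predicted probabilities
print("AUROC on toy data:", auroc(labels, scores))

# Hypothetical per-fold AUROC scores from k-fold cross-validation.
knn_folds = [0.85, 0.88, 0.90, 0.84, 0.88]
rf_folds = [0.99, 1.00, 0.98, 1.00, 0.99]
print("k-NN 95% CI:", mean_ci(knn_folds))
print("t statistic, df:", two_sample_t(knn_folds, rf_folds))
```

The sketch stops at the t statistic because the standard library has no t-distribution CDF; with SciPy installed, scipy.stats.ttest_ind(knn_folds, rf_folds) would also return the p-value. Note that an interval of (1, 1), as reported for Random Forest, corresponds to zero spread across the scores being averaged.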