Lasso algorithm and support vector machine strategy to screen pulmonary arterial hypertension gene diagnostic markers
Chenyang Jiang, Weidong Jiang
Scottish Medical Journal, 68(1), 21-31, February 2023
DOI: 10.1177/00369330221132158
Abstract
Background: This study uses machine learning algorithms to screen for the optimal gene signature of pulmonary arterial hypertension (PAH) in large-scale medical data.
Methods: Datasets comprising 32 normal control samples and 37 PAH samples were obtained from the public Gene Expression Omnibus (GEO) database and analyzed. Differentially expressed genes were selected and then subjected to enrichment analysis. Two machine learning methods, the least absolute shrinkage and selection operator (LASSO) and the support vector machine (SVM), were used to identify candidate genes. The expression levels and diagnostic value of the candidate diagnostic genes were further tested in an external validation dataset. Diagnostic performance was evaluated with receiver operating characteristic (ROC) curves. The deconvolution tool CIBERSORT was used to estimate the composition of immune cell subtypes, and correlation analysis was performed based on the combined training dataset.
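The LASSO selection step can be illustrated with a minimal sketch. The abstract does not state the loss function, software, or penalty grid, so this assumes a squared-error LASSO on centered data with a numeric label (for example +1 for PAH, -1 for control), fitted by cyclic coordinate descent, with the penalty chosen by five-fold cross-validation as the Results report. In practice a logistic LASSO from an established package is the more common choice. All function names (`lasso_coordinate_descent`, `choose_lambda`, etc.) and the toy data are hypothetical.

```python
"""LASSO feature-selection sketch: squared-error loss, cyclic coordinate descent.

Assumptions (not stated in the abstract): columns of X are centered expression
values, y is a centered numeric label (e.g. +1 = PAH, -1 = control), and no
intercept is fitted. Published analyses usually use a logistic LASSO instead.
"""
from typing import List, Sequence

Matrix = List[List[float]]


def soft_threshold(z: float, lam: float) -> float:
    """sign(z) * max(|z| - lam, 0): the update induced by the L1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X: Matrix, y: Sequence[float], lam: float,
                             n_sweeps: int = 200, tol: float = 1e-10) -> List[float]:
    """Minimise (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 one coefficient at a time."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - Xb; equals y while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot enter the model
            # (1/n) * inner product of feature j with the residual that leaves feature j out.
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync with b
                b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def selected_features(b: Sequence[float], names: Sequence[str]) -> List[str]:
    """Features whose coefficients the L1 penalty did not set exactly to zero."""
    return [name for name, coef in zip(names, b) if coef != 0.0]


def k_fold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Split sample indices 0..n-1 into k interleaved folds (no shuffling)."""
    return [list(range(f, n, k)) for f in range(k)]


def cv_error(X: Matrix, y: Sequence[float], lam: float, k: int = 5) -> float:
    """Mean squared error on held-out folds (no re-centering within folds, for brevity)."""
    total = 0.0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        b = lasso_coordinate_descent([X[i] for i in train], [y[i] for i in train], lam)
        for i in fold:
            pred = sum(x_ij * b_j for x_ij, b_j in zip(X[i], b))
            total += (y[i] - pred) ** 2
    return total / len(X)


def choose_lambda(X: Matrix, y: Sequence[float], grid: Sequence[float], k: int = 5) -> float:
    """Penalty from `grid` with the lowest k-fold cross-validated error."""
    return min(grid, key=lambda lam: cv_error(X, y, lam, k))


if __name__ == "__main__":
    # Toy data: two orthogonal +/-1 "genes"; y = 3 * gene_A + 0.5 * gene_B.
    X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
    y = [3.5, -2.5, 2.5, -3.5]
    b = lasso_coordinate_descent(X, y, lam=1.0)
    print(b, selected_features(b, ["GENE_A", "GENE_B"]))  # prints: [2.0, 0.0] ['GENE_A']
```

In the toy run, the gene whose scaled inner product with the label is 3.0 keeps a shrunken coefficient of 2.0, while the one at 0.5 falls below the penalty `lam=1.0` and is set exactly to zero; this zeroing is what makes LASSO usable as a gene selector. A larger penalty removes more genes.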
Results: A total of 564 differentially expressed genes (DEGs) were identified between normal control and PAH samples. Enrichment analysis showed that the DEGs were closely associated with cardiovascular diseases, inflammatory diseases, and immune-related pathways. Using five-fold cross-validation, the LASSO and SVM algorithms identified 9 and 7 characteristic genes, respectively. Caldesmon 1 (CALD1) and Solute Carrier Family 7 Member 11 (SLC7A11) were selected by both algorithms as genes highly correlated with PAH. The areas under the ROC curve (AUC) were 0.924 for CALD1 and 0.962 for SLC7A11, indicating that both genes have high diagnostic value.
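How a per-gene AUC such as those reported above is obtained can be shown with a short sketch. It assumes each sample has a single expression value for the gene and that higher expression indicates PAH (a down-regulated gene would need its scores negated); the expression values are hypothetical toy numbers, not the study's data. The trapezoidal area under the ROC curve equals the Mann-Whitney probability that a randomly chosen PAH sample scores higher than a randomly chosen control, with ties counted as one half, so both are computed. A hypothetical `shared_genes` helper illustrates taking the overlap of the LASSO and SVM gene lists.

```python
"""ROC/AUC sketch for one candidate gene, plus the LASSO/SVM overlap step.

Assumptions: higher expression means PAH (label 1); controls are label 0.
All example values, and gene names other than CALD1/SLC7A11, are hypothetical.
"""
from typing import List, Sequence, Tuple


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the decision threshold step by step."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit all samples tied at this threshold together, giving a diagonal step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random case > score of a random control), ties counted as 1/2."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > c else 0.5 if p == c else 0.0 for p in cases for c in controls)
    return wins / (len(cases) * len(controls))


def shared_genes(lasso_genes: Sequence[str], svm_genes: Sequence[str]) -> List[str]:
    """Genes selected by both methods, in the order of the LASSO list."""
    svm_set = set(svm_genes)
    return [g for g in lasso_genes if g in svm_set]


if __name__ == "__main__":
    # Hypothetical expression values for one gene: 3 PAH cases (1), 3 controls (0).
    expr = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
    status = [1, 1, 0, 1, 0, 0]
    pts = roc_curve(expr, status)
    print(round(auc_trapezoid(pts), 4), round(auc_mann_whitney(expr, status), 4))  # prints: 0.8889 0.8889
```

The rank-based form is convenient as a check because it needs no curve at all; the explicit ROC points are still useful when a plot or an operating threshold is wanted.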
Conclusion: CALD1 and SLC7A11 can serve as diagnostic markers of PAH and provide new insights for further study of the immune mechanisms involved in PAH.
About the journal:
A unique international information source for the latest news and issues concerning the Scottish medical community. Contributions are drawn from Scotland and its medical institutions, as well as from an array of international authors. In addition to original papers, Scottish Medical Journal publishes commissioned educational review articles, case reports, historical articles, and sponsoring society abstracts. This journal is a member of the Committee on Publication Ethics (COPE).