Long COVID, a heterogeneous condition characterized by a range of physical and neuropsychiatric manifestations, develops in a proportion of individuals who have had COVID-19.
Transcriptomic datasets containing gene expression profiles of patients with COVID-19, patients with long COVID, and healthy controls were retrieved from the GEO database. Differentially expressed genes (DEGs) for COVID-19 and long COVID were identified with R packages, and module detection was performed in parallel with the Modular Pharmacology Platform (http://112.86.129.72:48081/). The DEGs and differentially expressed module genes (DEMGs) for long COVID and COVID-19 were integrated and intersected, followed by Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Set Enrichment Analysis (GSEA).
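As a rough illustration of the intersection step, the following minimal Python sketch assumes that the DEG and DEMG lists are available as plain collections of gene symbols. The helper name `intersect_gene_sets` and the gene identifiers are hypothetical placeholders, not the tools or results of this study.

```python
def intersect_gene_sets(degs, demgs):
    """Return the sorted gene symbols present in both collections."""
    return sorted(set(degs) & set(demgs))


# Hypothetical placeholder symbols, not results from this study.
degs = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
demgs = ["GENE_C", "GENE_D", "GENE_E"]

shared = intersect_gene_sets(degs, demgs)
print(shared)
```

The shared genes would then be the input to the enrichment analyses.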
We identified 11 and 62 differentially expressed modules, 1837 and 179 DEGs, and 103 and 508 DEMGs for COVID-19 and long COVID, respectively, which were notably enriched in immune-related signaling pathways. Infiltrating immune cells in long COVID and COVID-19 were assessed separately and compared using the CIBERSORT, ssGSEA, and xCell algorithms. Hub genes were then screened with the SVM-RFE, RF, and XGBoost algorithms together with logistic regression analysis. Of the 67 candidate genes processed with the machine learning algorithms and logistic regression, a subgroup consisting of CEP55, CDCA2, MELK, and DEPDC1B was ultimately identified as potential biomarkers for predicting the risk of progression to long COVID after COVID-19 infection. The predictive performance of these potential biomarkers was quantified by an area under the ROC curve of 0.8762542, indicating that the combination of the potential biomarkers provided the highest performance.
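For readers unfamiliar with how a ROC-based performance value is obtained, the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive sample receives a higher predicted score than a randomly chosen negative one (the Mann-Whitney formulation). The function name `roc_auc`, the labels, and the scores are illustrative assumptions; they are not the data or the implementation used in this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: sequence of 0/1 values (1 = positive class)
    scores: sequence of predicted risk scores (higher = more likely positive)
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.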
In summary, we identified a subgroup of potential biomarkers for predicting the risk of progression to long COVID after COVID-19 infection, which may partly elucidate the molecular mechanisms associated with long COVID.