Sohyun Youn, Dabin Jeong, Hwijun Kwon, Eonyong Han, Sun Kim, Inuk Jung
BioData Mining 18(1):61. Published 2025-09-01. DOI: 10.1186/s13040-025-00476-3 (https://doi.org/10.1186/s13040-025-00476-3)
Identification of severity related mutation hotspots in SARS-CoV-2 using a density-based clustering approach.
Background: The immune response to SARS-CoV-2 varies greatly among individuals, producing a wide range of severity levels among patients. While various methods exist to identify severity-associated biomarkers in COVID-19 patients, we investigated highly mutated regions, or mutation hotspots, within the SARS-CoV-2 genome that correlate with patient severity levels. Mutation hotspots were searched for in the GISAID database using Mutclust, a density-based clustering algorithm that looks for loci with high mutation density and diversity.
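To make the idea of density-based hotspot detection concrete, here is a minimal sketch of a one-dimensional DBSCAN-style pass over mutated genome positions. It is not the Mutclust algorithm: it models mutation density only (not diversity), and the function name `find_hotspots`, the parameters `eps` and `min_pts`, and the toy positions are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def find_hotspots(positions: List[int], eps: int, min_pts: int) -> List[Tuple[int, int]]:
    """Return (start, end) intervals of dense mutation regions.

    positions: one entry per observed mutation event (hypothetical input;
               a position repeats if it is mutated in several samples).
    eps:       neighbourhood radius in bases (assumed parameter).
    min_pts:   minimum number of events within eps for a core position.
    """
    pts = sorted(positions)

    # A position is a core point if at least min_pts events (itself included)
    # fall within [p - eps, p + eps].
    cores = [
        p for p in sorted(set(pts))
        if bisect_right(pts, p + eps) - bisect_left(pts, p - eps) >= min_pts
    ]
    if not cores:
        return []

    # Chain consecutive core points whose gap is at most eps into runs.
    runs: List[Tuple[int, int]] = []
    start = prev = cores[0]
    for c in cores[1:]:
        if c - prev > eps:
            runs.append((start, prev))
            start = c
        prev = c
    runs.append((start, prev))

    # Extend each run by border points within eps of its outermost cores.
    # A border point shared by two runs is kept by the earlier one.
    hotspots: List[Tuple[int, int]] = []
    last_end = None
    for lo, hi in runs:
        left = pts[bisect_left(pts, lo - eps)]
        right = pts[bisect_right(pts, hi + eps) - 1]
        if last_end is not None and left <= last_end:
            left = pts[bisect_right(pts, last_end)]
        hotspots.append((left, right))
        last_end = right
    return hotspots


# Toy data: two dense regions plus isolated events at 250 and 900.
toy_positions = [100, 101, 101, 102, 103, 250, 400, 402, 402, 404, 405, 900]
print(find_hotspots(toy_positions, eps=3, min_pts=4))
```

In this toy run the isolated events are treated as noise and only the two dense stretches are reported. A real analysis would also need to account for sequencing depth and, as the abstract notes, mutation diversity.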
Results: Using Mutclust, 477 mutation hotspots were identified in the SARS-CoV-2 genome, of which 28 showed a significant association with severity levels in a multi-omics COVID-19 cohort of 387 infected patients. The patients were further stratified into moderate and severe groups based on the 28 severity-related mutation hotspots, and the two groups showed distinct cytokine and gene expression levels in both cytokine profile and single-cell RNA-seq samples. The effect of the SARS-CoV-2 mutation hotspots on human genes was further investigated by network propagation analysis, in which two mutation hotspots specific to the severe group were associated with NK cell activity. One of them was shown to decrease the affinity between the viral epitope of the hotspot region and its binding HLA, compared with the non-mutated epitope.
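The abstract does not say which form of network propagation was used. As an illustration only, the sketch below implements one common variant, random walk with restart, on a small undirected graph. The function name `propagate`, the restart probability, and the four-node toy network are assumptions and do not come from the paper.

```python
from typing import Dict, List


def propagate(
    adj: Dict[str, List[str]],
    seeds: Dict[str, float],
    restart: float = 0.5,
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Random walk with restart over an undirected graph.

    adj:     adjacency lists; assumed symmetric, with every node present
             as a key and no isolated nodes.
    seeds:   initial weights on seed nodes (normalised to sum to 1).
    restart: probability of jumping back to the seed distribution.
    """
    nodes = list(adj)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # Restart term, then spread each node's score evenly to its neighbours.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path graph A - B - C - D with a single seed at A.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = propagate(toy_graph, seeds={"A": 1.0})
for node in toy_graph:
    print(node, round(scores[node], 4))
```

Scores decay with distance from the seed, which is the property that lets a propagation step rank genes by their network proximity to a starting set. In practice the graph would be a curated interaction network and the seeds would be derived from the data.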
Conclusion: Genes related to the immunological function of NK cells, especially the NK cell receptor and co-activating receptor genes, were significantly dysregulated in the severe patient group at both the cytokine and single-cell levels. Collectively, we identified mutation hotspots associated with severity and the related regulation of NK cell-associated gene expression.
About the journal:
BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data.
Topical areas include, but are not limited to:
-Development, evaluation, and application of novel data mining and machine learning algorithms.
-Adaptation, evaluation, and application of traditional data mining and machine learning algorithms.
-Open-source software for the application of data mining and machine learning algorithms.
-Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies.
-Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.