Identification of the shortest species-specific oligonucleotide sequences
Ioannis Mouratidis, Maxwell Konnaris, Nikol Chantzi, Candace S.Y. Chan, Michail Patsakis, Kimonas Provatas, Austin Montgomery, Fotis A. Baltoumas, Congzhou M. Sha, Manvita Mareboina, Georgios A. Pavlopoulos, Dionysios V. Chartoumpekis, Ilias Georgakopoulos-Soares
Genome Research. Published 2025-01-02. DOI: 10.1101/gr.280070.124 (https://doi.org/10.1101/gr.280070.124)
Abstract
Despite the exponential increase in sequencing information driven by massively parallel DNA sequencing technologies, universal and succinct genomic fingerprints for each organism are still missing. Identifying the shortest species-specific nucleic sequences offers insights into species evolution and holds potential practical applications in agriculture, wildlife conservation, and healthcare. We propose a new method for sequence analysis termed nucleic "quasi-primes": the shortest sequences occurring in each of 45,785 organismal reference genomes that are present in one genome and absent from every other examined genome. In the human genome, we find that the genomic loci of nucleic quasi-primes are most enriched for genes associated with brain development and cognitive function. In a single-cell case study focusing on the human primary motor cortex, nucleic quasi-prime genes account for a significantly larger proportion of the variation based on average gene expression. Non-neuronal cell types, including astrocytes, endothelial cells, microglia/perivascular macrophages, oligodendrocytes, and vascular and leptomeningeal cells, exhibited significant activation of quasi-prime-containing genes associated with cancer, while the quasi-prime-containing genes that were simultaneously suppressed were associated with cognitive, mental, and developmental disorders. We also show that human disease-causing variants, eQTLs, mQTLs and sQTLs are 4.43-fold, 4.34-fold, 4.29-fold and 4.21-fold enriched at human quasi-prime loci, respectively. These findings indicate that nucleic quasi-primes are genomic loci linked to the evolution of species-specific traits, and that in humans they provide insights into the development of cognitive traits and human diseases, including neurodevelopmental disorders.
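To make the quasi-prime definition in the abstract concrete, the following is a minimal sketch on toy sequences, not the authors' pipeline. It assumes a brute-force set-based search, treats each genome as a single plain string, and ignores reverse complements; the function names `kmers` and `quasi_primes` and the `max_k` cutoff are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def quasi_primes(genomes, max_k=16):
    """Map each genome name to (k, kmers), where k is the smallest length
    at which the genome has k-mers absent from every other genome.

    Assumption: brute-force search over toy inputs; a scan of tens of
    thousands of real genomes would need a far more scalable index.
    """
    result = {}
    for k in range(1, max_k + 1):
        # Record which genomes contain each k-mer of the current length.
        owners = defaultdict(set)
        for name, seq in genomes.items():
            for kmer in kmers(seq, k):
                owners[kmer].add(name)
        for kmer, names in owners.items():
            if len(names) != 1:
                continue  # shared by several genomes, so not species-specific
            (name,) = names
            if name not in result:
                result[name] = (k, set())
            # Keep only k-mers at the shortest length found for this genome.
            if result[name][0] == k:
                result[name][1].add(kmer)
        if len(result) == len(genomes):
            break  # every genome already has its shortest unique k-mers
    return result


toy_genomes = {"A": "ACGTAC", "B": "ACGGAC", "C": "TTGACG"}
for name, (k, unique) in sorted(quasi_primes(toy_genomes).items()):
    print(name, k, sorted(unique))
```

In this toy input no single nucleotide is unique to any sequence, so the search moves on to length 2, where each sequence has at least one dinucleotide that the others lack. A genome with no unique k-mer up to `max_k` is simply missing from the returned mapping.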
About the journal:
Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine.
Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies.
New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.