Soonjoung Kim, Shintaro Yamada, Kaku Maekawa, Scott Keeney
{"title":"Optimized methods for mapping DNA double-strand-break ends and resection tracts and application to meiotic recombination in mouse spermatocytes","authors":"Soonjoung Kim, Shintaro Yamada, Kaku Maekawa, Scott Keeney","doi":"10.1101/2024.08.10.606181","DOIUrl":"https://doi.org/10.1101/2024.08.10.606181","url":null,"abstract":"DNA double-strand breaks (DSBs) made by SPO11 protein initiate homologous recombination during meiosis. Subsequent to DNA strand breakage, endo- and exo-nucleases process the DNA ends to resect the strands whose 5' termini are at the DSB, generating long 3'-terminal single-stranded tails that serve as substrates for strand exchange proteins. DSB resection is essential for meiotic recombination, but a detailed understanding of its molecular mechanism is currently lacking. Genomic approaches to mapping DSBs and resection endpoints, e.g., S1-sequencing (S1-seq) and similar methods, play a critical role in studies of meiotic DSB processing. In these methods, nuclease S1 or other enzymes that specifically degrade ssDNA are used to trim resected DSBs, allowing capture and sequencing of the ends of resection tracts. Here, we present optimization of S1-seq that improves its signal:noise ratio and allows its application to analysis of spermatocyte meiosis in adult mice. Furthermore, quantitative features of meiotic resection are evaluated for reproducibility, and we suggest approaches for analysis and interpretation of S1-seq data. We also compare S1-seq to variants that use exonuclease T and/or exonuclease VII from Escherichia coli instead of nuclease S1. 
Detailed step-by-step protocols and suggestions for troubleshooting are provided.","PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paule Valery Joseph, Malak Abbas, Gabriel Goodney, Ana Diallo, Amadou Gaye
{"title":"Co-localized SNPs Affecting the Expression of Taste Perception Genes are linked to Alzheimer's Disease","authors":"Paule Valery Joseph, Malak Abbas, Gabriel Goodney, Ana Diallo, Amadou Gaye","doi":"10.1101/2024.08.10.607452","DOIUrl":"https://doi.org/10.1101/2024.08.10.607452","url":null,"abstract":"Background\nWhile previous research has shown the potential links between taste perception pathways and brain-related conditions, the area involving Alzheimer's disease remains incompletely understood. Taste perception involves neurotransmitter signaling, including serotonin, glutamate, and dopamine. Disruptions in these pathways are implicated in neurodegenerative diseases. The integration of olfactory and taste signals in flavor perception may impact brain health, evident in olfactory dysfunction as an early symptom in neurodegenerative conditions. Shared immune response and inflammatory pathways may contribute to the association between altered taste perception and conditions like neurodegeneration, present in Alzheimer's disease.\nMethods\nThis study consists of an exploration of expression-quantitative trait loci (eQTL), utilizing whole-blood transcriptome profiles, of 28 taste perception genes, from a combined cohort of 475 African American subjects. This comprehensive dataset was subsequently intersected with single-nucleotide polymorphisms (SNPs) identified in Genome-Wide Association Studies (GWAS) of Alzheimer's Disease (AD). Finally, the investigation delved into assessing the association between eQTLs reported in GWAS of AD and the profiles of 741 proteins from the Olink Neurological Panel.\nResults\nThe eQTL analysis unveiled 3,547 statistically significant SNP-Gene associations, involving 412 distinct SNPs that spanned all 28 taste genes. 
In 17 GWAS studies encompassing various traits, a total of 14 SNPs associated with 12 genes were identified, with three SNPs consistently linked to Alzheimer's disease across four GWAS studies. All three SNPs demonstrated significant associations with the down-regulation of TAS2R41, and two of them were additionally associated with the down-regulation of TAS2R60. In the subsequent pQTL analysis, two of the SNPs linked to TAS2R41 and TAS2R60 genes (rs117771145 and rs10228407) were correlated with the upregulation of two proteins, namely EPHB6 and ADGRB3.\nConclusions\nOur investigation introduces a new perspective to the understanding of Alzheimer's disease, emphasizing the significance of bitter taste receptor genes in its pathogenesis. These discoveries set the stage for subsequent research to delve into these receptors as promising avenues for both intervention and diagnosis. Nevertheless, the translation of these genetic insights into clinical practice requires a more profound understanding of the implicated pathways and their pertinence to the disease's progression across diverse populations.","PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gerry Shipman, Reinnier Padilla, Cynthia Horth, Eric Bareke, Jacek Majewski
{"title":"H3K36 Methylation - a Guardian of Epigenome Integrity","authors":"Gerry Shipman, Reinnier Padilla, Cynthia Horth, Eric Bareke, Jacek Majewski","doi":"10.1101/2024.08.10.607446","DOIUrl":"https://doi.org/10.1101/2024.08.10.607446","url":null,"abstract":"H3K36 methylation is emerging as a key epigenetic modification with strong implications in genetic disease and cancer. However, the mechanisms through which H3K36me impacts the epigenome and asserts its functional consequences are far from understood. Here, we use mouse mesenchymal stem cell lines with successive knockouts of the H3K36 methyltransferases: NSD1, NSD2, SETD2, NSD3, and ASH1L, which result in progressive depletion of H3K36me and its complete absence in quintuple knockout cells, to finely dissect the role of H3K36me2 in shaping the epigenome and transcriptome. We show that H3K36me2, which targets active enhancers, is important for maintaining enhancer activity, and its depletion results in downregulation of enhancer-dependent genes. We demonstrate the roles of H3K36me2/3 in preventing the invasion of gene bodies by the repressive H3K27me modifications. Finally, we observe a previously undescribed relationship between H3K36me and H3K9me3: Following the depletion of H3K36me2, H3K9me3 is redistributed away from large heterochromatic domains and towards euchromatin. This results in a drastic decompartmentalization of the genome, weakening the boundaries between active and inactive compartments, and a catastrophic loss of long-range inter-compartment interactions. 
By studying cells totally devoid of H3K36 methyltransferase activity, we uncover a broad range of crucial functions of H3K36me in maintaining epigenome integrity.","PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicholas J. Eagles, Svitlana V. Bach, Madhavi Tippani, Prashanti Ravichandran, Yufeng Du, Ryan A. Miller, Thomas Hyde, Stephanie C. Page, Keri Martinowich, Leonardo Collado-Torres
{"title":"Integrating gene expression and imaging data across Visium capture areas with visiumStitched","authors":"Nicholas J. Eagles, Svitlana V. Bach, Madhavi Tippani, Prashanti Ravichandran, Yufeng Du, Ryan A. Miller, Thomas Hyde, Stephanie C. Page, Keri Martinowich, Leonardo Collado-Torres","doi":"10.1101/2024.08.08.607222","DOIUrl":"https://doi.org/10.1101/2024.08.08.607222","url":null,"abstract":"<strong>Background</strong> Visium is a widely-used spatially-resolved transcriptomics assay available from 10x Genomics. Standard Visium capture areas (6.5mm by 6.5mm) limit the survey of larger tissue structures, but combining overlapping images and associated gene expression data allows for more complex study designs. Current software can handle nested or partial image overlaps, but is designed for merging up to two capture areas, and cannot account for some technical scenarios related to capture area alignment. <strong>Results</strong> We generated Visium data from a postmortem human tissue sample such that two capture areas were partially overlapping and a third one was adjacent. We developed the R/Bioconductor package <em>visiumStitched</em>, which facilitates stitching the images together with <em>Fiji</em> (<em>ImageJ</em>), and constructing <em>SpatialExperiment</em> R objects with the stitched images and gene expression data. <em>visiumStitched</em> constructs an artificial hexagonal array grid which allows seamless downstream analyses such as spatially-aware clustering without discarding data from overlapping spots. Data stitched with <em>visiumStitched</em> can then be interactively visualized with <em>spatialLIBD</em>. <strong>Conclusions</strong> <em>visiumStitched</em> provides a simple, but flexible framework to handle various multi-capture area study design scenarios. Specifically, it resolves a data processing step without disrupting analysis workflows and without discarding data from overlapping spots. 
<em>visiumStitched</em> relies on affine transformations by <em>Fiji</em>, which have limitations and are less accurate when aligning against an atlas or other situations. <em>visiumStitched</em> provides an easy-to-use solution which expands possibilities for designing multi-capture area study designs.","PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"195 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David Chan-Rodriguez, Bian Wakimwayi Koboyi, Sirine Werghi, Bradley J Till, Julia Maksymiuk, Fatemeh Shoormij, Abuya Hilderlith, Anna Hawliczek, Maksymilian Krolik, Hanna Bolibok-Bragoszewska
{"title":"Phosphate transporter (Pht) gene families in rye (Secale cereale L.) – genome-wide identification and sequence diversity assessment","authors":"David Chan-Rodriguez, Bian Wakimwayi Koboyi, Sirine Werghi, Bradley J Till, Julia Maksymiuk, Fatemeh Shoormij, Abuya Hilderlith, Anna Hawliczek, Maksymilian Krolik, Hanna Bolibok-Bragoszewska","doi":"10.1101/2024.08.09.607312","DOIUrl":"https://doi.org/10.1101/2024.08.09.607312","url":null,"abstract":"Background: Phosphorus is a macronutrient indispensable for plant growth and development. Plants utilize specialized transporters (PHT) to take up inorganic phosphorus and distribute it throughout the plant. The PHT transporters are divided into five families: PHT1 to PHT5. Each PHT family has a particular physiological and cellular function. Rye (Secale cereale L.) is a member of Triticeae, and an important source of variation for wheat breeding. It is considered to have the highest tolerance of nutrient deficiency among Triticeae. To date, there is no report about genes involved in response to phosphorus deficiency in rye. The aim of this study was to: (i) identify and characterize putative members of different phosphate transporter families in rye, (ii) assess their sequence diversity in a collection of diverse rye accessions via low-coverage resequencing (DArTreseq), and (iii) evaluate the expression of putative rye Pht genes under phosphate-deficient conditions.\nResults: We identified 29 and 35 putative Pht transporter genes in the rye Lo7 and Weining reference genomes, respectively, representing all known Pht families. Phylogenetic analysis revealed a close relationship of rye PHT with previously characterized PHT proteins from other species. Quantitative RT PCR carried out on leaf and root samples of Lo7 plants grown in Pi-deficient and control conditions demonstrated that ScPht1;6, ScPht2 and ScPht3;1 are Pi-deficiency responsive. 
Based on DArTreseq genotyping of 94 diverse rye accessions we identified 820 polymorphic sites within rye ScPht, including 12 variants with a putatively deleterious effect. SNP density varied markedly between ScPht genes.\nConclusions: This report is the first step toward elucidating the mechanisms of rye response to Pi deficiency. Our findings point to multiple layers of adaptation to local environments, ranging from gene copy number variation to differences in level of polymorphism across Pht family members. DArTreseq genotyping permits quick and cost-effective assessment of polymorphism levels across genes/gene families and supports identification and prioritization of candidates for further studies. Collectively our findings provide the foundation for selecting the most promising candidates for further functional characterization.","PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"2010 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ren-Gang Zhang, Hui Liu, Heng Shu, De-Tuan Liu, Hong-Yun Shang, Kai-Hua Jia, Xiao-Quan Wang, Wei-Bang Sun, Wei Zhao, Yong-Peng Ma
{"title":"Nearly complete genome assembly of a critically endangered pine illuminates evolution and conservation of conifers","authors":"Ren-Gang Zhang, Hui Liu, Heng Shu, De-Tuan Liu, Hong-Yun Shang, Kai-Hua Jia, Xiao-Quan Wang, Wei-Bang Sun, Wei Zhao, Yong-Peng Ma","doi":"10.1101/2024.08.07.607108","DOIUrl":"https://doi.org/10.1101/2024.08.07.607108","url":null,"abstract":"Conifers are dominant in most temperate and boreal forest ecosystems, and are the most widely distributed of the gymnosperms. Despite this, many conifer species are threatened with extinction, and in particular the genetic mechanisms underlying their endangerment remain largely unknown. Pinus squamata, which has an extremely large diploid genome and is of conservation significance, is among the 100 most endangered species (plants and animals) globally, and has been designated as 'Critically Endangered' on the IUCN Red List. In this study, we report an almost complete genome sequence for P. squamata generated by a suite of sequencing technologies, with an assembly of 29.2 Gb, a scaffold N50 length of 2.5 Gb, and a remarkable contig N50 length of 915.4 Mb. This represents the largest and highest-quality gymnosperm genome sequenced to date. The genome is characterized by an ultra-low rate of heterozygosity, is dominated by transposable elements, and contains 55,413 protein-coding genes. Our study provides the first detailed examination of chromosome organization in P. squamata, revealing Rabl configurations and distinctive centromere signatures. 
This genomic milestone not only deepens our understanding of gymnosperm genetics and evolution but also lays a solid foundation for the development of effective conservation measures, ensuring the survival of this rare species in the face of environmental challenges.","PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guy Leonard, Benjamin H Jenkins, Fiona R Savory, Estelle S Kilias, Finlay Maguire, Varun Varma, David S Milner, Thomas A Richards
{"title":"De novo genome sequence assembly of the RNAi-tractable endosymbiosis model system Paramecium bursaria 186b reveals factors shaping intron repertoire","authors":"Guy Leonard, Benjamin H Jenkins, Fiona R Savory, Estelle S Kilias, Finlay Maguire, Varun Varma, David S Milner, Thomas A Richards","doi":"10.1101/2024.08.09.607295","DOIUrl":"https://doi.org/10.1101/2024.08.09.607295","url":null,"abstract":"How two species engage in stable endosymbiosis is a biological quandary. The study of facultative endosymbiotic interactions has emerged as a useful approach to understand how endosymbiotic functions can arise. The ciliate protist Paramecium bursaria hosts green algae of the order Chlorellales in a facultative photo-endosymbiosis. We have recently reported RNAi as a tool for understanding gene function in Paramecium bursaria 186b, CCAP strain 1660/18 [1]. To complement this work, here we report a highly complete host genome and transcriptome sequence dataset, using both Illumina and PacBio sequencing methods to aid genome analysis and to enable the design of RNAi experiments. Our analyses demonstrate that Paramecium bursaria, like other ciliates such as diverse species of Paramecia, possesses numerous tiny introns. These data, combined with the alternative genetic code common to ciliates, make gene identification and annotation challenging. To explore intron evolutionary dynamics further we show that alternative splicing leading to intron retention occurs at a higher frequency among the smaller number of longer introns, identifying a source of selection against longer introns. 
These data will aid the investigation of genome evolution in the Paramecia and provide additional source data for the exploration of endosymbiotic functions.","PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Louis Kraft, Johannes Soeding, Martin Steinegger, Annika Jochheim, Antonio Fernandez-Guerra, Gabriel Renaud
{"title":"CarpeDeam: A De Novo Metagenome Assembler for Heavily Damaged Ancient Datasets","authors":"Louis Kraft, Johannes Soeding, Martin Steinegger, Annika Jochheim, Antonio Fernandez-Guerra, Gabriel Renaud","doi":"10.1101/2024.08.09.607291","DOIUrl":"https://doi.org/10.1101/2024.08.09.607291","url":null,"abstract":"De novo assembly of ancient metagenomic datasets is a challenging task. Ultra-short fragment size and characteristic postmortem damage patterns of sequenced ancient DNA molecules leave current tools ill-equipped for ideal assembly. We present CarpeDeam, a novel damage-aware de novo assembler designed specifically for ancient metagenomic samples. Utilizing maximum-likelihood frameworks that integrate sample-specific damage patterns, CarpeDeam recovers longer continuous sequences and more protein sequences from both simulated and empirical datasets compared to existing assemblers. As a pioneering ancient metagenome assembler, CarpeDeam opens the door for new opportunities in functional and taxonomic analyses of ancient microbial communities.","PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
William DeGroat, Habiba Abdelhalim, Elizabeth Peker, Neev Sheth, Rishabh Narayanan, Saman Zeeshan, Bruce T. Liang, Zeeshan Ahmed
{"title":"Multimodal AI/ML for discovering novel biomarkers and predicting disease using multi-omics profiles of patients with cardiovascular diseases","authors":"William DeGroat, Habiba Abdelhalim, Elizabeth Peker, Neev Sheth, Rishabh Narayanan, Saman Zeeshan, Bruce T. Liang, Zeeshan Ahmed","doi":"10.1101/2024.08.07.607041","DOIUrl":"https://doi.org/10.1101/2024.08.07.607041","url":null,"abstract":"Cardiovascular diseases (CVDs) are multifactorial diseases, requiring personalized assessment and treatment. The advancements in multi-omics technologies, namely RNA-seq and whole genome sequencing, have offered translational researchers a comprehensive view of the human genome; utilizing this data, we can reveal novel biomarkers and segment patient populations based on personalized risk factors. Limitations in these technologies in failing to capture disease complexity can be accounted for by using an integrated approach, characterizing variants alongside expression related to emerging phenotypes. Our data analytics methodology is based on a nexus of orthodox bioinformatics, classical statistics, and multimodal artificial intelligence and machine learning techniques. Our approach has the potential to reveal the intricate mechanisms of CVD that can facilitate patient-specific disease risk and response profiling. We sourced transcriptomic expression and variants from CVD and control subjects. By integrating these multi-omics datasets with clinical demographics, we generated patient-specific profiles. Utilizing a robust feature selection approach, we reported a signature of 27 transcripts and variants efficient at predicting CVD. Here, differential expression analysis and minimum redundancy maximum relevance feature selection elucidated biomarkers explanatory of the disease phenotype. We used Combined Annotation Dependent Depletion and allele frequencies to identify variants with pathogenic characteristics in CVD patients. 
Classification models trained on this signature demonstrated high-accuracy predictions for CVDs. Overall, we observed an XGBoost model hyperparameterized using Bayesian optimization perform the best (AUC 1.0). Using SHapley Additive exPlanations, we compiled risk assessments for patients capable of further contextualizing these predictions in a clinical setting. We discovered a 27-component signature explanatory of phenotypic differences in CVD patients and healthy controls using a feature selection approach prioritizing both biological relevance and efficiency in machine learning. Literature review revealed previous CVD associations in a majority of these diagnostic biomarkers. Classification models trained on this signature were able to predict CVD in patients with high accuracy. Here, we propose a framework generalizable to other diseases and disorders.","PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shi Jie Samuel Tan, Huyen Trang Dang, Sarah Keim, Maja Bućan, Sara Mathieson
{"title":"Identity-by-descent (IBD) segment outlier detection in endogamous populations using pedigree cohorts","authors":"Shi Jie Samuel Tan, Huyen Trang Dang, Sarah Keim, Maja Bućan, Sara Mathieson","doi":"10.1101/2024.08.07.607051","DOIUrl":"https://doi.org/10.1101/2024.08.07.607051","url":null,"abstract":"Genomic segments that are inherited from a common ancestor are referred to as identical-by-descent (IBD). Because these segments are inherited, they not only allow us to study diseases, population characteristics, and the sharing of rare variants, but also understand hidden familial relationships within populations. Over the past two decades, various IBD finding algorithms have been developed using hidden Markov models (HMMs), hashing and extension, and Burrows-Wheeler Transform (BWT) approaches. In this study, we investigate the utility of pedigree information in IBD outlier detection methods for endogamous populations. With the increasing prevalence of computationally efficient sequencing technology and proper documentation of pedigree structures, we expect complete pedigree information to become readily available for more populations. While IBD segments have been used to <em>reconstruct</em> pedigrees, because we now have access to the pedigree, it is a natural question to ask if including pedigree information would substantially improve IBD segment finding for the purpose of studying inheritance. We propose an IBD pruning algorithm for reducing the number of false positives in IBD segments detected by existing software. While existing software already identifies IBD segments with high success rates, our algorithm analyzes the familial relationships between cohorts of individuals who are initially hypothesized to share IBD segments to remove outliers. Our algorithm is inspired by a k-Nearest Neighbors (kNN) approach with a novel distance metric for pedigrees with loops. 
We apply our method to simulated genomic data under an Amish pedigree, but it could be applied to pedigrees from other human populations as well as domesticated animals such as dogs and cattle.","PeriodicalId":501161,"journal":{"name":"bioRxiv - Genomics","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}