Matthew M Hong, David Froelicher, Ricky Magner, Victoria Popic, Bonnie Berger, Hyunghoon Cho
{"title":"Secure Discovery of Genetic Relatives across Large-Scale and Distributed Genomic Datasets.","authors":"Matthew M Hong, David Froelicher, Ricky Magner, Victoria Popic, Bonnie Berger, Hyunghoon Cho","doi":"10.1007/978-1-0716-3989-4_19","DOIUrl":"10.1007/978-1-0716-3989-4_19","url":null,"abstract":"Finding relatives within a study cohort is a necessary step in many genomic studies. However, when the cohort is distributed across multiple entities subject to data-sharing restrictions, performing this step often becomes infeasible. Developing a privacy-preserving solution for this task is challenging due to the significant burden of estimating kinship between all pairs of individuals across datasets. We introduce SF-Relate, a practical and secure federated algorithm for identifying genetic relatives across data silos. SF-Relate vastly reduces the number of individual pairs to compare while maintaining accurate detection through a novel locality-sensitive hashing approach. We assign individuals who are likely to be related together into buckets and then test relationships only between individuals in matching buckets across parties. To this end, we construct an effective hash function that captures identity-by-descent (IBD) segments in genetic sequences, which, along with a new bucketing strategy, enable accurate and practical private relative detection. To guarantee privacy, we introduce an efficient algorithm based on multiparty homomorphic encryption (MHE) to allow data holders to cooperatively compute the relatedness coefficients between individuals, and to further classify their degrees of relatedness, all without sharing any private data. We demonstrate the accuracy and practical runtimes of SF-Relate on the UK Biobank and All of Us datasets. On a dataset of 200K individuals split between two parties, SF-Relate detects 94.9% of third-degree relatives, and 99.9% of second-degree or closer relatives, within 15 hours of runtime. 
Our work enables secure identification of relatives across large-scale genomic datasets.","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"14758 ","pages":"308-313"},"PeriodicalIF":0.0,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11257153/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Analysis of Alternative Splicing Events in Foliar Transcriptomes of Potato Plants Inoculated with Phytophthora Infestans","authors":"J. A. Lee, X. Min","doi":"10.5376/cmb.2023.13.0001","DOIUrl":"https://doi.org/10.5376/cmb.2023.13.0001","url":null,"abstract":"Alternative splicing (AS) is a common process during gene expression of plants in coping with various biotic or abiotic stresses. This work reports the identification and analysis of AS events in foliar samples of two potato lines, including a wild type line and a pathogen resistant transgenic line (+RB), inoculated with Phytophthora infestans. After combining all RNA-seq data collected from 36 samples, a total of 10,246 AS events were identified, including 1,563 exon skipping, 1,368 alternative donor sites, 3,091 alternative acceptor sites, 884 intron retention, and 3,340 complex events, which consisted of more than one basic event. These AS events were generated from 45,874 isoform transcripts expressed from 13,704 genes. An estimated 30.2% of genes underwent AS in this analysis. Furthermore, we identified 406 specific AS events, which were generated from 281 genes, and 766 differentially expressed transcripts (DETs) in the sample collected 24 hours after inoculation with P. infestans in +RB lines. These DETs were expressed from 763 genes, and among them, 338 genes were alternatively spliced. These results indicate that both AS and differential gene expression may contribute to the resistance against P. infestans in the +RB line of potato plants.","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. 
RECOMB (Conference : 2005- )","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81074843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational Molecular Biology Interdisciplinary Technological Integration and New Advances","authors":"Jessi White, Garen Lee","doi":"10.5376/cmb.2023.13.0003","DOIUrl":"https://doi.org/10.5376/cmb.2023.13.0003","url":null,"abstract":"This review presents the development, fundamental techniques, applications, and future directions of Computational Molecular Biology. Computational Molecular Biology is an interdisciplinary field that integrates knowledge from computer science, statistics, and biology to study molecular biology problems. The review emphasizes the fundamental techniques in Computational Molecular Biology and discusses its applications in biomedical research. Through the use of Computational Molecular Biology approaches, researchers can gain better insights into the molecular structures and functions within organisms, leading to the design of more effective drugs and treatment strategies, as well as the discovery of new therapeutic targets and pathways. Lastly, the review explores the future directions of Computational Molecular Biology. As these techniques continue to evolve, Computational Molecular Biology will further expand its application scope, bringing about more innovations and breakthroughs in biomedical research.","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135701646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Agniva Chowdhury, Aritra Bose, Samson Zhou, David P Woodruff, Petros Drineas
{"title":"A Fast, Provably Accurate Approximation Algorithm for Sparse Principal Component Analysis Reveals Human Genetic Variation Across the World.","authors":"Agniva Chowdhury, Aritra Bose, Samson Zhou, David P Woodruff, Petros Drineas","doi":"10.1007/978-3-031-04749-7_6","DOIUrl":"10.1007/978-3-031-04749-7_6","url":null,"abstract":"Principal component analysis (PCA) is a widely used dimensionality reduction technique in machine learning and multivariate statistics. To improve the interpretability of PCA, various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis (SPCA). In this paper, we present ThreSPCA, a provably accurate algorithm based on thresholding the Singular Value Decomposition for the SPCA problem, without imposing any restrictive assumptions on the input covariance matrix. Our thresholding algorithm is conceptually simple; much faster than current state-of-the-art; and performs well in practice. When applied to genotype data from the 1000 Genomes Project, ThreSPCA is faster than previous benchmarks, at least as accurate, and leads to a set of interpretable biomarkers, revealing genetic diversity across the world.","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. 
RECOMB (Conference : 2005- )","volume":"13278 ","pages":"86-106"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9836035/pdf/nihms-1804098.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9099780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transcription Factor-Centric Approach to Identify Non-Recurring Putative Regulatory Drivers in Cancer.","authors":"Jingkang Zhao, Vincentius Martin, Raluca Gordân","doi":"10.1007/978-3-031-04749-7_3","DOIUrl":"https://doi.org/10.1007/978-3-031-04749-7_3","url":null,"abstract":"Recent efforts to sequence the genomes of thousands of matched normal-tumor samples have led to the identification of millions of somatic mutations, the majority of which are non-coding. Most of these mutations are believed to be passengers, but a small number of non-coding mutations could contribute to tumor initiation or progression, e.g. by leading to dysregulation of gene expression. Efforts to identify putative regulatory drivers rely primarily on information about the recurrence of mutations across tumor samples. However, in regulatory regions of the genome, individual mutations are rarely seen in more than one donor. Instead of using recurrence information, here we present a method to identify putative regulatory driver mutations based on the magnitude of their effects on transcription factor-DNA binding. For each gene, we integrate the effects of mutations across all its regulatory regions, and we ask whether these effects are larger than expected by chance, given the mutation spectra observed in regulatory DNA in the cohort of interest. We applied our approach to analyze mutations in a liver cancer data set with ample somatic mutation and gene expression data available. By combining the effects of mutations across all regulatory regions of each gene, we identified dozens of genes whose regulation in tumor cells is likely to be significantly perturbed by non-coding mutations. 
Overall, our results show that focusing on the functional effects of non-coding mutations, rather than their recurrence, has the potential to identify putative regulatory drivers and the genes they dysregulate in tumor cells.","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"13278 ","pages":"36-51"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9740185/pdf/nihms-1855700.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10729957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Lan, Si Xiuyang, Sun Lei, G. Peng, Li Yong Yong, Wang Xuezheng
{"title":"Identification and Expression Characteristic Analysis of CML Gene Family of Melon","authors":"L. Lan, Si Xiuyang, Sun Lei, G. Peng, Li Yong Yong, Wang Xuezheng","doi":"10.5376/cmb.2022.12.0005","DOIUrl":"https://doi.org/10.5376/cmb.2022.12.0005","url":null,"abstract":"","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80287190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research in Computational Molecular Biology: 26th Annual International Conference, RECOMB 2022, San Diego, CA, USA, May 22–25, 2022, Proceedings","authors":"","doi":"10.1007/978-3-031-04749-7","DOIUrl":"https://doi.org/10.1007/978-3-031-04749-7","url":null,"abstract":"","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82352633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}