{"title":"Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review.","authors":"Mara Thomas, Nuria Mackes, Asad Preuss-Dodhy, Thomas Wieland, Markus Bundschus","doi":"10.2196/54332","DOIUrl":"https://doi.org/10.2196/54332","url":null,"abstract":"<p><strong>Background: </strong>Genetic data are widely considered inherently identifiable. However, genetic data sets come in many shapes and sizes, and the feasibility of privacy attacks depends on their specific content. Assessing the reidentification risk of genetic data is complex, yet there is a lack of guidelines or recommendations that support data processors in performing such an evaluation.</p><p><strong>Objective: </strong>This study aims to gain a comprehensive understanding of the privacy vulnerabilities of genetic data and create a summary that can guide data processors in assessing the privacy risk of genetic data sets.</p><p><strong>Methods: </strong>We conducted a 2-step search, in which we first identified 21 reviews published between 2017 and 2023 on the topic of genomic privacy and then analyzed all references cited in the reviews (n=1645) to identify 42 unique original research studies that demonstrate a privacy attack on genetic data. 
We then evaluated the type and components of genetic data exploited for these attacks as well as the effort and resources needed for their implementation and their probability of success.</p><p><strong>Results: </strong>From our literature review, we derived 9 nonmutually exclusive features of genetic data that are both inherent to any genetic data set and informative about privacy risk: biological modality, experimental assay, data format or level of processing, germline versus somatic variation content, content of single nucleotide polymorphisms, short tandem repeats, aggregated sample measures, structural variants, and rare single nucleotide variants.</p><p><strong>Conclusions: </strong>On the basis of our literature review, the evaluation of these 9 features covers the great majority of privacy-critical aspects of genetic data and thus provides a foundation and guidance for assessing genetic data risk.</p>","PeriodicalId":73552,"journal":{"name":"JMIR bioinformatics and biotechnology","volume":"5 ","pages":"e54332"},"PeriodicalIF":0.0,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11165293/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141473269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Roles of NOTCH3 p.R544C and Thrombophilia Genes in Vietnamese Patients With Ischemic Stroke: Study Involving a Hierarchical Cluster Analysis","authors":"Huong Thi Thu Bui, Quỳnh Nguyễn Thị Phương, Ho Cam Tu, Sinh Nguyen Phuong, Thuy Thi Pham, Thu Vu, Huyen Nguyen Thi Thu, Lam Khanh Ho, Dung Nguyen Tien","doi":"10.2196/56884","DOIUrl":"https://doi.org/10.2196/56884","url":null,"abstract":"The etiology of ischemic stroke is multifactorial. Several gene mutations have been identified as leading causes of cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL), a hereditary disease that causes stroke and other neurological symptoms. We aimed to identify the variants of NOTCH3 and thrombophilia genes, and their complex interactions with other factors. We conducted a hierarchical cluster analysis (HCA) on the data of 100 patients diagnosed with ischemic stroke. The variants of NOTCH3 and thrombophilia genes were identified by polymerase chain reaction with confronting 2-pair primers and real-time polymerase chain reaction. The overall preclinical characteristics, cumulative cutpoint values, and factors associated with these somatic mutations were analyzed in unidimensional and multidimensional scaling models. We identified the following optimal cutpoints: creatinine, 83.67 (SD 9.19) µmol/L; age, 54 (SD 5) years; prothrombin (PT) time, 13.25 (SD 0.17) seconds; and international normalized ratio (INR), 1.02 (SD 0.03).
Using the Nagelkerke method, cutpoint 50% values of the Glasgow Coma Scale score; modified Rankin scale score; and National Institutes of Health Stroke Scale scores at admission, after 24 hours, and at discharge were 12.77, 2.86 (SD 1.21), 9.83 (SD 2.85), 7.29 (SD 2.04), and 6.85 (SD 2.90), respectively. The variants of MTHFR (C677T and A1298C) and NOTCH3 p.R544C may influence the stroke severity under specific conditions of PT, creatinine, INR, and BMI, with risk ratios of 4.8 (95% CI 1.53-15.04) and 3.13 (95% CI 1.60-6.11), respectively (P<.05, Fisher exact test). It is interesting that although there are many genes linked to increased atrial fibrillation risk, not all of them are associated with ischemic stroke risk. With the detection of stroke risk loci, more information can be gained on their impacts and interconnections, especially in young patients.","PeriodicalId":73552,"journal":{"name":"JMIR bioinformatics and biotechnology","volume":"95 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141002322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
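The record above reports risk ratios with 95% CIs and a Fisher exact P value. As a minimal sketch of how such figures are commonly derived from a 2x2 table (exposure = carrying a variant, outcome = for example a severe stroke), the code below assumes a Wald interval on the log scale and a two-sided Fisher test that sums every table no more probable than the observed one. The abstract does not state the authors' exact procedure, the function names are hypothetical, and the counts are invented for illustration, not taken from the study.

```python
import math


def risk_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Risk ratio with a Wald confidence interval on the log scale.

    2x2 layout (hypothetical labels):
        a = exposed with outcome,   b = exposed without outcome
        c = unexposed with outcome, d = unexposed without outcome
    Requires a > 0 and c > 0 so that log(RR) is defined.
    """
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of log(RR)
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for a 2x2 table, using only the stdlib."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with all margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = table_prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(table_prob, range(x_min, x_max + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts, not taken from the study
rr, lower, upper = risk_ratio_ci(10, 10, 10, 70)
p_value = fisher_exact_two_sided(10, 10, 10, 70)
print(f"RR {rr:.2f} (95% CI {lower:.2f}-{upper:.2f}), P = {p_value:.4f}")
```

A CI that excludes 1 and a small Fisher P value are two views of the same association; the Fisher test is usually preferred when cell counts are small.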
{"title":"ChatGPT and Medicine: Together We Embrace the AI Renaissance","authors":"Sean Hacking","doi":"10.2196/52700","DOIUrl":"https://doi.org/10.2196/52700","url":null,"abstract":"The generative artificial intelligence (AI) model ChatGPT holds transformative prospects in medicine. The development of such models has signaled the beginning of a new era where complex biological data can be made more accessible and interpretable. ChatGPT is a natural language processing tool that can process, interpret, and summarize vast data sets. It can serve as a digital assistant for physicians and researchers, aiding in integrating medical imaging data with other multiomics data and facilitating the understanding of complex biological systems. The physician’s and AI’s viewpoints emphasize the value of such AI models in medicine, providing tangible examples of how this could enhance patient care. The editorial also discusses the rise of generative AI, highlighting its substantial impact in democratizing AI applications for modern medicine. While AI may not supersede health care professionals, practitioners incorporating AI into their practices could potentially have a competitive edge.","PeriodicalId":73552,"journal":{"name":"JMIR bioinformatics and biotechnology","volume":"32 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141003645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User and Usability Testing of a Web-Based Genetics Education Tool for Parkinson Disease: Mixed Methods Study.","authors":"Noah Han, Rachel A Paul, Tanya Bardakjian, Daniel Kargilis, Angela R Bradbury, Alice Chen-Plotkin, Thomas F Tropea","doi":"10.2196/45370","DOIUrl":"10.2196/45370","url":null,"abstract":"<p><strong>Background: </strong>Genetic testing is essential to identify research participants for clinical trials enrolling people with Parkinson disease (PD) carrying a variant in the glucocerebrosidase (GBA) or leucine-rich repeat kinase 2 (LRRK2) genes. The limited availability of professionals trained in neurogenetics or genetic counseling is a major barrier to increased testing. Telehealth solutions to increase access to genetics education can help address issues around counselor availability and offer options to patients and family members.</p><p><strong>Objective: </strong>As an alternative to pretest genetic counseling, we developed a web-based genetics education tool focused on GBA and LRRK2 testing for PD called the Interactive Multimedia Approach to Genetic Counseling to Inform and Educate in Parkinson's Disease (IMAGINE-PD) and conducted user testing and usability testing. The objective was to conduct user and usability testing to obtain stakeholder feedback to improve IMAGINE-PD.</p><p><strong>Methods: </strong>Genetic counselors and PD and neurogenetics subject matter experts developed content for IMAGINE-PD specifically focused on GBA and LRRK2 genetic testing. Structured interviews were conducted with 11 movement disorder specialists and 13 patients with PD to evaluate the content of IMAGINE-PD in user testing and with 12 patients with PD to evaluate the usability of a high-fidelity prototype according to the US Department of Health and Human Services Research-Based Web Design & Usability Guidelines. 
Qualitative data analysis informed changes to create a final version of IMAGINE-PD.</p><p><strong>Results: </strong>Qualitative data were reviewed by 3 evaluators. Themes were identified from feedback data of movement disorder specialists and patients with PD in user testing in 3 areas: content such as the topics covered, function such as website navigation, and appearance such as pictures and colors. Similarly, qualitative analysis of usability testing feedback identified additional themes in these 3 areas. Key points of feedback were determined by consensus among reviewers considering the importance of the comment and the frequency of similar comments. Refinements were made to IMAGINE-PD based on consensus recommendations by evaluators within each theme at both user testing and usability testing phases to create a final version of IMAGINE-PD.</p><p><strong>Conclusions: </strong>User testing for content review and usability testing have informed refinements to IMAGINE-PD to develop this focused, genetics education tool for GBA and LRRK2 testing. Comparison of this stakeholder-informed intervention to standard telegenetic counseling approaches is ongoing.</p>","PeriodicalId":73552,"journal":{"name":"JMIR bioinformatics and biotechnology","volume":"1 1","pages":"e45370"},"PeriodicalIF":0.0,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11135229/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42812530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning for Prediction of Maternal Hemorrhage and Transfusion (Preprint)","authors":"H. Ahmadzia, Alexa C Dzienny, Mike Bopf, Jaclyn M Phillips, Jerome Jeffrey Federspiel, Richard Amdur, Madeline Murguia Rice, Laritza Rodriguez","doi":"10.2196/52059","DOIUrl":"https://doi.org/10.2196/52059","url":null,"abstract":"Objectives: To improve PPH prediction and to compare machine learning and traditional statistical methods. Design: Cross-sectional Setting: Deliveries across US hospitals Population: Deliveries across 12 US hospitals from the 2002-2008 Consortium for Safe Labor dataset Method: We developed models using the Consortium for Safe Labor dataset. Fifty antepartum and intrapartum characteristics and hospital characteristics were included. Logistic regression, support vector machines, multi-layer perceptron, random forest","PeriodicalId":73552,"journal":{"name":"JMIR bioinformatics and biotechnology","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139349546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secure Comparisons of Single Nucleotide Polymorphisms Using Secure Multiparty Computation: Method Development.","authors":"Andrew Woods, Skyler T Kramer, Dong Xu, Wei Jiang","doi":"10.2196/44700","DOIUrl":"10.2196/44700","url":null,"abstract":"<p><strong>Background: </strong>While genomic variations can provide valuable information for health care and ancestry, the privacy of individual genomic data must be protected. Thus, a secure environment is desirable for a human DNA database such that the total data are queryable but not directly accessible to involved parties (eg, data hosts and hospitals) and that the query results are learned only by the user or authorized party.</p><p><strong>Objective: </strong>In this study, we provide efficient and secure computations on panels of single nucleotide polymorphisms (SNPs) from genomic sequences as computed under the following set operations: union, intersection, set difference, and symmetric difference.</p><p><strong>Methods: </strong>Using these operations, we can compute similarity metrics, such as the Jaccard similarity, which could allow querying a DNA database to find the same person and genetic relatives securely. We analyzed various security paradigms and show metrics for the protocols under several security assumptions, such as semihonest, malicious with honest majority, and malicious with a malicious majority.</p><p><strong>Results: </strong>We show that our methods can be used practically on realistically sized data. 
Specifically, we can compute the Jaccard similarity of two genomes when considering sets of SNPs, each with 400,000 SNPs, in 2.16 seconds with the assumption of a malicious adversary in an honest majority and 0.36 seconds under a semihonest model.</p><p><strong>Conclusions: </strong>Our methods may help adopt trusted environments for hosting individual genomic data with end-to-end data security.</p>","PeriodicalId":73552,"journal":{"name":"JMIR bioinformatics and biotechnology","volume":" ","pages":"e44700"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11135223/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49648411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
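The set operations and Jaccard similarity named in the record above are simple to state in the clear. The sketch below shows only the plaintext functions that a secure multiparty protocol would evaluate; it performs no secret sharing or encryption. It assumes a hypothetical encoding of each SNP as a (chromosome, position, alternate allele) tuple, which is not necessarily the authors' representation, and the toy genomes are invented.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical SNP encoding: (chromosome, position, alternate allele)
Snp = Tuple[str, int, str]


def set_operations(a: FrozenSet[Snp], b: FrozenSet[Snp]) -> Dict[str, FrozenSet[Snp]]:
    """The four set operations named in the abstract, computed in plaintext."""
    return {
        "union": a | b,
        "intersection": a & b,
        "difference": a - b,
        "symmetric_difference": a ^ b,
    }


def jaccard(a: FrozenSet[Snp], b: FrozenSet[Snp]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical (a choice made here)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


genome_1 = frozenset({("1", 100, "A"), ("1", 200, "G"), ("2", 50, "T")})
genome_2 = frozenset({("1", 100, "A"), ("2", 50, "T"), ("3", 10, "C")})

ops = set_operations(genome_1, genome_2)
print(jaccard(genome_1, genome_2))  # 2 shared SNPs / 4 distinct SNPs = 0.5
```

A Jaccard value near 1 suggests the same individual, while intermediate values could point to relatives; where to draw those thresholds is not specified in the abstract.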
{"title":"Mutations of SARS-CoV-2 Structural Proteins in the Alpha, Beta, Gamma, and Delta Variants: Bioinformatics Analysis.","authors":"Saima Rehman Khetran, Roma Mustafa","doi":"10.2196/43906","DOIUrl":"10.2196/43906","url":null,"abstract":"<p><strong>Background: </strong>COVID-19 and Middle East Respiratory Syndrome are two pandemic respiratory diseases caused by coronavirus species. The novel disease COVID-19 caused by SARS-CoV-2 was first reported in Wuhan, Hubei Province, China, in December 2019, and became a pandemic within 2-3 months, affecting social and economic platforms worldwide. Despite the rapid development of vaccines, there have been obstacles to their distribution, including a lack of fundamental resources, poor immunization, and manual vaccine replication. Several variants of the original Wuhan strain have emerged in the last 3 years, which can pose a further challenge for control and vaccine development.</p><p><strong>Objective: </strong>The aim of this study was to comprehensively analyze mutations in SARS-CoV-2 variants of concern (VoCs) using a bioinformatics approach toward identifying novel mutations that may be helpful in developing new vaccines by targeting these sites.</p><p><strong>Methods: </strong>Reference sequences of the SARS-CoV-2 spike (YP_009724390) and nucleocapsid (YP_009724397) proteins were compared to retrieved sequences of isolates of four VoCs from 14 countries for mutational and evolutionary analyses. Multiple sequence alignment was performed and phylogenetic trees were constructed by the neighbor-joining method with 1000 bootstrap replicates using MEGA (version 6). Mutations in amino acid sequences were analyzed using the MultAlin online tool (version 5.4.1).</p><p><strong>Results: </strong>Among the four VoCs, a total of 143 nonsynonymous mutations and 8 deletions were identified in the spike and nucleocapsid proteins. 
Multiple sequence alignment and amino acid substitution analysis revealed new mutations, including G72W, M2101I, L139F, 209-211 deletion, G212S, P199L, P67S, I292T, and substitutions with unknown amino acid replacement, reported in Egypt (MW533289), the United Kingdom (MT906649), and other regions. The variants B.1.1.7 (Alpha variant) and B.1.617.2 (Delta variant), characterized by higher transmissibility and lethality, harbored the amino acid substitutions D614G, R203K, and G204R with higher prevalence rates in most sequences. Phylogenetic analysis among the novel SARS-CoV-2 variant proteins and some previously reported β-coronavirus proteins indicated that either the evolutionary clade was weakly supported or not supported at all by the β-coronavirus species.</p><p><strong>Conclusions: </strong>This study could contribute toward gaining a better understanding of the basic nature of SARS-CoV-2 and its four major variants. The numerous novel mutations detected could also provide a better understanding of VoCs and help in identifying suitable mutations for vaccine targets. Moreover, these data offer evidence for new types of mutations in VoCs, which will provide insight into the epidemiology of SARS-CoV-2.</p>","PeriodicalId":73552,"journal":{"name":"JMIR bioinformatics and biotechnology","volume":"4 ","pages":"e43906"},"PeriodicalIF":0.0,"publicationDate":"2023-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10353769/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9867153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
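The trees in the record above were built with MEGA's neighbor-joining implementation and 1000 bootstrap replicates. To show what the neighbor-joining step does, the sketch below implements the textbook algorithm: it repeatedly joins the pair of nodes that minimizes the Q-criterion and estimates branch lengths from a distance matrix. It assumes a precomputed symmetric distance matrix, omits bootstrapping and rooting, uses hypothetical internal node names (node0, node1, ...), and runs on toy distances rather than SARS-CoV-2 data.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, float]  # (parent, child, branch length)


def neighbor_joining(labels: Sequence[str], dist: Sequence[Sequence[float]]) -> List[Edge]:
    """Unrooted neighbor-joining tree from a symmetric distance matrix."""
    d: Dict[str, Dict[str, float]] = {
        a: {b: float(dist[i][j]) for j, b in enumerate(labels)} for i, a in enumerate(labels)
    }
    nodes = list(labels)
    edges: List[Edge] = []
    next_id = 0

    while len(nodes) > 2:
        n = len(nodes)
        # r(i): total distance from i to every other active node
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}

        # Pick the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best

        # Join a and b under a new internal node u and estimate both branch lengths
        u = f"node{next_id}"
        next_id += 1
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        edges.append((u, a, len_a))
        edges.append((u, b, d[a][b] - len_a))

        # Distances from u to the remaining active nodes
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]

    # Connect the last two remaining nodes
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Toy additive distances (hypothetical)
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for parent, child, length in neighbor_joining(taxa, matrix):
    print(parent, child, round(length, 3))
```

Bootstrap support, as reported by MEGA, would come from resampling alignment columns, rebuilding the distance matrix, and rerunning this procedure many times.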
{"title":"Introducing JMIR Bioinformatics and Biotechnology: A Platform for Interdisciplinary Collaboration and Cutting-Edge Research.","authors":"Ece Dilber Gamsiz Uzun","doi":"10.2196/48631","DOIUrl":"10.2196/48631","url":null,"abstract":"<p>JMIR Bioinformatics and Biotechnology supports interdisciplinary research and welcomes contributions that push the boundaries of bioinformatics, genomics, artificial intelligence, and pathology informatics.</p>","PeriodicalId":73552,"journal":{"name":"JMIR bioinformatics and biotechnology","volume":" ","pages":"e48631"},"PeriodicalIF":0.0,"publicationDate":"2023-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11135224/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49364821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genomic Insights Into the Evolution and Demographic History of the SARS-CoV-2 Omicron Variant: Population Genomics Approach.","authors":"Kritika M Garg, Vinita Lamba, Balaji Chattopadhyay","doi":"10.2196/40673","DOIUrl":"10.2196/40673","url":null,"abstract":"<p><strong>Background: </strong>A thorough understanding of the patterns of genetic subdivision in a pathogen can provide crucial information that is necessary to prevent disease spread. For SARS-CoV-2, the availability of millions of genomes makes this task analytically challenging, and traditional methods for understanding genetic subdivision often fail.</p><p><strong>Objective: </strong>The aim of our study was to use population genomics methods to identify the subtle subdivisions and demographic history of the Omicron variant, in addition to those captured by the Pango lineage.</p><p><strong>Methods: </strong>We used a combination of an evolutionary network approach and multivariate statistical protocols to understand the subdivision and spread of the Omicron variant. We identified subdivisions within the BA.1 and BA.2 lineages and further identified the mutations associated with each cluster. We further characterized the overall genomic diversity of the Omicron variant and assessed the selection pressure for each of the genetic clusters identified.</p><p><strong>Results: </strong>We observed concordant results, using two different methods to understand genetic subdivision. The overall pattern of subdivision in the Omicron variant was in broad agreement with the Pango lineage definition. 
Further, 1 cluster of the BA.1 lineage and 3 clusters of the BA.2 lineage revealed statistically significant signatures of selection or demographic expansion (Tajima's D<-2), suggesting the role of microevolutionary processes in the spread of the virus.</p><p><strong>Conclusions: </strong>We provide an easy framework for assessing the genetic structure and demographic history of SARS-CoV-2, which can be particularly useful for understanding the local history of the virus. We identified important mutations that are advantageous to some lineages of Omicron and aid in the transmission of the virus. This is crucial information for policy makers, as preventive measures can be designed to mitigate further spread based on a holistic understanding of the variability of the virus and the evolutionary processes aiding its spread.</p>","PeriodicalId":73552,"journal":{"name":"JMIR bioinformatics and biotechnology","volume":"4 ","pages":"e40673"},"PeriodicalIF":0.0,"publicationDate":"2023-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10331448/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9815596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
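Tajima's D, used in the record above, compares two estimates of diversity: the mean number of pairwise differences (π) and the number of segregating sites (S) scaled by a1. Strongly negative values, such as the D<-2 reported for some Omicron clusters, indicate an excess of rare variants, consistent with selection or population expansion. The sketch below implements Tajima's standard formula under the assumption of aligned, equal-length sequences without gaps or ambiguous bases; the toy alignments are invented and are not Omicron genomes.

```python
import math
from itertools import combinations
from typing import Sequence


def tajimas_d(seqs: Sequence[str]) -> float:
    """Tajima's D for aligned, equal-length sequences without gaps or missing data."""
    n = len(seqs)
    if n < 4:
        raise ValueError("the variance terms vanish for fewer than 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return math.nan  # D is undefined without polymorphism

    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(p, q)) for p, q in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignments (hypothetical)
rare_variants = ["AAAAAAAAAA", "CAAAAAAAAA", "ACAAAAAAAA", "AACAAAAAAA", "AAACAAAAAA"]
balanced = ["AAAA", "AAAA", "CCCC", "CCCC"]
print(tajimas_d(rare_variants))  # negative: every variant is a singleton
print(tajimas_d(balanced))       # positive: variants at intermediate frequency
```

The |D| > 2 cutoff is a rough rule of thumb; formal significance is usually judged against a null distribution, for example from coalescent simulations.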
{"title":"Decision of the Optimal Rank of a Nonnegative Matrix Factorization Model for Gene Expression Data Sets Utilizing the Unit Invariant Knee Method: Development and Evaluation of the Elbow Method for Rank Selection.","authors":"Emine Guven","doi":"10.2196/43665","DOIUrl":"10.2196/43665","url":null,"abstract":"<p><strong>Background: </strong>There is a great need to develop a computational approach to analyze and exploit the information contained in gene expression data. The recent utilization of nonnegative matrix factorization (NMF) in computational biology has demonstrated the capability to derive essential details from a high amount of data in particular gene expression microarrays. A common problem in NMF is finding the proper number rank (r) of factors of the degraded demonstration, but no agreement exists on which technique is most appropriate to utilize for this purpose. Thus, various techniques have been suggested to select the optimal value of rank factorization (r).</p><p><strong>Objective: </strong>In this work, a new metric for rank selection is proposed based on the elbow method, which was methodically compared against the cophenetic metric.</p><p><strong>Methods: </strong>To decide the optimum number rank (r), this study focused on the unit invariant knee (UIK) method of the NMF on gene expression data sets. Since the UIK method requires an extremum distance estimator that is eventually employed for inflection and identification of a knee point, the proposed method finds the first inflection point of the curvature of the residual sum of squares of the proposed algorithms using the UIK method on gene expression data sets as a target matrix.</p><p><strong>Results: </strong>Computation was conducted for the UIK task using gene expression data of acute lymphoblastic leukemia and acute myeloid leukemia samples. Consequently, the distinct results of NMF were subjected to comparison on different algorithms. 
The proposed UIK method is easy to perform, fast, free of a priori rank value input, and does not require initial parameters that significantly influence the model's functionality.</p><p><strong>Conclusions: </strong>This study demonstrates that the elbow method provides a credible prediction for both gene expression data and for precisely estimating simulated mutational processes data with known dimensions. The proposed UIK method is faster than conventional methods, including metrics utilizing the consensus matrix as a criterion for rank selection, while achieving significantly better computational efficiency without visual inspection on the curvatives. Finally, the suggested rank tuning method based on the elbow method for gene expression data is arguably theoretically superior to the cophenetic measure.</p>","PeriodicalId":73552,"journal":{"name":"JMIR bioinformatics and biotechnology","volume":" ","pages":"e43665"},"PeriodicalIF":0.0,"publicationDate":"2023-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11135234/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48883023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
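The record above selects the NMF rank at the knee of the residual sum of squares (RSS) curve using the unit invariant knee (UIK) method. The UIK estimator itself is not reproduced here. As a simpler stand-in, the sketch below picks the rank whose point on the min-max-normalized RSS curve lies farthest from the straight line joining the first and last points, a common geometric elbow heuristic. The function name and the RSS values are hypothetical; in practice each value would come from fitting NMF at one candidate rank.

```python
import math
from typing import Sequence


def knee_by_chord_distance(ranks: Sequence[int], rss: Sequence[float]) -> int:
    """Rank whose normalized (rank, RSS) point is farthest from the end-to-end chord.

    A simple elbow heuristic used as a stand-in; it is not the UIK method.
    """
    if len(ranks) != len(rss) or len(ranks) < 3:
        raise ValueError("need at least three (rank, RSS) pairs of equal length")

    # Min-max normalize both axes so the result does not depend on units
    x0, x1 = min(ranks), max(ranks)
    y0, y1 = min(rss), max(rss)
    if x1 == x0 or y1 == y0:
        raise ValueError("ranks and RSS values must not be constant")
    xs = [(r - x0) / (x1 - x0) for r in ranks]
    ys = [(v - y0) / (y1 - y0) for v in rss]

    # Perpendicular distance from each point to the line through the first and last points
    (ax, ay), (bx, by) = (xs[0], ys[0]), (xs[-1], ys[-1])
    length = math.hypot(bx - ax, by - ay)
    distances = [abs((bx - ax) * (ay - y) - (ax - x) * (by - ay)) / length for x, y in zip(xs, ys)]

    return ranks[max(range(len(ranks)), key=distances.__getitem__)]


# Hypothetical RSS values for NMF ranks 1 to 8
ranks = list(range(1, 9))
rss = [100.0, 60.0, 35.0, 20.0, 18.0, 17.0, 16.0, 15.5]
print(knee_by_chord_distance(ranks, rss))
```

Like UIK, this heuristic needs no a priori rank and no visual inspection of the curve, but it can be sensitive to the range of ranks scanned, since the chord depends on both endpoints.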