Biodata MiningPub Date : 2025-06-13DOI: 10.1186/s13040-025-00456-7
Tue T Te, Alex A T Bui, Constance H Fung, Mary Regina Boland
{"title":"Geospatial analysis of short sleep duration and cognitive disability in US adults: a multi-state study using machine learning techniques.","authors":"Tue T Te, Alex A T Bui, Constance H Fung, Mary Regina Boland","doi":"10.1186/s13040-025-00456-7","DOIUrl":"10.1186/s13040-025-00456-7","url":null,"abstract":"<p><strong>Background: </strong>There is evidence of increased risk of cognitive disability due to short sleep duration and adverse Social Determinants of Health (SDoH). To determine whether spatial associations (correlation between spatially distributed variables within a given geographic area) exist between neighborhoods with short sleep duration and cognitive disability across the United States (US) after adjusting for other factors. We conducted a spatial analysis using a spatial lag model at the neighborhood-level with the census tract as unit-of-analysis within each state in the US. We aggregated our results nationally using a weighted analysis to adjust for the number of census tracts per state. This study used Centers for Disease Control and Prevention (CDC) data on short sleep duration, cognitive disability and other health factors. We used 2021-2022 neighborhood-level data from the CDC and US Census Bureau adjusting for social determinants of health (SDoH) and demographics, excluding Florida due to inconsistencies in data availability. Our exposure variable was self-reported short sleep defined by the CDC (\"sleep less than 7 hours per 24 hour period\"). Our outcome was self-reported cognitive disability defined by the CDC (\"difficulty concentrating, remembering, or making decision\"). We adjusted for other factors including 'health outcomes', 'preventive practices', and the CDC's Social Vulnerability Index.</p><p><strong>Results: </strong>The spatial analysis revealed a significant association between short sleep duration and an increased risk of cognitive disability across the US (estimate range [0.29; 1.27], p < 0.005) after adjustment. Notably, six Western states (New Mexico, Alaska, Arizona, Nevada, Idaho, and Oregon) were at increased risk of cognitive disability due to short sleep duration and this pattern was significant (p = 0.007).</p><p><strong>Conclusions: </strong>Our study highlights the importance of short sleep duration as a significant predictor of cognitive disability across the US after adjusting for other confounders. The association between short sleep and cognitive disability was especially strong in the Western region of the US providing a deeper understanding of how geographic context and local factors can shape health outcomes.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"41"},"PeriodicalIF":4.0,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12166631/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144295129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata MiningPub Date : 2025-06-12DOI: 10.1186/s13040-025-00455-8
Davide Chicco, Luca Oneto, Davide Cangelosi
{"title":"DBSCAN and DBCV application to open medical records heterogeneous data for identifying clinically significant clusters of patients with neuroblastoma.","authors":"Davide Chicco, Luca Oneto, Davide Cangelosi","doi":"10.1186/s13040-025-00455-8","DOIUrl":"10.1186/s13040-025-00455-8","url":null,"abstract":"<p><p>Neuroblastoma is a common pediatric cancer that affects thousands of infants worldwide, especially children under five years of age. Although recovery for patients with neuroblastoma is possible in 80% of cases, only 40% of those with high-risk stage four neuroblastoma survive. Electronic health records of patients with this disease contain valuable data on patients that can be analyzed using computational intelligence and statistical software by biomedical informatics researchers. Unsupervised machine learning methods, in particular, can identify clinically significant subgroups of patients, which can lead to new therapies or medical treatments for future patients belonging to the same subgroups. However, access to these datasets is often restricted, making it difficult to obtain them for independent research projects. In this study, we retrieved three open datasets containing data from patients diagnosed with neuroblastoma: the Genoa dataset and the Shanghai dataset from the Neuroblastoma Electronic Health Records Open Data Repository, and a dataset from the TARGET-NBL renowned program. We analyzed these datasets using several clustering techniques and measured the results with the DBCV (Density-Based Clustering Validation) index. Among these algorithms, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) was the only one that produced meaningful results. We scrutinized the two clusters of patients' profiles identified by DBSCAN in the three datasets and recognized several relevant clinical variables that clearly partitioned the patients into the two clusters that have clinical meaning in the neuroblastoma literature. Our results can have a significant impact on health informatics, because any computational analyst wishing to cluster small data of patients of a rare disease can choose to use DBSCAN and DBCV rather than utilizing more common methods such as k-Means and Silhouette coefficient.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"40"},"PeriodicalIF":4.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12164137/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144286933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata MiningPub Date : 2025-06-11DOI: 10.1186/s13040-025-00454-9
David Vidmar, Jessica De Freitas, Will Thompson, John M Pfeifer, Brandon K Fornwalt, Noah Zimmerman, Riccardo Miotto, Ruijun Chen
{"title":"A probabilistic approach for building disease phenotypes across electronic health records.","authors":"David Vidmar, Jessica De Freitas, Will Thompson, John M Pfeifer, Brandon K Fornwalt, Noah Zimmerman, Riccardo Miotto, Ruijun Chen","doi":"10.1186/s13040-025-00454-9","DOIUrl":"10.1186/s13040-025-00454-9","url":null,"abstract":"","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"39"},"PeriodicalIF":4.0,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12153169/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144276400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata MiningPub Date : 2025-06-05DOI: 10.1186/s13040-025-00453-w
Tom Tubbesing, Andreas Schlüter, Alexander Sczyrba
{"title":"subMG automates data submission for metagenomics studies.","authors":"Tom Tubbesing, Andreas Schlüter, Alexander Sczyrba","doi":"10.1186/s13040-025-00453-w","DOIUrl":"10.1186/s13040-025-00453-w","url":null,"abstract":"<p><strong>Background: </strong>Publicly available metagenomics datasets are crucial for ensuring the reproducibility of scientific findings and supporting contemporary large-scale studies. However, submitting a comprehensive metagenomics dataset is both cumbersome and time-consuming. It requires including sample information, sequencing reads, assemblies, binned contigs, metagenome-assembled genomes (MAGs), and appropriate metadata. As a result, metagenomics studies are often published with incomplete datasets or, in some cases, without any data at all. subMG addresses this challenge by simplifying and automating the data submission process, thereby encouraging broader and more consistent data sharing.</p><p><strong>Results: </strong>subMG streamlines the process of submitting metagenomics study results to the European Nucleotide Archive (ENA) by allowing researchers to input files and metadata from their studies in a single form and automating downstream tasks that otherwise require extensive manual effort and expertise. The tool comes with comprehensive documentation as well as example data tailored for different use cases and can be operated via the command-line or a graphical user interface (GUI), making it easily deployable to a wide range of potential users.</p><p><strong>Conclusions: </strong>By simplifying the submission of genome-resolved metagenomics study datasets, subMG significantly reduces the time, effort, and expertise required from researchers, thus paving the way for more numerous and comprehensive data submissions in the future. An increased availability of well-documented and FAIR data can benefit future research, particularly in meta-analyses and comparative studies.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"38"},"PeriodicalIF":4.0,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12142852/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144235707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata MiningPub Date : 2025-05-27DOI: 10.1186/s13040-025-00452-x
Rachit Kumar, Joseph D Romano, Marylyn D Ritchie
{"title":"Network-based analyses of multiomics data in biomedicine.","authors":"Rachit Kumar, Joseph D Romano, Marylyn D Ritchie","doi":"10.1186/s13040-025-00452-x","DOIUrl":"10.1186/s13040-025-00452-x","url":null,"abstract":"<p><p>Network representations of data are designed to encode relationships between concepts as sets of edges between nodes. Human biology is inherently complex and is represented by data that often exists in a hierarchical nature. One canonical example is the relationship that exists within and between various -omics datasets, including genomics, transcriptomics, and proteomics, among others. Encoding such data in a network-based or graph-based representation allows the explicit incorporation of such relationships into various biomedical big data tasks, including (but not limited to) disease subtyping, interaction prediction, biomarker identification, and patient classification. This review will present various existing approaches in using network representations and analysis of data in multiomics in the framework of deep learning and machine learning approaches, subdivided into supervised and unsupervised approaches, to identify benefits and drawbacks of various approaches as well as the possible next steps for the field.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"37"},"PeriodicalIF":6.1,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12117783/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144161878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata MiningPub Date : 2025-05-22DOI: 10.1186/s13040-025-00451-y
Suruthy Sivanathan, Ting Hu
{"title":"Correction: Learning the therapeutic targets of acute myeloid leukemia through multiscale human interactome network and community analysis.","authors":"Suruthy Sivanathan, Ting Hu","doi":"10.1186/s13040-025-00451-y","DOIUrl":"10.1186/s13040-025-00451-y","url":null,"abstract":"","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"36"},"PeriodicalIF":4.0,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12096567/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144127755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata MiningPub Date : 2025-05-13DOI: 10.1186/s13040-025-00450-z
Berit Hunsdieck, Christian Bender, Katja Ickstadt, Johanna Mielke
{"title":"Joint models in big data: simulation-based guidelines for required data quality in longitudinal electronic health records.","authors":"Berit Hunsdieck, Christian Bender, Katja Ickstadt, Johanna Mielke","doi":"10.1186/s13040-025-00450-z","DOIUrl":"10.1186/s13040-025-00450-z","url":null,"abstract":"<p><strong>Background: </strong>Over the past decade an increase in usage of electronic health data (EHR) by office-based physicians and hospitals has been reported. However, these data types come with challenge regarding completeness and data quality and it is, especially for more complex models, unclear how these characteristics influence the performance.</p><p><strong>Methods: </strong>In this paper, we focus on joint models which combines longitudinal modelling with survival modelling to incorporate all available information. The aim of this paper is to establish simulation-based guidelines for the necessary quality of longitudinal EHR data so that joint models perform better than cox models. We conducted an extensive simulation study by systematically and transparently varying different characteristics of data quality, e.g., measurement frequency, noise, and heterogeneity between patients. We apply the joint models and evaluate their performance relative to traditional Cox survival modelling techniques.</p><p><strong>Results: </strong>Key findings suggest that biomarker changes before disease onset must be consistent within similar patient groups. With increasing noise and a higher measurement density, the joint model surpasses the traditional Cox regression model in terms of model performance. We illustrate the usefulness and limitations of the guidelines with two real-world examples, namely the influence of serum bilirubin on primary biliary liver cirrhosis and the influence of the estimated glomerular filtration rate on chronic kidney disease.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"35"},"PeriodicalIF":4.0,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070788/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143993927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata MiningPub Date : 2025-05-12DOI: 10.1186/s13040-025-00449-6
Ya-Ting Liang, Charlotte Wang
{"title":"Correction: Motif clustering and digital biomarker extraction for free-living physical activity analysis.","authors":"Ya-Ting Liang, Charlotte Wang","doi":"10.1186/s13040-025-00449-6","DOIUrl":"10.1186/s13040-025-00449-6","url":null,"abstract":"","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"34"},"PeriodicalIF":4.0,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12067653/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144008381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata MiningPub Date : 2025-05-06DOI: 10.1186/s13040-025-00447-8
Sulaiman Mohammed Alnasser
{"title":"Revisiting the approaches to DNA damage detection in genetic toxicology: insights and regulatory implications.","authors":"Sulaiman Mohammed Alnasser","doi":"10.1186/s13040-025-00447-8","DOIUrl":"https://doi.org/10.1186/s13040-025-00447-8","url":null,"abstract":"<p><p>Genetic toxicology is crucial for evaluating the potential risks of chemicals and drugs to human health and the environment. The emergence of high-throughput technologies has transformed this field, providing more efficient, cost-effective, and ethically sound methods for genotoxicity testing. It utilizes advanced screening techniques, including automated in vitro assays and computational models to rapidly assess the genotoxic potential of thousands of compounds simultaneously. This review explores the transformation of traditional in vitro and in vivo methods into computational models for genotoxicity assessment. By leveraging advances in machine learning, artificial intelligence, and high-throughput screening, computational approaches are increasingly replacing conventional methods. Coupling conventional screening with artificial intelligence (AI) and machine learning (ML) models has significantly enhanced their predictive capabilities, enabling the identification of genotoxicity signatures tied to molecular structures and biological pathways. Regulatory agencies increasingly support such methodologies as humane alternatives to traditional animal models, provided they are validated and exhibit strong predictive power. Standardization efforts, including the establishment of common endpoints across testing approaches, are pivotal for enhancing comparability and fostering consensus in toxicological assessments. Initiatives like ToxCast exemplify the successful incorporation of HTS data into regulatory decision-making, demonstrating that well-interpreted in vitro results can align with in vivo outcomes. Innovations in testing methodologies, global data sharing, and real-time monitoring continue to refine the precision and personalization of risk assessments, promising a transformative impact on safety evaluations and regulatory frameworks.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"33"},"PeriodicalIF":4.0,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12054138/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144051469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata MiningPub Date : 2025-05-02DOI: 10.1186/s13040-025-00444-x
Suruthy Sivanathan, Ting Hu
{"title":"Learning the therapeutic targets of acute myeloid leukemia through multiscale human interactome network and community analysis.","authors":"Suruthy Sivanathan, Ting Hu","doi":"10.1186/s13040-025-00444-x","DOIUrl":"10.1186/s13040-025-00444-x","url":null,"abstract":"<p><p>Acute myeloid leukemia (AML) is caused by proliferation of mutated myeloid progenitor cells. The standard chemotherapy regimen does not efficiently cause remission as there is a high relapse rate. Resistance acquired by leukemic stem cells is suggested to be one of the root causes of relapse. Therefore, there is an urgency to develop new drugs for therapy. Repurposing approved drugs for AML can provide a cost-friendly, time-efficient, and affordable alternative. The multiscale interactome network is a computational tool that can identify potential therapeutic candidates by comparing mechanisms of the drug and disease. Communities that could be potentially experimentally validated are detected in the multiscale interactome network using the algorithm CRank. The results are evaluated through literature search and Gene Ontology (GO) enrichment analysis. In this research, we identify therapeutic candidates for AML and their mechanisms from the interactome, and isolate prioritized communities that are dominant in the therapeutic mechanism that could potentially be used as a prompt for pre-clinical/translational research (e.g. bioinformatics, laboratory research) to focus on biological functions and mechanisms that are associated with the disease and drug. This method may allow for an efficient and accelerated discovery of potential candidates for AML, a rapidly progressing disease.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"32"},"PeriodicalIF":4.0,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12049071/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144052657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}