Biodata Mining. Pub Date: 2025-08-01. DOI: 10.1186/s13040-025-00468-3
Petr Ryšavý, Alikhan Anuarbekov, Michaela Dostálová Merkerová, Jiří Kléma
{"title":"circGPAcorr: an integrative tool for functional annotation of circular RNAs using expression data.","authors":"Petr Ryšavý, Alikhan Anuarbekov, Michaela Dostálová Merkerová, Jiří Kléma","doi":"10.1186/s13040-025-00468-3","DOIUrl":"10.1186/s13040-025-00468-3","url":null,"abstract":"<p><p>Circular RNAs play a crucial role in cell development and serve as biomarkers in many diseases. Nevertheless, the function of many circular RNAs remains unknown. This function can be inferred from sponging and silencing interactions with micro RNAs and messenger RNAs. We recently proposed a network-based circRNA functional annotation tool, circGPA. However, validation data for RNA interactions are often sparse and predicted interactions contain many false positives. To address this issue, we propose an extended algorithm named circGPAcorr, which uses expression data to weight the interactions, resulting in more precise functional annotation. To assess the significance of the results, the p-value is calculated using reduction to circGPA, a generating-polynomial-based method. We show that the problem is #P-hard, and thus computationally difficult. The circGPAcorr algorithm is tested on publicly available myelodysplastic syndromes expression data, providing gene ontology annotations that align with the literature on myelodysplastic syndromes. 
At the same time, we demonstrate its performance in the circRNA-disease annotation task.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"50"},"PeriodicalIF":6.1,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12317645/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144765669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining. Pub Date: 2025-07-29. DOI: 10.1186/s13040-025-00467-4
Muhammad Arshad, Chengliang Wang, Muhammad Wajeeh Us Sima, Jamshed Ali Shaikh, Hanen Karamti, Raed Alharthi, Julius Selecky
{"title":"BioAug-Net: a bioimage sensor-driven attention-augmented segmentation framework with physiological coupling for early prostate cancer detection in T2-weighted MRI.","authors":"Muhammad Arshad, Chengliang Wang, Muhammad Wajeeh Us Sima, Jamshed Ali Shaikh, Hanen Karamti, Raed Alharthi, Julius Selecky","doi":"10.1186/s13040-025-00467-4","DOIUrl":"10.1186/s13040-025-00467-4","url":null,"abstract":"<p><p>Accurate segmentation of the prostate peripheral zone (PZ) in T2-weighted MRI is critical for the early detection of prostate cancer. Existing segmentation methods are hindered by significant inter-observer variability (37.4 ± 5.6%), poor boundary localization, and the presence of motion artifacts, along with challenges in clinical integration. In this study, we propose BioAug-Net, a novel framework that integrates real-time physiological signal feedback with MRI data, leveraging transformer-based attention mechanisms and a probabilistic clinical decision support system (PCDSS). BioAug-Net features a dual-branch asymmetric attention mechanism: one branch processes spatial MRI features, while the other incorporates temporal sensor signals through a BiGRU-driven adaptive masking module. Additionally, a Markov Decision Process-based PCDSS maps segmentation outputs to clinical PI-RADS scores, with uncertainty quantification. We validated BioAug-Net on a multi-institutional dataset (n = 1,542) and demonstrated state-of-the-art performance, achieving a Dice Similarity Coefficient of 89.7% (p < 0.001), sensitivity of 91.2% (p < 0.001), specificity of 88.4% (p < 0.001), and HD95 of 2.14 mm (p < 0.001), outperforming U-Net, Attention U-Net, and TransUNet. Sensor integration improved segmentation accuracy by 12.6% (p < 0.001) and reduced inter-observer variation by 48.3% (p < 0.001). Radiologist evaluations (n = 3) confirmed a 15.0% reduction in diagnosis time (p = 0.003) and an increase in inter-reader agreement from κ = 0.68 to κ = 0.82 (p = 0.001). 
Our results show that BioAug-Net offers a clinically viable solution for early prostate cancer detection through enhanced physiological coupling and explainable AI diagnostics.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"49"},"PeriodicalIF":6.1,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309236/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144745615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining. Pub Date: 2025-07-24. DOI: 10.1186/s13040-025-00463-8
Stefan Lenz, Arsenij Ustjanzew, Marco Jeray, Meike Ressing, Torsten Panholzer
{"title":"Can open source large language models be used for tumor documentation in Germany?-An evaluation on urological doctors' notes.","authors":"Stefan Lenz, Arsenij Ustjanzew, Marco Jeray, Meike Ressing, Torsten Panholzer","doi":"10.1186/s13040-025-00463-8","DOIUrl":"10.1186/s13040-025-00463-8","url":null,"abstract":"<p><strong>Background: </strong>Tumor documentation in Germany is currently a largely manual process. It involves reading the textual patient documentation and filling in forms in dedicated databases to obtain structured data. Advances in information extraction techniques that build on large language models (LLMs) have the potential to enhance the efficiency and reliability of this process. Evaluating LLMs in the German medical domain, especially their ability to interpret specialized language, is essential to determine their suitability for use in clinical documentation. Due to data protection regulations, only locally deployed open source LLMs are generally suitable for this application.</p><p><strong>Methods: </strong>The evaluation employs eleven different open source LLMs with sizes ranging from 1 to 70 billion model parameters. Three basic tasks were selected as representative examples for the tumor documentation process: identifying tumor diagnoses, assigning ICD-10 codes, and extracting the date of first diagnosis. For evaluating the LLMs on these tasks, a dataset of annotated text snippets based on anonymized doctors' notes from urology was prepared. Different prompting strategies were used to investigate the effect of the number of examples in few-shot prompting and to explore the capabilities of the LLMs in general.</p><p><strong>Results: </strong>The models Llama 3.1 8B, Mistral 7B, and Mistral NeMo 12B performed comparably well in the tasks. Models with less extensive training data or with fewer than 7 billion parameters showed notably lower performance, while larger models did not display performance gains. 
Examples from a different medical domain than urology could also improve the outcome in few-shot prompting, which demonstrates the ability of LLMs to handle tasks needed for tumor documentation.</p><p><strong>Conclusions: </strong>Open source LLMs show a strong potential for automating tumor documentation. Models from 7-12 billion parameters could offer an optimal balance between performance and resource efficiency. With tailored fine-tuning and well-designed prompting, these models might become important tools for clinical documentation in the future. The code for the evaluation is available from https://github.com/stefan-m-lenz/UroLlmEval . We also release the data set under https://huggingface.co/datasets/stefan-m-lenz/UroLlmEvalSet providing a valuable resource that addresses the shortage of authentic and easily accessible benchmarks in German-language medical NLP.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"48"},"PeriodicalIF":6.1,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12291363/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144709599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining. Pub Date: 2025-07-11. DOI: 10.1186/s13040-025-00461-w
Mohamed Mustaf Ahmed, Olalekan John Okesanya, Majd Oweidat, Zhinya Kawa Othman, Shuaibu Saidu Musa, Don Eliseo Lucero-Prisno III
{"title":"The ethics of data mining in healthcare: challenges, frameworks, and future directions.","authors":"Mohamed Mustaf Ahmed, Olalekan John Okesanya, Majd Oweidat, Zhinya Kawa Othman, Shuaibu Saidu Musa, Don Eliseo Lucero-Prisno III","doi":"10.1186/s13040-025-00461-w","DOIUrl":"10.1186/s13040-025-00461-w","url":null,"abstract":"<p><p>Data mining in healthcare offers transformative insights yet surfaces multilayered ethical and governance challenges that extend beyond privacy alone. Privacy and consent concerns remain paramount when handling sensitive medical data, particularly as healthcare organizations increasingly share patient information with large digital platforms. The risks of data breaches and unauthorized access are stark: 725 reportable incidents in 2023 alone exposed more than 133 million patient records, and hacking-related breaches have surged by 239% since 2018. Algorithmic bias further threatens equity; models trained on historically prejudiced data can reinforce health disparities across protected groups. Therefore, transparency must span three levels-dataset documentation, model interpretability, and post-deployment audit logging-to make algorithmic reasoning and failures traceable. Security vulnerabilities in the Internet of Medical Things (IoMT) and cloud-based health platforms amplify these risks, while corporate data-sharing deals complicate questions of data ownership and patient autonomy. A comprehensive response requires (i) dataset-level artifacts such as \"datasheets,\" (ii) model cards that disclose fairness metrics, and (iii) continuous logging of predictions and LIME/SHAP explanations for independent audits. Technical safeguards must blend differential privacy (with empirically validated noise budgets), homomorphic encryption for high-value queries, and federated learning to maintain the locality of raw data. Governance frameworks must also mandate routine bias and robustness audits and harmonized penalties for non-compliance. 
Regular reassessments, thorough documentation, and active engagement with clinicians, patients, and regulators are critical to accountability. This paper synthesizes current evidence, from a 2019 European re-identification study demonstrating 99.98% uniqueness with 15 quasi-identifiers to recent clinical audits that trimmed false-negative rates via threshold recalibration, and proposes an integrated set of fairness, privacy, and security controls aligned with SPIRIT-AI, CONSORT-AI, and emerging PROBAST-AI guidelines. Implementing these solutions will help healthcare systems harness the benefits of data mining while safeguarding patient rights and sustaining public trust.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"47"},"PeriodicalIF":4.0,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12255135/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144620971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining. Pub Date: 2025-07-01. DOI: 10.1186/s13040-025-00462-9
Jason H Moore, Nicholas Tatonetti
{"title":"Vibe coding: a new paradigm for biomedical software development.","authors":"Jason H Moore, Nicholas Tatonetti","doi":"10.1186/s13040-025-00462-9","DOIUrl":"10.1186/s13040-025-00462-9","url":null,"abstract":"","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"46"},"PeriodicalIF":4.0,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12217882/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144545739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining. Pub Date: 2025-06-20. DOI: 10.1186/s13040-025-00460-x
Namareq Widatalla, Sona Al Younis, Ahsan Khandoker
{"title":"Heart rate transition patterns reveal autonomic dysfunction in heart failure with renal function decline: a symbolic and Markov model approach.","authors":"Namareq Widatalla, Sona Al Younis, Ahsan Khandoker","doi":"10.1186/s13040-025-00460-x","DOIUrl":"10.1186/s13040-025-00460-x","url":null,"abstract":"<p><p>Around half of heart failure (HF) patients develop chronic kidney disease (CKD) and early detection of renal impairment in HF remains a clinical challenge. Both HF and CKD are characterized by autonomic dysfunction, suggesting that early identification of autonomic dysregulation may assist in early diagnosis and intervention. Conventional heart rate variability (HRV) metrics serve as non-invasive markers of autonomic nervous system (ANS) function; however, they are limited in their ability to capture directional and nonlinear dynamics associated with autonomic impairment during renal function decline. In this study, we digitized heart rate (HR) changes from 5-minute electrocardiogram (ECG) recordings in 358 patients with chronic HF (CHF). We applied a first-order Markov model and motif pattern analyses to compare HR transition dynamics between patients with normal and reduced estimated glomerular filtration rate (eGFR). The results revealed decreased monotonic HR transitions and increased tonic fluctuations in patients with reduced eGFR. Building on these findings, we introduced a transition stability index (TSI), which was significantly lower in patients with reduced eGFR compared to those with normal eGFR (p < 0.05). These results suggest that TSI may serve as a novel indicator of autonomic dysfunction associated with renal decline. 
Motif analysis further supported these findings by identifying distinctive HR transition patterns in patients with low eGFR.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"45"},"PeriodicalIF":4.0,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12180264/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144337166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining. Pub Date: 2025-06-19. DOI: 10.1186/s13040-025-00459-4
Yasaman Fatapour, James P Brody
{"title":"A compact encoding of the genome suitable for machine learning prediction of traits and genetic risk scores.","authors":"Yasaman Fatapour, James P Brody","doi":"10.1186/s13040-025-00459-4","DOIUrl":"10.1186/s13040-025-00459-4","url":null,"abstract":"<p><p>Genotype to phenotype prediction is a central problem in biology and medicine. Machine learning is a natural tool to address this problem. However, a person's genotype is usually represented by a few million single-nucleotide polymorphisms and most datasets only have a few thousand patients. Thus, this problem typically has many more predictors than the number of samples (patients), making it unsuitable for machine learning. The objective of this paper is to examine the efficacy of a compact genotype representation, which employs a limited number of predictors, in predicting a person's phenotype through the application of machine learning. We characterized a person's genotype using chromosome-scale length variation, a measure that is computed as the average value of reported log R ratios across a portion of a chromosome. We computed these numbers from data collected by the NIH All of Us program. We used the AutoML function (h2o.ai) in binary classification mode to identify the best models to differentiate between male/female, Black/white, white/Asian, and Black/Asian. We also used the AutoML function in regression mode to predict the height of people based on their age and genotype. Our results showed that we could effectively classify a person, using only information from chromosomes 1-22, as Male/Female (AUC = 0.9988 ± 0.0001), White/Black (AUC = 0.970 ± 0.002), Asian/White (AUC = 0.877 ± 0.002), and Black/Asian (AUC = 0.966 ± 0.002). This approach also effectively predicted height. 
In conclusion, we have shown that this compact representation of a person's genotype, along with machine learning, can effectively predict a person's phenotype.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"44"},"PeriodicalIF":4.0,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12180147/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144334213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining. Pub Date: 2025-06-16. DOI: 10.1186/s13040-025-00457-6
Jiafu Cui, Siqi Yang, Litai Yi, Qilemuge Xi, Dezhi Yang, Yongchun Zuo
{"title":"Recent advances in deep learning for protein-protein interaction: a review.","authors":"Jiafu Cui, Siqi Yang, Litai Yi, Qilemuge Xi, Dezhi Yang, Yongchun Zuo","doi":"10.1186/s13040-025-00457-6","DOIUrl":"10.1186/s13040-025-00457-6","url":null,"abstract":"<p><p>Deep learning, a cornerstone of artificial intelligence, is driving rapid advancements in computational biology. Protein-protein interactions (PPIs) are fundamental regulators of biological functions. With the inclusion of deep learning in PPI research, the field is undergoing transformative changes. Therefore, there is an urgent need for a comprehensive review and assessment of recent developments to improve analytical methods and open up a wider range of biomedical applications. This review assesses deep learning progress in PPI prediction from 2021 to 2025. We evaluate core architectures (GNNs, CNNs, RNNs) and pioneering approaches: attention-driven Transformers, multi-task frameworks, multimodal integration of sequence and structural data, transfer learning via BERT and ESM, and autoencoders for interaction characterization. Moreover, we examine enhanced algorithms for dealing with data imbalances, variations, and high-dimensional feature sparsity, as well as industry challenges (including shifting protein interactions, interactions with non-model organisms, and rare or unannotated protein interactions), and offer perspectives on the future of the field. 
In summary, this review systematically summarizes the latest advances and existing challenges in deep learning in the field of protein interaction analysis, providing a valuable reference for researchers in the fields of computational biology and deep learning.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"43"},"PeriodicalIF":4.0,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12168265/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144310649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining. Pub Date: 2025-06-14. DOI: 10.1186/s13040-025-00458-5
Xiongbin Gui, Hanlin Lv, Xiao Wang, Longting Lv, Yi Xiao, Lei Wang
{"title":"Enhancing hepatopathy clinical trial efficiency: a secure, large language model-powered pre-screening pipeline.","authors":"Xiongbin Gui, Hanlin Lv, Xiao Wang, Longting Lv, Yi Xiao, Lei Wang","doi":"10.1186/s13040-025-00458-5","DOIUrl":"10.1186/s13040-025-00458-5","url":null,"abstract":"<p><strong>Background: </strong>Recruitment for cohorts involving complex liver diseases, such as hepatocellular carcinoma and liver cirrhosis, often requires interpreting semantically complex criteria. Traditional manual screening methods are time-consuming and prone to errors. While AI-powered pre-screening offers potential solutions, challenges remain regarding accuracy, efficiency, and data privacy.</p><p><strong>Methods: </strong>We developed a novel patient pre-screening pipeline that leverages clinical expertise to guide the precise, safe, and efficient application of large language models. The pipeline breaks down complex criteria into a series of composite questions and then employs two strategies to perform semantic question-answering through electronic health records: (1) Pathway A, Anthropomorphized Experts' Chain of Thought strategy; and (2) Pathway B, Preset Stances within an Agent Collaboration strategy, particularly in managing complex clinical reasoning scenarios. The pipeline is evaluated on key metrics including precision, recall, time consumption, and counterfactual inference-at both the question and criterion levels.</p><p><strong>Results: </strong>Our pipeline achieved a notable balance of high precision (e.g., 0.921, criteria level) and good overall recall (e.g., ~ 0.82, criteria level), alongside high efficiency (0.44s per task). Pathway B excelled in high-precision complex reasoning (while exhibiting a specific recall profile conducive to accuracy), whereas Pathway A was particularly effective for tasks requiring both robust precision and recall (e.g., direct data extraction), often with faster processing times. 
Both pathways achieved comparable overall precision while offering different strengths in the precision-recall trade-off. The pipeline showed promising precision-focused results in hepatocellular carcinoma (0.878) and cirrhosis trials (0.843).</p><p><strong>Conclusions: </strong>This data-secure and time-efficient pipeline shows high precision and achieves good recall in hepatopathy trials, providing promising solutions for streamlining clinical trial workflows. Its efficiency, adaptability, and balanced performance profile make it suitable for improving patient recruitment. Its capability to function in resource-constrained environments further enhances its utility in clinical settings.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"42"},"PeriodicalIF":4.0,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12167571/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144295174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biodata Mining. Pub Date: 2025-06-13. DOI: 10.1186/s13040-025-00456-7
Tue T Te, Alex A T Bui, Constance H Fung, Mary Regina Boland
{"title":"Geospatial analysis of short sleep duration and cognitive disability in US adults: a multi-state study using machine learning techniques.","authors":"Tue T Te, Alex A T Bui, Constance H Fung, Mary Regina Boland","doi":"10.1186/s13040-025-00456-7","DOIUrl":"10.1186/s13040-025-00456-7","url":null,"abstract":"<p><strong>Background: </strong>There is evidence of increased risk of cognitive disability due to short sleep duration and adverse Social Determinants of Health (SDoH). We aimed to determine whether spatial associations (correlation between spatially distributed variables within a given geographic area) exist between neighborhoods with short sleep duration and cognitive disability across the United States (US) after adjusting for other factors. We conducted a spatial analysis using a spatial lag model at the neighborhood level, with the census tract as the unit of analysis within each state in the US. We aggregated our results nationally using a weighted analysis to adjust for the number of census tracts per state. This study used Centers for Disease Control and Prevention (CDC) data on short sleep duration, cognitive disability and other health factors. We used 2021-2022 neighborhood-level data from the CDC and US Census Bureau, adjusting for SDoH and demographics, and excluded Florida due to inconsistencies in data availability. Our exposure variable was self-reported short sleep as defined by the CDC (\"sleep less than 7 hours per 24 hour period\"). Our outcome was self-reported cognitive disability as defined by the CDC (\"difficulty concentrating, remembering, or making decision\"). We adjusted for other factors including 'health outcomes', 'preventive practices', and the CDC's Social Vulnerability Index.</p><p><strong>Results: </strong>The spatial analysis revealed a significant association between short sleep duration and an increased risk of cognitive disability across the US (estimate range [0.29; 1.27], p < 0.005) after adjustment. 
Notably, six Western states (New Mexico, Alaska, Arizona, Nevada, Idaho, and Oregon) were at increased risk of cognitive disability due to short sleep duration and this pattern was significant (p = 0.007).</p><p><strong>Conclusions: </strong>Our study highlights the importance of short sleep duration as a significant predictor of cognitive disability across the US after adjusting for other confounders. The association between short sleep and cognitive disability was especially strong in the Western region of the US providing a deeper understanding of how geographic context and local factors can shape health outcomes.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"18 1","pages":"41"},"PeriodicalIF":4.0,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12166631/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144295129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}