Health data science | Pub Date: 2024-09-06 | eCollection Date: 2024-01-01 | DOI: 10.34133/hds.0127
Zhiyun Zhang, Yining Hua, Peilin Zhou, Shixu Lin, Minghui Li, Yujie Zhang, Li Zhou, Yanhui Liao, Jie Yang
{"title":"Sexual and Gender-Diverse Individuals Face More Health Challenges during COVID-19: A Large-Scale Social Media Analysis with Natural Language Processing.","authors":"Zhiyun Zhang, Yining Hua, Peilin Zhou, Shixu Lin, Minghui Li, Yujie Zhang, Li Zhou, Yanhui Liao, Jie Yang","doi":"10.34133/hds.0127","DOIUrl":"10.34133/hds.0127","url":null,"abstract":"<p><p><b>Background:</b> The COVID-19 pandemic has had a disproportionate impact on the sexual and gender-diverse (SGD) community. The social relations and health status of SGD individuals are more vulnerable than those of non-SGD populations, yet public health data on SGD individuals are scarce. <b>Methods:</b> To analyze the concerns and health status of SGD individuals, this cohort study leveraged 471,371,477 tweets from 251,455 SGD and 22,644,411 non-SGD users, spanning from 2020 February 1 to 2022 April 30. The outcome measures comprised the distribution and dynamics of COVID-related topics, attitudes toward vaccines, and the prevalence of symptoms. <b>Results:</b> Topic analysis revealed that SGD users engaged more frequently in discussions related to \"friends and family\" (20.5% vs. 13.1%, <i>P</i> < 0.001) and \"wear masks\" (10.1% vs. 8.3%, <i>P</i> < 0.001) compared to non-SGD users. Additionally, SGD users exhibited a markedly higher proportion of positive sentiment in tweets about vaccines, including Moderna, Pfizer, AstraZeneca, and Johnson & Johnson. Among 102,464 users who self-reported COVID-19 diagnoses, SGD users mentioned 61 of 69 COVID-related symptoms significantly more often than non-SGD users did, encompassing both physical and mental health challenges.
<b>Conclusion:</b> The results offer insight into the unique needs and experiences of the SGD community during the pandemic and underscore the value of social media data in epidemiological and public health research.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0127"},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11378377/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142156847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating Machine Learning into Statistical Methods in Disease Risk Prediction Modeling: A Systematic Review.","authors":"Meng Zhang, Yongqi Zheng, Xiagela Maidaiti, Baosheng Liang, Yongyue Wei, Feng Sun","doi":"10.34133/hds.0165","DOIUrl":"https://doi.org/10.34133/hds.0165","url":null,"abstract":"<p><p><b>Background:</b> Disease prediction models typically rely on either statistical methods or machine learning. Each suits its own application scenarios, so using either one alone raises the risk of errors. Integrating machine learning into statistical methods may yield robust prediction models. This systematic review aims to comprehensively assess the current development of integration models for disease prediction worldwide. <b>Methods:</b> PubMed, EMbase, Web of Science, CNKI, VIP, WanFang, and SinoMed databases were searched to collect studies on prediction models integrating machine learning into statistical methods from database inception to 2023 May 1. Information including basic characteristics of studies, integrating approaches, application scenarios, modeling details, and model performance was extracted. <b>Results:</b> A total of 20 eligible studies in English and 1 in Chinese were included. Five studies concentrated on diagnostic models, while 16 studies concentrated on predicting disease occurrence or prognosis. Integrating strategies of classification models included majority voting, weighted voting, stacking, and model selection (when statistical methods and machine learning disagreed). Regression models adopted strategies including simple statistics, weighted statistics, and stacking. In most studies, integration models achieved an AUROC above 0.75 and performed better than statistical methods and machine learning. Stacking was used in situations with >100 predictors and needed a relatively larger amount of training data.
<b>Conclusion:</b> Research on integrating machine learning into statistical methods in prediction models remains limited, but some studies have shown that integration models have great potential to outperform single models. This study provides insights for selecting integration methods in different scenarios. Future research could emphasize the improvement and validation of integrating strategies.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0165"},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11266123/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141763065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
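Two of the classification strategies named in the abstract above, majority voting and weighted voting, can be sketched as follows. This is a minimal illustration and not the procedure of any reviewed study: the function names, the 0/1 label encoding, the tie-breaking rule, and the 0.5 threshold are all assumptions made for the sketch.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[int]) -> int:
    """Return the label predicted by most models.

    Ties are broken toward the smallest label (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def weighted_vote(probabilities: Sequence[float],
                  weights: Sequence[float],
                  threshold: float = 0.5) -> int:
    """Average per-model positive-class probabilities using model weights.

    Returns 1 if the weighted average reaches the threshold, else 0.
    """
    if len(probabilities) != len(weights) or not weights:
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(probabilities, weights)) / total
    return int(score >= threshold)
```

For example, if a hypothetical logistic regression predicts 1, a random forest predicts 0, and a gradient-boosted model predicts 1, `majority_vote([1, 0, 1])` returns 1. With weighted voting, the same pair of probabilities can flip the decision depending on which model is trusted more: `weighted_vote([0.9, 0.2], [1.0, 3.0])` returns 0, while `weighted_vote([0.9, 0.2], [3.0, 1.0])` returns 1.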
Health data science | Pub Date: 2024-06-07 | eCollection Date: 2024-01-01 | DOI: 10.34133/hds.0112
{"title":"2023 Beijing Health Data Science Summit.","authors":"","doi":"10.34133/hds.0112","DOIUrl":"10.34133/hds.0112","url":null,"abstract":"<p><p>The 5th annual Beijing Health Data Science Summit, organized by the National Institute of Health Data Science at Peking University, recently concluded with resounding success. This year, the summit aimed to foster collaboration among researchers, practitioners, and stakeholders in the field of health data science to advance the use of data for better health outcomes. One significant highlight of this year's summit was the introduction of the Abstract Competition, organized by <i>Health Data Science</i>, a Science Partner Journal, which focused on the use of cutting-edge data science methodologies, particularly the application of artificial intelligence in healthcare scenarios. The competition provided a platform for researchers to showcase their groundbreaking work and innovations. In total, the summit received 61 abstract submissions. Following a rigorous evaluation process by the Abstract Review Committee, eight exceptional abstracts were selected to compete in the final round and give presentations in the Abstract Competition. The winners of the Abstract Competition are as follows: • First Prize: \"Interpretable Machine Learning for Predicting Outcomes of Childhood Kawasaki Disease: Electronic Health Record Analysis\" presented by researchers from the Chinese Academy of Medical Sciences, Peking Union Medical College, and Chongqing Medical University (presenter Yifan Duan). • Second Prize: \"Survival Disparities among Mobility Patterns of Patients with Cancer: A Population-Based Study\" presented by a team from Peking University (presenter Fengyu Wen). • Third Prize: \"Deep Learning-Based Real-Time Predictive Model for the Development of Acute Stroke\" presented by researchers from Beijing Tiantan Hospital (presenter Lan Lan).
We extend our heartfelt gratitude to the esteemed panel of judges whose expertise and dedication ensured the fairness and quality of the competition. The judging panel included Jiebo Luo from the University of Rochester (chair), Shenda Hong from Peking University, Xiaozhong Liu from Worcester Polytechnic Institute, Liu Yang from Hong Kong Baptist University, Ma Jianzhu from Tsinghua University, Ting Ma from Harbin Institute of Technology, and Jian Tang from Mila-Quebec Artificial Intelligence Institute. We wish to convey our deep appreciation to Zixuan He and Haoyang Hong for their invaluable assistance in the meticulous planning and execution of the event. As the 2023 Beijing Health Data Science Summit comes to a close, we look forward to welcoming all participants to join us in 2024. Together, we will continue to advance the frontiers of health data science and work toward a healthier future for all.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0112"},"PeriodicalIF":0.0,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11157085/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141297495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanlin Qu, Guanran Zhang, Zhenyu Wu, H. Luo, Renjie Chen, Huixun Jia, Xiaodong Sun
{"title":"Associations of Socioeconomic Status Inequity with Incident Age-related Macular Degeneration in Middle-aged and Elderly Population","authors":"Yanlin Qu, Guanran Zhang, Zhenyu Wu, H. Luo, Renjie Chen, Huixun Jia, Xiaodong Sun","doi":"10.34133/hds.0148","DOIUrl":"https://doi.org/10.34133/hds.0148","url":null,"abstract":"","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"50 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141123592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shaohua Yin, Yingying Yang, Qin Wang, Wei Guo, Qian He, Lei Yuan, Keyi Si
{"title":"Association between abortion and all-cause and cause-specific premature mortality: a prospective cohort study from the UK Biobank","authors":"Shaohua Yin, Yingying Yang, Qin Wang, Wei Guo, Qian He, Lei Yuan, Keyi Si","doi":"10.34133/hds.0147","DOIUrl":"https://doi.org/10.34133/hds.0147","url":null,"abstract":"","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"116 41","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141124542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Han Lv, Na Zeng, Mengyi Li, Jing Sun, Ning Wu, Mingze Xu, Qian Chen, Xinyu Zhao, Shuohua Chen, Wenjuan Liu, Xiaoshuai Li, Pengfei Zhao, Max Wintermark, Ying Hui, Jing Li, Shouling Wu, Zhenchang Wang
{"title":"Association Between Body Mass Index and Brain Health in Adults: A 16-Year Population-Based Cohort and Mendelian Randomization Study","authors":"Han Lv, Na Zeng, Mengyi Li, Jing Sun, Ning Wu, Mingze Xu, Qian Chen, Xinyu Zhao, Shuohua Chen, Wenjuan Liu, Xiaoshuai Li, Pengfei Zhao, Max Wintermark, Ying Hui, Jing Li, Shouling Wu, Zhenchang Wang","doi":"10.34133/hds.0087","DOIUrl":"https://doi.org/10.34133/hds.0087","url":null,"abstract":"","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"82 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140085080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health data science | Pub Date: 2024-02-26 | eCollection Date: 2024-01-01 | DOI: 10.34133/hds.0116
Benson Shu Yan Lam, Amanda Man Ying Chu, Jacky Ngai Lam Chan, Mike Ka Pui So
{"title":"Do Scholars Respond Faster Than Google Trends in Discussing COVID-19 Issues? An Approach to Textual Big Data.","authors":"Benson Shu Yan Lam, Amanda Man Ying Chu, Jacky Ngai Lam Chan, Mike Ka Pui So","doi":"10.34133/hds.0116","DOIUrl":"10.34133/hds.0116","url":null,"abstract":"<p><p><b>Background:</b> The COVID-19 pandemic has posed various difficulties for policymakers, such as the identification of health issues, establishment of policy priorities, formulation of regulations, and promotion of economic competitiveness. Evidence-based practices and data-driven decision-making have been recognized as valuable tools for improving the policymaking process. Nevertheless, due to the abundance of data, there is a need to develop sophisticated analytical techniques and tools to efficiently extract and analyze the data. <b>Methods:</b> Using Oxford COVID-19 Government Response Tracker, we categorize the policy responses into 6 different categories: (a) containment and closure, (b) health systems, (c) vaccines, (d) economic, (e) country, and (f) others. We proposed a novel research framework to compare the response times of the scholars and the general public. To achieve this, we analyzed more than 400,000 research abstracts published over the past 2.5 years, along with text information from Google Trends as a proxy for topics of public concern. We introduced an innovative text-mining method: coherent topic clustering to analyze the huge number of abstracts. <b>Results:</b> Our results show that the research abstracts not only discussed almost all of the COVID-19 issues earlier than Google Trends did, but they also provided more in-depth coverage. This should help policymakers identify core COVID-19 issues and act earlier. Besides, our clustering method can better reflect the main messages of the abstracts than a recent advanced deep learning-based topic modeling tool. 
<b>Conclusion:</b> Scholars generally have a faster response in discussing COVID-19 issues than Google Trends.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0116"},"PeriodicalIF":0.0,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10895931/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140133416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health data science | Pub Date: 2024-02-23 | eCollection Date: 2024-01-01 | DOI: 10.34133/hds.0113
Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie
{"title":"Toward Unified AI Drug Discovery with Multimodal Knowledge.","authors":"Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie","doi":"10.34133/hds.0113","DOIUrl":"10.34133/hds.0113","url":null,"abstract":"<p><p><b>Background:</b> In real-world drug discovery, human experts typically grasp molecular knowledge of drugs and proteins from multimodal sources including molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical literature. Existing multimodal approaches in AI drug discovery integrate either structured or unstructured knowledge independently, which compromises the holistic understanding of biomolecules. Besides, they fail to address the missing modality problem, where multimodal information is missing for novel drugs and proteins. <b>Methods:</b> In this work, we present KEDD, a unified, end-to-end deep learning framework that jointly incorporates both structured and unstructured knowledge for a wide range of AI drug discovery tasks. The framework first incorporates independent representation learning models to extract the underlying characteristics from each modality. Then, it applies a feature fusion technique to calculate the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features based on the most relevant molecules. <b>Results:</b> Benefiting from structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug-target interaction prediction, 2.6% on drug property prediction, 1.2% on drug-drug interaction prediction, and 4.1% on protein-protein interaction prediction. Through qualitative analysis, we reveal KEDD's promising potential in assisting real-world applications.
<b>Conclusions:</b> By incorporating biomolecular expertise from multimodal knowledge, KEDD bears promise in accelerating drug discovery.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0113"},"PeriodicalIF":0.0,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10886071/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140133417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenhao Zhang, Yang Yang, Qinghua Cui, Dongyu Zhao, Chunmei Cui
{"title":"Identification and analysis of sex-biased copy number alterations","authors":"Chenhao Zhang, Yang Yang, Qinghua Cui, Dongyu Zhao, Chunmei Cui","doi":"10.34133/hds.0121","DOIUrl":"https://doi.org/10.34133/hds.0121","url":null,"abstract":"","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140442547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health data science | Pub Date: 2024-01-10 | eCollection Date: 2025-01-01 | DOI: 10.34133/hds.0218
Jingjing Wang, Xinran Lu, Sing Bik Cindy Ngai, Lili Xie, Xiaoyun Liu, Yao Yao, Yinzi Jin
{"title":"Digital Exclusion and Depressive Symptoms among Older People: Findings from Five Aging Cohort Studies across 24 Countries.","authors":"Jingjing Wang, Xinran Lu, Sing Bik Cindy Ngai, Lili Xie, Xiaoyun Liu, Yao Yao, Yinzi Jin","doi":"10.34133/hds.0218","DOIUrl":"10.34133/hds.0218","url":null,"abstract":"<p><p><b>Background:</b> Digital exclusion is a global issue that disproportionately affects older individuals, especially in low- and middle-income nations. However, there is a wide gap in current research regarding the impact of digital exclusion on the mental health of older adults in both high-income and low- and middle-income countries. <b>Methods:</b> We analyzed data from 5 longitudinal cohorts: the Health and Retirement Study (HRS), the English Longitudinal Study of Aging (ELSA), the Survey of Health, Ageing and Retirement in Europe (SHARE), the China Health and Retirement Longitudinal Study (CHARLS), and the Mexican Health and Aging Study (MHAS). These cohorts consisted of nationwide samples from 24 countries. Digital exclusion was defined as the self-reported lack of access to the internet. Depressive symptoms were assessed using comparable scales across all cohorts. We fitted Poisson models using generalized estimating equations to investigate the association between digital exclusion and depressive symptoms. We adjusted for the minimal sufficient adjustment set (MSAS) identified from a causal directed acyclic graph (DAG), which includes gender, age, retirement status, education, household wealth, social activities, and weekly contact with their children. <b>Results:</b> During the study period (2010-2018), 122,242 participants underwent up to 5 rounds of follow-up. Digital exclusion varied greatly across countries, ranging from 21.1% in Denmark to 96.9% in China. The crude model revealed a significant association between digital exclusion and depressive symptoms.
This association remained statistically significant in the MSAS-adjusted model across all cohorts: HRS [incidence rate ratio (IRR), 1.37; 95% confidence interval (CI), 1.28 to 1.47], ELSA (IRR, 1.32; 95% CI, 1.23 to 1.41), SHARE (IRR, 1.30; 95% CI, 1.27 to 1.33), CHARLS (IRR, 1.62; 95% CI, 1.38 to 1.91), and MHAS (IRR, 1.31; 95% CI, 1.26 to 1.37); all <i>P</i>s < 0.001. Notably, this association was consistently stronger in individuals living in lower wealth quintile households across all 5 cohorts and among those who do not regularly interact with their children, except for ELSA. <b>Conclusions:</b> Digital exclusion is globally widespread among older adults. Older individuals who are digitally excluded are at a higher risk of developing depressive symptoms, particularly those with limited communication with their offspring and individuals living in lower wealth quintile households. Prioritizing the provision of internet access to older populations may help reduce the risk of depressive symptoms, especially among vulnerable groups with limited familial support and with lower income.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"5 ","pages":"0218"},"PeriodicalIF":0.0,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11717435/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142959721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
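The incidence rate ratios in the abstract above come from Poisson models, whose coefficients are conventionally estimated on the log scale. A minimal sketch of how such a coefficient maps to an IRR with a confidence interval is given below. It assumes a log link and a Wald-type interval; the abstract does not state how the intervals were computed, and the function name and inputs are hypothetical.

```python
import math
from typing import Tuple


def irr_with_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-scale Poisson coefficient into (IRR, lower, upper).

    beta: estimated coefficient on the log scale (hypothetical input).
    se:   its standard error.
    z:    normal quantile; 1.96 gives an approximate 95% Wald interval.
    """
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the IRR on the ratio scale, which matches the shape of the intervals reported in the abstract (for example, 1.38 to 1.91 around 1.62).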