{"title":"Integrating Machine Learning into Statistical Methods in Disease Risk Prediction Modeling: A Systematic Review.","authors":"Meng Zhang, Yongqi Zheng, Xiagela Maidaiti, Baosheng Liang, Yongyue Wei, Feng Sun","doi":"10.34133/hds.0165","DOIUrl":"https://doi.org/10.34133/hds.0165","url":null,"abstract":"<p><p><b>Background:</b> Disease prediction models often use statistical methods or machine learning, both with their own corresponding application scenarios, raising the risk of errors when used alone. Integrating machine learning into statistical methods may yield robust prediction models. This systematic review aims to comprehensively assess current development of global disease prediction integration models. <b>Methods:</b> PubMed, EMbase, Web of Science, CNKI, VIP, WanFang, and SinoMed databases were searched to collect studies on prediction models integrating machine learning into statistical methods from database inception to 2023 May 1. Information including basic characteristics of studies, integrating approaches, application scenarios, modeling details, and model performance was extracted. <b>Results:</b> A total of 20 eligible studies in English and 1 in Chinese were included. Five studies concentrated on diagnostic models, while 16 studies concentrated on predicting disease occurrence or prognosis. Integrating strategies of classification models included majority voting, weighted voting, stacking, and model selection (when statistical methods and machine learning disagreed). Regression models adopted strategies including simple statistics, weighted statistics, and stacking. AUROC of integration models surpassed 0.75 and performed better than statistical methods and machine learning in most studies. Stacking was used for situations with >100 predictors and needed relatively larger amount of training data. 
<b>Conclusion:</b> Research on integrating machine learning into statistical methods in prediction models remains limited, but some studies have shown great potential for integration models to outperform single models. This study provides insights for the selection of integration methods for different scenarios. Future research could emphasize the improvement and validation of integrating strategies.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0165"},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11266123/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141763065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health data science. Pub Date: 2024-06-07. eCollection Date: 2024-01-01. DOI: 10.34133/hds.0112
{"title":"2023 Beijing Health Data Science Summit.","authors":"","doi":"10.34133/hds.0112","DOIUrl":"10.34133/hds.0112","url":null,"abstract":"<p><p>The 5th annual Beijing Health Data Science Summit, organized by the National Institute of Health Data Science at Peking University, recently concluded with resounding success. This year, the summit aimed to foster collaboration among researchers, practitioners, and stakeholders in the field of health data science to advance the use of data for better health outcomes. One significant highlight of this year's summit was the introduction of the Abstract Competition, organized by <i>Health Data Science</i>, a Science Partner Journal, which focused on the use of cutting-edge data science methodologies, particularly the application of artificial intelligence in the healthcare scenarios. The competition provided a platform for researchers to showcase their groundbreaking work and innovations. In total, the summit received 61 abstract submissions. Following a rigorous evaluation process by the Abstract Review Committee, eight exceptional abstracts were selected to compete in the final round and give presentations in the Abstract Competition. The winners of the Abstract Competition are as follows:•First Prize: \"Interpretable Machine Learning for Predicting Outcomes of Childhood Kawasaki Disease: Electronic Health Record Analysis\" presented by researchers from the Chinese Academy of Medical Sciences, Peking Union Medical College, and Chongqing Medical University (presenter Yifan Duan).•Second Prize: \"Survival Disparities among Mobility Patterns of Patients with Cancer: A Population-Based Study\" presented by a team from Peking University (presenter Fengyu Wen).•Third Prize: \"Deep Learning-Based Real-Time Predictive Model for the Development of Acute Stroke\" presented by researchers from Beijing Tiantan Hospital (presenter Lan Lan). 
We extend our heartfelt gratitude to the esteemed panel of judges whose expertise and dedication ensured the fairness and quality of the competition. The judging panel included Jiebo Luo from the University of Rochester (chair), Shenda Hong from Peking University, Xiaozhong Liu from Worcester Polytechnic Institute, Liu Yang from Hong Kong Baptist University, Ma Jianzhu from Tsinghua University, Ting Ma from Harbin Institute of Technology, and Jian Tang from Mila-Quebec Artificial Intelligence Institute. We wish to convey our deep appreciation to Zixuan He and Haoyang Hong for their invaluable assistance in the meticulous planning and execution of the event. As the 2023 Beijing Health Data Science Summit comes to a close, we look forward to welcoming all participants to join us in 2024. Together, we will continue to advance the frontiers of health data science and work toward a healthier future for all.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0112"},"PeriodicalIF":0.0,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11157085/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141297495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanlin Qu, Guanran Zhang, Zhenyu Wu, H. Luo, Renjie Chen, Huixun Jia, Xiaodong Sun
{"title":"Associations of Socioeconomic Status Inequity with Incident Age-related Macular Degeneration in Middle-aged and Elderly Population","authors":"Yanlin Qu, Guanran Zhang, Zhenyu Wu, H. Luo, Renjie Chen, Huixun Jia, Xiaodong Sun","doi":"10.34133/hds.0148","DOIUrl":"https://doi.org/10.34133/hds.0148","url":null,"abstract":"","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"50 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141123592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shaohua Yin, Yingying Yang, Qin Wang, Wei Guo, Qian He, Lei Yuan, Keyi Si
{"title":"Association between abortion and all-cause and cause-specific premature mortality: a prospective cohort study from the UK Biobank","authors":"Shaohua Yin, Yingying Yang, Qin Wang, Wei Guo, Qian He, Lei Yuan, Keyi Si","doi":"10.34133/hds.0147","DOIUrl":"https://doi.org/10.34133/hds.0147","url":null,"abstract":"","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"116 41","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141124542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Han Lv, Na Zeng, Mengyi Li, Jing Sun, Ning Wu, Mingze Xu, Qian Chen, Xinyu Zhao, Shuohua Chen, Wenjuan Liu, Xiaoshuai Li, Pengfei Zhao, Max Wintermark, Ying Hui, Jing Li, Shouling Wu, Zhenchang Wang
{"title":"Association Between Body Mass Index and Brain Health in Adults: A 16-Year Population-Based Cohort and Mendelian Randomization Study","authors":"Han Lv, Na Zeng, Mengyi Li, Jing Sun, Ning Wu, Mingze Xu, Qian Chen, Xinyu Zhao, Shuohua Chen, Wenjuan Liu, Xiaoshuai Li, Pengfei Zhao, Max Wintermark, Ying Hui, Jing Li, Shouling Wu, Zhenchang Wang","doi":"10.34133/hds.0087","DOIUrl":"https://doi.org/10.34133/hds.0087","url":null,"abstract":"","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"82 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140085080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health data science. Pub Date: 2024-02-26. eCollection Date: 2024-01-01. DOI: 10.34133/hds.0116
Benson Shu Yan Lam, Amanda Man Ying Chu, Jacky Ngai Lam Chan, Mike Ka Pui So
{"title":"Do Scholars Respond Faster Than Google Trends in Discussing COVID-19 Issues? An Approach to Textual Big Data.","authors":"Benson Shu Yan Lam, Amanda Man Ying Chu, Jacky Ngai Lam Chan, Mike Ka Pui So","doi":"10.34133/hds.0116","DOIUrl":"10.34133/hds.0116","url":null,"abstract":"<p><p><b>Background:</b> The COVID-19 pandemic has posed various difficulties for policymakers, such as the identification of health issues, establishment of policy priorities, formulation of regulations, and promotion of economic competitiveness. Evidence-based practices and data-driven decision-making have been recognized as valuable tools for improving the policymaking process. Nevertheless, due to the abundance of data, there is a need to develop sophisticated analytical techniques and tools to efficiently extract and analyze the data. <b>Methods:</b> Using Oxford COVID-19 Government Response Tracker, we categorize the policy responses into 6 different categories: (a) containment and closure, (b) health systems, (c) vaccines, (d) economic, (e) country, and (f) others. We proposed a novel research framework to compare the response times of the scholars and the general public. To achieve this, we analyzed more than 400,000 research abstracts published over the past 2.5 years, along with text information from Google Trends as a proxy for topics of public concern. We introduced an innovative text-mining method: coherent topic clustering to analyze the huge number of abstracts. <b>Results:</b> Our results show that the research abstracts not only discussed almost all of the COVID-19 issues earlier than Google Trends did, but they also provided more in-depth coverage. This should help policymakers identify core COVID-19 issues and act earlier. Besides, our clustering method can better reflect the main messages of the abstracts than a recent advanced deep learning-based topic modeling tool. 
<b>Conclusion:</b> Scholars generally have a faster response in discussing COVID-19 issues than Google Trends.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0116"},"PeriodicalIF":0.0,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10895931/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140133416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health data science. Pub Date: 2024-02-23. eCollection Date: 2024-01-01. DOI: 10.34133/hds.0113
Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie
{"title":"Toward Unified AI Drug Discovery with Multimodal Knowledge.","authors":"Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie","doi":"10.34133/hds.0113","DOIUrl":"10.34133/hds.0113","url":null,"abstract":"<p><p><b>Background:</b> In real-world drug discovery, human experts typically grasp molecular knowledge of drugs and proteins from multimodal sources including molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical literature. Existing multimodal approaches in AI drug discovery integrate either structured or unstructured knowledge independently, which compromises the holistic understanding of biomolecules. Besides, they fail to address the missing modality problem, where multimodal information is missing for novel drugs and proteins. <b>Methods:</b> In this work, we present KEDD, a unified, end-to-end deep learning framework that jointly incorporates both structured and unstructured knowledge for vast AI drug discovery tasks. The framework first incorporates independent representation learning models to extract the underlying characteristics from each modality. Then, it applies a feature fusion technique to calculate the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features based on top relevant molecules. <b>Results:</b> Benefiting from structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug-target interaction prediction, 2.6% on drug property prediction, 1.2% on drug-drug interaction prediction, and 4.1% on protein-protein interaction prediction. Through qualitative analysis, we reveal KEDD's promising potential in assisting real-world applications. 
<b>Conclusions:</b> By incorporating biomolecular expertise from multimodal knowledge, KEDD bears promise in accelerating drug discovery.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0113"},"PeriodicalIF":0.0,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10886071/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140133417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenhao Zhang, Yang Yang, Qinghua Cui, Dongyu Zhao, Chunmei Cui
{"title":"Identification and analysis of sex-biased copy number alterations","authors":"Chenhao Zhang, Yang Yang, Qinghua Cui, Dongyu Zhao, Chunmei Cui","doi":"10.34133/hds.0121","DOIUrl":"https://doi.org/10.34133/hds.0121","url":null,"abstract":"","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140442547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health data science. Pub Date: 2023-12-16. eCollection Date: 2024-01-01. DOI: 10.34133/hds.0216
Racha Gouareb, Alban Bornet, Dimitrios Proios, Sónia Gonçalves Pereira, Douglas Teodoro
{"title":"Erratum to \"Detection of Patients at Risk of Multidrug-Resistant Enterobacteriaceae Infection Using Graph Neural Networks: A Retrospective Study\".","authors":"Racha Gouareb, Alban Bornet, Dimitrios Proios, Sónia Gonçalves Pereira, Douglas Teodoro","doi":"10.34133/hds.0216","DOIUrl":"https://doi.org/10.34133/hds.0216","url":null,"abstract":"<p><p>[This corrects the article DOI: 10.34133/hds.0099.].</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0216"},"PeriodicalIF":0.0,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649199/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adeolu Z Ogunleye, Chayanit Piyawajanusorn, G. Ghislat, Pedro Ballester
{"title":"Large-scale machine learning analysis reveals DNA-methylation and gene-expression response signatures for gemcitabine-treated pancreatic cancer","authors":"Adeolu Z Ogunleye, Chayanit Piyawajanusorn, G. Ghislat, Pedro Ballester","doi":"10.34133/hds.0108","DOIUrl":"https://doi.org/10.34133/hds.0108","url":null,"abstract":"","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139007094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}