{"title":"Multi-Modal CLIP-Informed Protein Editing.","authors":"Mingze Yin, Hanjing Zhou, Yiheng Zhu, Miao Lin, Yixuan Wu, Jialu Wu, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou, Jintai Chen, Jian Wu","doi":"10.34133/hds.0211","DOIUrl":"https://doi.org/10.34133/hds.0211","url":null,"abstract":"<p><p><b>Background:</b> Proteins govern most biological functions essential for life, and achieving controllable protein editing has made great advances in probing natural systems, creating therapeutic conjugates, and generating novel protein constructs. Recently, machine learning-assisted protein editing (MLPE) has shown promise in accelerating optimization cycles and reducing experimental workloads. However, current methods struggle with the vast combinatorial space of potential protein edits and cannot explicitly conduct protein editing using biotext instructions, limiting their interactivity with human feedback. <b>Methods:</b> To fill these gaps, we propose a novel method called ProtET for efficient CLIP-informed protein editing through multi-modality learning. Our approach comprises 2 stages: In the pretraining stage, contrastive learning aligns protein-biotext representations encoded by 2 large language models (LLMs). Subsequently, during the protein editing stage, the fused features from editing instruction texts and original protein sequences serve as the final editing condition for generating target protein sequences. <b>Results:</b> Comprehensive experiments demonstrated the superiority of ProtET in editing proteins to enhance human-expected functionality across multiple attribute domains, including enzyme catalytic activity, protein stability, and antibody-specific binding ability. ProtET improves the state-of-the-art results by a large margin, leading to substantial stability improvements of 16.67% and 16.90%. 
<b>Conclusions:</b> This capability positions ProtET to advance real-world artificial protein editing, potentially addressing unmet academic, industrial, and clinical needs.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0211"},"PeriodicalIF":0.0,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658819/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142866372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Burden of Type 2 Diabetes in Adolescents and Young Adults in China: A Secondary Analysis from the Global Burden of Disease Study 2021.","authors":"Junting Yang, Siwei Deng, Houyu Zhao, Feng Sun, Xiaotong Zou, Linong Ji, Siyan Zhan","doi":"10.34133/hds.0210","DOIUrl":"https://doi.org/10.34133/hds.0210","url":null,"abstract":"<p><p><b>Background:</b> Early-onset type 2 diabetes (T2D) is an increasingly serious public health issue, particularly in China. This study aimed to analyze the characteristics of disease burden, secular trend, and attributable risk factors of early-onset T2D in China. <b>Methods:</b> Using data from the Global Burden of Disease (GBD) 2021, we analyzed the age-standardized rate (ASR) of incidence, disability-adjusted life years (DALYs), and mortality rates of T2D among individuals aged 15 to 39 years in China from 1990 to 2021. Joinpoint regression analysis was employed to analyze secular trend, calculating the average annual percent change (AAPC). We also examined changes in the proportion of early-onset T2D within the total T2D burden and its attributable risk factors. <b>Results:</b> From 1990 to 2021, the ASR of incidence of early-onset T2D in China increased from 140.20 [95% uncertainty interval (UI): 89.14 to 204.74] to 315.97 (95% UI: 226.75 to 417.55) per 100,000, with an AAPC of 2.67% (95% CI: 2.60% to 2.75%, <i>P</i> < 0.001). DALYs rose from 116.29 (95% UI: 78.51 to 167.05) to 267.47 (95% UI: 171.08 to 387.38) per 100,000, with an AAPC of 2.75% (95% CI: 2.64% to 2.87%, <i>P</i> < 0.001). Mortality rates slightly decreased from 0.30 (95% UI: 0.24 to 0.38) to 0.28 (95% UI: 0.23 to 0.34) per 100,000, with an AAPC of -0.22% (95% CI: -0.33% to -0.11%, <i>P</i> < 0.001). The 15 to 19 years age group showed the fastest increase in incidence (AAPC: 4.08%, 95% CI: 3.93% to 4.29%, <i>P</i> < 0.001). The burden was consistently higher and increased more rapidly among males compared to females. 
The proportion of early-onset T2D within the total T2D burden fluctuated but remained higher than global levels. In 2021, high body mass index (BMI) was the primary attributable risk factor for DALYs of early-onset T2D (59.85%, 95% UI: 33.54% to 76.65%), and its contribution increased substantially from 40.08% (95% UI: 20.71% to 55.79%) in 1990, followed by ambient particulate matter pollution (14.77%, 95% UI: 8.24% to 21.24%) and diet high in red meat (9.33%, 95% UI: -1.42% to 20.06%). <b>Conclusion:</b> The disease burden of early-onset T2D in China is rapidly increasing, particularly among younger populations and males. Despite a slight decrease in mortality rates, the continued rapid increase in incidence and DALYs indicates a need for strengthened prevention and management strategies, especially interventions targeting younger age groups. High BMI and environmental pollution emerge as primary risk factors and should be prioritized in future interventions.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0210"},"PeriodicalIF":0.0,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11651706/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142848616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
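As a rough plausibility check on the AAPC values quoted in the early-onset T2D abstract above, the sketch below computes an endpoint-based annualized percent change from the 1990 and 2021 rates. This is not the Joinpoint procedure the authors report using, which fits piecewise log-linear segments and averages their slopes. The function name `annualized_percent_change` is hypothetical, and using only the two endpoint rates is a simplifying assumption.

```python
import math


def annualized_percent_change(rate_start: float, rate_end: float, years: int) -> float:
    """Constant yearly percent change that carries rate_start to rate_end over `years` steps."""
    if rate_start <= 0 or rate_end <= 0 or years <= 0:
        raise ValueError("rates and years must be positive")
    # Average slope of log(rate) per year, converted back to a percent change.
    mean_log_slope = math.log(rate_end / rate_start) / years
    return (math.exp(mean_log_slope) - 1.0) * 100.0


# Rates per 100,000 as quoted in the abstract (1990 -> 2021, i.e. 31 yearly steps).
incidence = annualized_percent_change(140.20, 315.97, 31)
dalys = annualized_percent_change(116.29, 267.47, 31)
mortality = annualized_percent_change(0.30, 0.28, 31)

print(f"incidence: {incidence:.2f}% per year")
print(f"DALYs:     {dalys:.2f}% per year")
print(f"mortality: {mortality:.2f}% per year")
```

The endpoint-based values land close to the reported AAPCs (2.67%, 2.75%, and -0.22%), but they need not match exactly, because a joinpoint model weights each fitted segment by its length rather than using only the first and last year.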
Health data science. Pub Date: 2024-12-04. eCollection Date: 2024-01-01. DOI: 10.34133/hds.0196
Siqi Li, Di Miao, Qiming Wu, Chuan Hong, Danny D'Agostino, Xin Li, Yilin Ning, Yuqing Shang, Ziwen Wang, Molei Liu, Huazhu Fu, Marcus Eng Hock Ong, Hamed Haddadi, Nan Liu
{"title":"Federated Learning in Healthcare: A Benchmark Comparison of Engineering and Statistical Approaches for Structured Data Analysis.","authors":"Siqi Li, Di Miao, Qiming Wu, Chuan Hong, Danny D'Agostino, Xin Li, Yilin Ning, Yuqing Shang, Ziwen Wang, Molei Liu, Huazhu Fu, Marcus Eng Hock Ong, Hamed Haddadi, Nan Liu","doi":"10.34133/hds.0196","DOIUrl":"10.34133/hds.0196","url":null,"abstract":"<p><p><b>Background:</b> Federated learning (FL) holds promise for safeguarding data privacy in healthcare collaborations. While the term \"FL\" was originally coined by the engineering community, the statistical field has also developed privacy-preserving algorithms, though these are less recognized. Our goal was to bridge this gap with the first comprehensive comparison of FL frameworks from both domains. <b>Methods:</b> We assessed 7 FL frameworks, encompassing both engineering-based and statistical FL algorithms, and compared them against local and centralized modeling of logistic regression and least absolute shrinkage and selection operator (Lasso). Our evaluation utilized both simulated data and real-world emergency department data, focusing on comparing both estimated model coefficients and the performance of model predictions. <b>Results:</b> The findings reveal that statistical FL algorithms produce much less biased estimates of model coefficients. Conversely, engineering-based methods can yield models with slightly better prediction performance, occasionally outperforming both centralized and statistical FL models. <b>Conclusion:</b> This study underscores the relative strengths and weaknesses of both types of methods, providing recommendations for their selection based on distinct study characteristics. 
Furthermore, we emphasize the critical need to raise awareness of and integrate these methods into future applications of FL within the healthcare domain.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0196"},"PeriodicalIF":0.0,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11615161/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142782014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health data science. Pub Date: 2024-11-06. eCollection Date: 2024-01-01. DOI: 10.34133/hds.0197
Alireza Rafiei, Ronald Moore, Tilendra Choudhary, Curtis Marshall, Geoffrey Smith, John D Roback, Ravi M Patel, Cassandra D Josephson, Rishikesan Kamaleswaran
{"title":"Robust Meta-Model for Predicting the Likelihood of Receiving Blood Transfusion in Non-traumatic Intensive Care Unit Patients.","authors":"Alireza Rafiei, Ronald Moore, Tilendra Choudhary, Curtis Marshall, Geoffrey Smith, John D Roback, Ravi M Patel, Cassandra D Josephson, Rishikesan Kamaleswaran","doi":"10.34133/hds.0197","DOIUrl":"10.34133/hds.0197","url":null,"abstract":"<p><p><b>Background:</b> Blood transfusions, crucial in managing anemia and coagulopathy in intensive care unit (ICU) settings, require accurate prediction for effective resource allocation and patient risk assessment. However, existing clinical decision support systems have primarily targeted a particular patient demographic with unique medical conditions and focused on a single type of blood transfusion. This study aims to develop an advanced machine learning-based model to predict the probability of transfusion necessity over the next 24 h for a diverse range of non-traumatic ICU patients. <b>Methods:</b> We conducted a retrospective cohort study on 72,072 non-traumatic adult ICU patients admitted to a high-volume US metropolitan academic hospital between 2016 and 2020. We developed a meta-learner and various machine learning models to serve as predictors, training them annually with 4-year data and evaluating on the fifth, unseen year, iteratively over 5 years. <b>Results:</b> The experimental results revealed that the meta-model surpasses the other models in different development scenarios. It achieved notable performance metrics, including an area under the receiver operating characteristic curve of 0.97, an accuracy rate of 0.93, and an F1 score of 0.89 in the best scenario. <b>Conclusion:</b> This study pioneers the use of machine learning models for predicting the likelihood of blood transfusion receipt in a diverse cohort of critically ill patients. 
The findings of this evaluation confirm that our model not only effectively predicts transfusion reception but also identifies key biomarkers for making transfusion decisions.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0197"},"PeriodicalIF":0.0,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538953/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142592448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
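The evaluation scheme described in the transfusion abstract above (train on four years of data, evaluate on the held-out fifth year, repeated over the 5 study years) can be sketched as a leave-one-year-out split. This is a minimal sketch, assuming each of the years 2016 to 2020 is held out exactly once. The record layout and the function name `leave_one_year_out` are hypothetical and not taken from the paper.

```python
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, int]


def leave_one_year_out(
    records: List[Record], years: List[int]
) -> Iterator[Tuple[int, List[Record], List[Record]]]:
    """Yield (held_out_year, train, test), holding out each year in turn."""
    for held_out in years:
        # Train on every listed year except the held-out one.
        train = [r for r in records if r["year"] in years and r["year"] != held_out]
        # Evaluate only on the unseen, held-out year.
        test = [r for r in records if r["year"] == held_out]
        yield held_out, train, test


# Toy data (hypothetical): two admissions per year, 2016 through 2020.
study_years = [2016, 2017, 2018, 2019, 2020]
records = [{"patient_id": i, "year": 2016 + i % 5} for i in range(10)]

splits = list(leave_one_year_out(records, study_years))
for year, train, test in splits:
    print(f"held-out year {year}: {len(train)} training rows, {len(test)} test rows")
```

Splitting by calendar year rather than at random keeps every test admission from a year the model never saw, which is the property the abstract emphasizes with "unseen year".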
{"title":"Survival Disparities among Cancer Patients Based on Mobility Patterns: A Population-Based Study.","authors":"Fengyu Wen, Yike Zhang, Chao Yang, Pengfei Li, Qing Wang, Luxia Zhang","doi":"10.34133/hds.0198","DOIUrl":"10.34133/hds.0198","url":null,"abstract":"<p><p><b>Background:</b> Cancer is a major health problem worldwide. A growing number of cancer patients travel to hospitals outside their residential cities due to unbalanced medical resources. We aimed to evaluate the association between patterns of patient mobility and survival among patients with cancer. <b>Methods:</b> Data of patients hospitalized for cancer between January 2015 and December 2017 were collected from the regional data platform of an eastern coastal province of China. According to the cities of hospitalization and residency, 3 mobility patterns including intra-city, local center, and national center pattern were defined. Patients with intra-city pattern were sequentially matched to patients with the other 2 patterns on demographics, marital status, cancer type, comorbidity, and hospitalization frequency, using propensity score matching. We estimated 5-year survival and the associations between all-cause mortality and patient mobility. <b>Results:</b> Among 20,602 cancer patients, there were 17,035 (82.7%) patients with intra-city pattern, 2,974 (14.4%) patients with local center pattern, and 593 (2.9%) patients with national center pattern. Compared to patients with intra-city pattern, higher survival rates were observed in patients with local center pattern [5-year survival rate, 69.3% versus 65.4%; hazard ratio (HR), 0.85; 95% confidence interval (CI), 0.77 to 0.95] and in patients with national center pattern (5-year survival rate, 69.3% versus 64.5%; HR, 0.80; 95% CI, 0.67 to 0.97). <b>Conclusions:</b> We found significant survival disparities among different mobility patterns of patients with cancer. 
Improving the quality of cancer care is crucial, especially for cities with below-average healthcare resources.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0198"},"PeriodicalIF":0.0,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11535395/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142585159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Association of Smoking with Chronic Kidney Disease Stages 3 to 5: A Mendelian Randomization Study.","authors":"Zhilong Zhang, Feifei Zhang, Xiaomeng Zhang, Lanlan Lu, Luxia Zhang","doi":"10.34133/hds.0199","DOIUrl":"10.34133/hds.0199","url":null,"abstract":"<p><p><b>Background:</b> Previous studies suggested that smoking behavior (e.g., smoking status) was associated with an elevated risk of chronic kidney disease (CKD), yet whether this association is causal remains uncertain. <b>Methods:</b> We used data for half a million participants aged 40 to 69 years from the UK Biobank cohort. In the traditional observational study, we used Cox proportional hazards models to calculate the associations between 2 smoking indices (smoking status and lifetime smoking index) and incident CKD stages 3 to 5. Mendelian randomization (MR) approaches were used to estimate a potential causal effect. In one-sample MR, genetic variants associated with lifetime smoking index were used as instrumental variables to examine the causal associations with CKD stages 3 to 5, among 344,255 UK Biobank participants with white British ancestry. We further validated our findings by a two-sample MR analysis using information from the Chronic Kidney Disease Genetics Consortium genome-wide association study. <b>Results:</b> In the traditional observational study, both smoking status [hazard ratio (HR): 1.26, 95% confidence interval (CI): 1.22 to 1.30] and lifetime smoking index (HR: 1.22, 95% CI: 1.20 to 1.24) were associated with a higher risk of incident CKD. However, both our one-sample and two-sample MR analyses showed no causal association between lifetime smoking index and CKD (all <i>P</i> > 0.05). The genetic instruments were validated by several statistical tests, and all sensitivity analyses showed results similar to those of the main model. <b>Conclusion:</b> Evidence from our analyses does not suggest a causal effect of smoking behavior on CKD risk. 
The positive association presented in the traditional observational study is possibly a result of confounding.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0199"},"PeriodicalIF":0.0,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11532587/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142577414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning in Heart Sound Analysis: From Techniques to Clinical Applications.","authors":"Qinghao Zhao, Shijia Geng, Boya Wang, Yutong Sun, Wenchang Nie, Baochen Bai, Chao Yu, Feng Zhang, Gongzheng Tang, Deyun Zhang, Yuxi Zhou, Jian Liu, Shenda Hong","doi":"10.34133/hds.0182","DOIUrl":"10.34133/hds.0182","url":null,"abstract":"<p><p><b>Importance:</b> Heart sound auscultation is a routinely used physical examination in clinical practice to identify potential cardiac abnormalities. However, accurate interpretation of heart sounds requires specialized training and experience, which limits its generalizability. Deep learning, a subset of machine learning, involves training artificial neural networks to learn from large datasets and perform complex tasks with intricate patterns. Over the past decade, deep learning has been successfully applied to heart sound analysis, achieving remarkable results and accumulating substantial heart sound data for model training. Although several reviews have summarized deep learning algorithms for heart sound analysis, there is a lack of comprehensive summaries regarding the available heart sound data and the clinical applications. <b>Highlights:</b> This review will compile the commonly used heart sound datasets, introduce the fundamentals and state-of-the-art techniques in heart sound analysis and deep learning, and summarize the current applications of deep learning for heart sound analysis, along with their limitations and areas for future improvement. <b>Conclusions:</b> The integration of deep learning into heart sound analysis represents a significant advancement in clinical practice. The growing availability of heart sound datasets and the continuous development of deep learning techniques contribute to the improvement and broader clinical adoption of these models. 
However, ongoing research is needed to address existing challenges and refine these technologies for broader clinical use.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0182"},"PeriodicalIF":0.0,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461928/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142395727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Health Co-Benefits of Environmental Changes in the Context of Carbon Peaking and Carbon Neutrality in China.","authors":"Feifei Zhang, Chao Yang, Fulin Wang, Pengfei Li, Luxia Zhang","doi":"10.34133/hds.0188","DOIUrl":"10.34133/hds.0188","url":null,"abstract":"<p><strong>Importance: </strong>Climate change mitigation policies aimed at limiting greenhouse gas (GHG) emissions would bring substantial health co-benefits by directly alleviating climate change or indirectly reducing air pollution. As one of the largest developing countries and GHG emitter globally, China's carbon-peaking and carbon neutrality goals would lead to substantial co-benefits on global environment and therefore on human health. This review summarized the key findings and gaps in studies on the impact of China's carbon mitigation strategies on human health.</p><p><strong>Highlights: </strong>There is a wide consensus that limiting the temperature rise well below 2 °C would markedly reduce the climate-related health impacts compared with high emission scenario, although heat-related mortalities, labor productivity reduction rates, and infectious disease morbidities would continue increasing over time as temperature rises. Further, hundreds of thousands of air pollutant-related mortalities (mainly due to PM<sub>2.5</sub> and O<sub>3</sub>) could be avoided per year compared with the reference scenario without climate policy. Carbon reduction policies can also alleviate morbidities due to acute exposure to PM<sub>2.5</sub>. 
Further research with respect to morbidities attributed to nonoptimal temperature and air pollution, and health impacts attributed to precipitation and extreme weather events under current carbon policy in China or its equivalent in other developing countries is needed to improve our understanding of the disease burden in the coming decades.</p><p><strong>Conclusions: </strong>This review provides up-to-date evidence of potential health co-benefits under Chinese carbon policies and highlights the importance of considering these co-benefits into future climate policy development in both China and other nations endeavoring carbon reductions.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0188"},"PeriodicalIF":0.0,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11446102/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142367713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health data science. Pub Date: 2024-10-01. eCollection Date: 2024-01-01. DOI: 10.34133/hds.0186
Chenyuan Qin, Qiao Liu, Yaping Wang, Jie Deng, Min Du, Min Liu, Jue Liu
{"title":"Disease Burden and Geographic Inequalities in 15 Types of Neonatal Infectious Diseases in 131 Low- and Middle-Income Countries and Territories.","authors":"Chenyuan Qin, Qiao Liu, Yaping Wang, Jie Deng, Min Du, Min Liu, Jue Liu","doi":"10.34133/hds.0186","DOIUrl":"10.34133/hds.0186","url":null,"abstract":"<p><p><b>Background:</b> The burden of neonatal infections in low- and middle-income countries and territories (LMICs) is a critical public health challenge, yet our understanding of the specific burden and secular trends remains limited. <b>Methods:</b> We gathered annual data on 15 types of neonatal infections in LMICs from 1990 to 2019 from the Global Burden of Disease 2019. Numbers, rates, percent changes, and estimated annual percentage changes of incidence and deaths were calculated. We also explored the association between disease burden, socio-demographic index (SDI), and universal health coverage index (UHCI). <b>Results:</b> Enteric infections and upper respiratory infections had the highest incidence rates for neonates in 2019. Neonatal sepsis and other neonatal infections, as well as otitis media, demonstrated an increasing trend of incidence across all 3 low- and middle-income regions. The top 3 causes of neonatal mortality in 2019 were neonatal sepsis and other neonatal infections, lower respiratory infections, and enteric infections. Between 1990 and 2019, all of the neonatal infection-related mortality rates showed an overall decline. Sex differences could be found in the incidence and mortality of some neonatal infections, but most disease burdens decreased more rapidly in males. SDI and UHCI were both negatively associated with most of the disease burden, but there were exceptions. <b>Conclusions:</b> Our study serves as a vital exploration into the realities of neonatal infectious diseases in LMICs. 
The identified trends and disparities not only provide a foundation for future research but also underscore the critical need for targeted policy initiatives to alleviate this burden on a global scale.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0186"},"PeriodicalIF":0.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11443844/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142360730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Health data science. Pub Date: 2024-09-06. eCollection Date: 2024-01-01. DOI: 10.34133/hds.0127
Zhiyun Zhang, Yining Hua, Peilin Zhou, Shixu Lin, Minghui Li, Yujie Zhang, Li Zhou, Yanhui Liao, Jie Yang
{"title":"Sexual and Gender-Diverse Individuals Face More Health Challenges during COVID-19: A Large-Scale Social Media Analysis with Natural Language Processing.","authors":"Zhiyun Zhang, Yining Hua, Peilin Zhou, Shixu Lin, Minghui Li, Yujie Zhang, Li Zhou, Yanhui Liao, Jie Yang","doi":"10.34133/hds.0127","DOIUrl":"10.34133/hds.0127","url":null,"abstract":"<p><p><b>Background:</b> The COVID-19 pandemic has caused a disproportionate impact on the sexual and gender-diverse (SGD) community. Compared with non-SGD populations, their social relations and health status are more vulnerable, whereas public health data regarding SGD are scarce. <b>Methods:</b> To analyze the concerns and health status of SGD individuals, this cohort study leveraged 471,371,477 tweets from 251,455 SGD and 22,644,411 non-SGD users, spanning February 1, 2020 to April 30, 2022. The outcome measures comprised the distribution and dynamics of COVID-related topics, attitudes toward vaccines, and the prevalence of symptoms. <b>Results:</b> Topic analysis revealed that SGD users engaged more frequently in discussions related to \"friends and family\" (20.5% vs. 13.1%, <i>P</i> < 0.001) and \"wear masks\" (10.1% vs. 8.3%, <i>P</i> < 0.001) compared to non-SGD users. Additionally, SGD users exhibited a markedly higher proportion of positive sentiment in tweets about vaccines, including Moderna, Pfizer, AstraZeneca, and Johnson & Johnson. Among 102,464 users who self-reported COVID-19 diagnoses, SGD users mentioned 61 out of 69 COVID-related symptoms significantly more frequently than non-SGD users, encompassing both physical and mental health challenges. 
<b>Conclusion:</b> The results provide insights into an understanding of the unique needs and experiences of the SGD community during the pandemic, emphasizing the value of social media data in epidemiological and public health research.</p>","PeriodicalId":73207,"journal":{"name":"Health data science","volume":"4 ","pages":"0127"},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11378377/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142156847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}