{"title":"Effects of Mobile Health Care App \"Asmile\" on Physical Activity of 80,689 Users in Osaka Prefecture, Japan: Longitudinal Observational Study.","authors":"Asuka Oyama, Kenshiro Taguchi, Hiroe Seto, Reiko Kanaya, Jun'ichi Kotoku, Miyae Yamakawa, Hiroshi Toki, Ryohei Yamamoto","doi":"10.2196/65943","DOIUrl":"10.2196/65943","url":null,"abstract":"<p><strong>Background: </strong>Lifestyle-related diseases can be controlled by improving individuals' lifestyles; however, improving and maintaining a healthy lifestyle is difficult. Mobile health (mHealth) applications have recently attracted attention as tools for maintaining and improving health, and their use may also increase physical activity.</p><p><strong>Objective: </strong>This study aimed to verify the effect of registration in Asmile, an mHealth application provided by the Osaka Prefectural Government, on step counts using a Causal Impact approach based on the step count data recorded in the Asmile application.</p><p><strong>Methods: </strong>This observational study included Osaka residents aged 20-79 years, newly registered to Asmile, between the fiscal years 2020 and 2023. Of these, 80,689 participants with step count records for 4 weeks before and after the day they registered to Asmile were included in the analysis. We used daily step counts that were automatically transferred from a standard smartphone health care app into Asmile. We used a Causal Impact model to estimate the increase in step count after registration to Asmile.</p><p><strong>Results: </strong>Of the 80,689 participants analyzed, 38.5% (31,082/80,689) were men, and the mean age was 51.6 (SD 13.2) years. The mean step count before registration was 5923 (SD 4860) steps per day, with the highest proportion of new users registered in spring (38,389/80,689, 47.6%) and in fiscal year 2020 (34,491/80,689, 42.7%). 
The analysis revealed that the effect of Asmile registration on step counts was 360 steps (95% CI 331-389) per day and 10,041 steps (95% CI 9632-10,450) over 4 weeks. Stratified analysis showed that the impact of increased step count was more pronounced in younger groups and groups with fewer step counts before registration. Conversely, the effect of registration on step count was relatively minor in the groups registered in summer or winter.</p><p><strong>Conclusions: </strong>This study demonstrates increased physical activity among users registered with the Asmile app. These findings suggest that mHealth apps such as Asmile can effectively promote healthier lifestyles and potentially reduce the risk of lifestyle-related diseases.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e65943"},"PeriodicalIF":5.8,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144111047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Association Between Excessive Internet Use Time, Internet Addiction, and Physical-Mental Multimorbidity Among Chinese Adolescents: Cross-Sectional Study.","authors":"Huiwen Gu, Bing Shi, Huanying He, Sumei Yuan, Jijiao Cai, Xiaofang Chen, Zhongxiao Wan","doi":"10.2196/69210","DOIUrl":"10.2196/69210","url":null,"abstract":"<p><strong>Background: </strong>In contemporary society, the lives of adolescents are profoundly influenced by the internet. While irrational internet use may have an impact on the physical and mental well-being of teenagers, the relationship between excessive internet use and physical-mental multimorbidity in adolescents remains unclear.</p><p><strong>Objective: </strong>The aim of this study was to examine the relationship between excessive internet use and physical-mental multimorbidity among adolescents in China.</p><p><strong>Methods: </strong>A total of 5842 students aged 13 to 18 years from Suzhou city in Eastern China were recruited. Four specific physical disorders and a mental disorder were considered to assess the physical-mental multimorbidity, that is, obesity, hypertension, myopia, dental caries, and depressive symptoms. Logistic regression models were used to evaluate the odds ratios (ORs) and 95% CIs between internet use time, internet addiction (IA) behaviors, and physical-mental multimorbidity. Mediation analyses were performed to explore the mediating effect of sleep duration, diet scores, and tobacco or alcohol consumption on the association between excessive internet use and physical-mental multimorbidity.</p><p><strong>Results: </strong>A total of 973 (16.7%) students exhibited physical-mental multimorbidity. Students with excessive internet use time (≥2 hours per day) were associated with 45% higher odds of physical-mental multimorbidity compared to their peers who reported internet use for <1 hour per day. 
Among children and adolescents, a significant J-shaped association was observed between internet use time and physical-mental multimorbidity (nonlinear P<.001). Diet score (16.3%) and tobacco or alcohol consumption (12.7%) partially mediated the relationship. Students who met 1 IA behavior (OR 2.44, 95% CI 2.00-2.98) or ≥2 IA behaviors (OR 5.80, 95% CI 4.90-6.86) were associated with higher odds of physical-mental multimorbidity. In the total population, a positive nonlinear correlation was identified between the number of IA behaviors and physical-mental multimorbidity (nonlinear P<.001). Sleep duration (2.3%), dietary scores (6.1%), and tobacco or alcohol consumption (6.2%) partially mediated the association.</p><p><strong>Conclusions: </strong>Excessive internet use is associated with increased odds of physical-mental multimorbidity among adolescents. Sleep duration, dietary quality, and tobacco or alcohol consumption may partially mediate this relationship. These findings highlight the need for monitoring and promoting healthy internet habits as well as addressing lifestyle factors in order to prevent and control physical-mental multimorbidity among adolescents. This research will also provide references for managing internet use and physical-mental health as well as for future longitudinal studies.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e69210"},"PeriodicalIF":5.8,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144119947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of the COVID-19 Pandemic and the 2021 National Institute for Health and Care Excellence Guidelines on Public Perspectives Toward Myalgic Encephalomyelitis/Chronic Fatigue Syndrome: Thematic and Sentiment Analysis on Twitter (Rebranded as X).","authors":"Iliya Khakban, Shagun Jain, Joseph Gallab, Blossom Dharmaraj, Fangwen Zhou, Cynthia Lokker, Wael Abdelkader, Dena Zeraatkar, Jason W Busse","doi":"10.2196/65087","DOIUrl":"https://doi.org/10.2196/65087","url":null,"abstract":"<p><strong>Background: </strong>Myalgic encephalomyelitis (ME), also referred to as chronic fatigue syndrome (CFS), is a complex illness that typically presents with disabling fatigue and cognitive and functional impairment. The etiology and management of ME/CFS remain contentious and patients often describe their experiences through social media.</p><p><strong>Objective: </strong>We explored public discourse on Twitter (rebranded as X) to understand the concerns and priorities of individuals living with ME/CFS, with a focus on (1) the COVID-19 pandemic and (2) publication of the 2021 UK National Institute for Health and Care Excellence (NICE) guidelines on the diagnosis and management of ME/CFS.</p><p><strong>Methods: </strong>We used the Twitter application programming interface to collect tweets related to ME/CFS posted between January 1, 2010, and January 30, 2024. Tweets were sorted into 3 chronological periods (pre-COVID-19 pandemic, post-COVID-19 pandemic, and post-UK 2021 NICE Guidelines publication). A Robustly Optimized Bidirectional Embedding Representations from Transformers Pretraining Approach (RoBERTa) language processing model was used to categorize the sentiment of tweets as positive, negative, or neutral. We identified tweets that mentioned COVID-19, the UK NICE guidelines, and key themes identified through latent Dirichlet allocation (ie, fibromyalgia, research, and treatment). 
We sampled 1000 random tweets from each theme to identify subthemes and representative quotes.</p><p><strong>Results: </strong>We retrieved 906,404 tweets, of which 427,824 (47.2%) were neutral, 369,371 (40.75%) were negative, and 109,209 (12.05%) were positive. Over time, both the proportion of negative and positive tweets increased, and the proportion of neutral tweets decreased (P<.001 for all changes). Tweets mentioning fibromyalgia acknowledged similarities with ME/CFS, stigmatization associated with both disorders, and lack of effective treatments. Treatment-related tweets often described frustration with ME/CFS labeled as mental illness, dismissal of concerns by health care providers, and the need to seek out \"good physicians\" who viewed ME/CFS as a physical disorder. Tweets on research typically praised studies of biomarkers and biomedical therapies, called for greater investment in biomedical research, and expressed frustration with studies suggesting a biopsychosocial etiology for ME/CFS or supporting management with psychotherapy or graduated activity. Tweets about the UK NICE guidelines expressed frustration with the 2007 version that recommended cognitive behavioral therapy and graded exercise therapy, and a prolonged campaign by advocacy organizations to influence subsequent versions. Tweets showed high acceptance of the 2021 UK NICE guidelines, which were seen to validate ME/CFS as a biomedical disease and recommended against graded exercise therapy. 
Tweets about COVID-19 often noted overlaps between post-COVID-19 condition and ME/CFS, inc","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e65087"},"PeriodicalIF":5.8,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144119965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Safety and Efficacy of Digital Check-in and Triage Kiosks in Emergency Departments: Systematic Review.","authors":"Elena Lammila-Escalera, Geva Greenfield, Reham Aldakhil, Hei Ming Mak, Himani Sehgal, Ana Luisa Neves, Mark J Harmon, Azeem Majeed, Benedict Hayhoe","doi":"10.2196/69528","DOIUrl":"10.2196/69528","url":null,"abstract":"<p><strong>Background: </strong>Emergency departments (EDs) globally face unprecedented pressures due to aging populations, multimorbidity, and staff shortages. In response, health systems are adopting technological solutions such as digital kiosks to reduce wait times, improve patient flow, and alleviate overcrowding. These tools can automate patient check-in and assist with triage, helping to reduce variability in assessments and identify individuals with urgent needs sooner. However, it remains unclear whether the potential time-saving benefits of these innovations translate into improved patient outcomes and safety.</p><p><strong>Objective: </strong>This systematic review aims to summarize the safety and efficacy impacts of digital check-in and triage kiosks compared with traditional nurse-led triage methods in EDs.</p><p><strong>Methods: </strong>Comprehensive searches were conducted in MEDLINE, EMBASE, and Web of Science. A narrative synthesis was carried out to evaluate the impact on patient safety (eg, agreement rate, accuracy, sensitivity, and specificity) and efficacy (eg, operational efficiency and patient flow). The quality of the studies was assessed using the National Heart, Lung, and Blood Institute quality assessment tools.</p><p><strong>Results: </strong>A total of 5 studies, comprising 47,778 patients and 310,249 ED visits, were included. Out of these 5 studies, 3 focused on self-check-in kiosks, one on self-triage kiosks, and another on technology combining both. 
Among 5 studies, 2 evaluated safety, reporting high sensitivity for predicting high-acuity outcomes (up to 88.5%) and low under-triage rates (8.0%-10.1%) but poor agreement with nurse-assigned triage scores (27.0%-30.7%). Specificity for low-acuity cases was variable, with one study reporting as low as 27.2% accuracy. Of the 5 studies, 4 examined efficacy, reporting high over-triage rates (59.2%-65.0%) and mixed impacts on waiting times. While 2 studies found significant reductions in time-to-physician and time-to-triage, others reported no significant improvements following adjustments. Kiosks demonstrated high usability, with one study reporting 97% uptake among ED attendees.</p><p><strong>Conclusions: </strong>Evidence on the safety and efficacy of digital check-in and triage kiosks remains sparse. Based on the limited number of studies available, digital kiosks appear effective in accurately identifying high-acuity patients; however, their impact on operational efficiency measures is unclear. High over-triage rates and poor concordance with nurse-assigned triage scores may limit their practical application in busy ED settings. 
Further research is required to evaluate long-term outcomes, implementation across diverse health care contexts, and integration into ED workflows to better understand how digital kiosks can safely and effectively help address the growing demand for EDs.</p><p><strong>Trial registration: </strong>PROSPERO CRD42024481506; https://www.crd.york.","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e69528"},"PeriodicalIF":5.8,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144111052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Youth Perspectives on Generative AI and Its Use in Health Care.","authors":"Christian Schaaff, Manvir Bains, Sophie Davis, Trinity Amalraj, Abby Frank, Marika Waselewski, Tammy Chang, Andrew Wong","doi":"10.2196/72197","DOIUrl":"10.2196/72197","url":null,"abstract":"<p><strong>Unlabelled: </strong>A nationwide survey of youth aged 14 to 24 years on generative artificial intelligence (GAI) found that many youths are wary about the use of GAI in health care, suggesting that health professionals should acknowledge concerns about AI health tools and address them with adolescent patients as they become more pervasive.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e72197"},"PeriodicalIF":5.8,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12118938/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144119921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of the Chronic Kidney Disease-Peritoneal Dialysis (CKD-PD) App on Improvement of Overhydration Treatment in Patients on Peritoneal Dialysis: Randomized Controlled Trial.","authors":"Sirirat Anutrakulchai, Sajja Tatiyanupanwong, Sarassawan Kananuraks, Eakalak Lukkanalikitkul, Sawinee Kongpetch, Wijittra Chotmongkol, Michael G Morley, Wilaiphorn Thinkhamrop, Bandit Thinkhamrop, Chadarat Kleebchaiyaphum, Krongsin Khianchanach, Theenatchar Chunghom, Katharine E Morley","doi":"10.2196/70641","DOIUrl":"https://doi.org/10.2196/70641","url":null,"abstract":"<p><strong>Background: </strong>Overhydration is associated with increased morbidity and mortality in patients on peritoneal dialysis (PD). Early detection of overhydration is possible by monitoring hydration metrics, but the critical gap for treatment is obtaining timely and actionable data.</p><p><strong>Objective: </strong>This study compares the detection of overhydration and clinical outcomes in patients on PD using the Chronic Kidney Disease-Peritoneal Dialysis (CKD-PD) smartphone app with standard monitoring and management.</p><p><strong>Methods: </strong>An open-label randomized controlled trial was conducted at 3 hospitals in northeast Thailand. Enrolled participants from PD clinics were randomized into 2 equal groups: CKD-PD (App users) and usual management (No-App). Participants or their caregivers in the App group recorded hydration metrics in the CKD-PD app, which were uploaded to a central database monitored by nephrology staff. The No-App group used a handwritten logbook. Both groups had bimonthly clinic visits. The primary outcome was the incidence rate ratio (IRR) for clinical interventions for overhydration. Secondary outcomes included hospitalizations, technique failure, and death.</p><p><strong>Results: </strong>A total of 208 participants were randomized into App (N=103) and No-App (N=105) groups with the median follow-up time of 11.2 months. 
Hydration metric upload compliance in the App group was 85.7% (IQR 71.4-95.6). The IRR of overall interventions for overhydration was 2.51 times higher in the App group (95% CI 2.18-2.89; P<.001). Types of clinical interventions for overhydration differed between groups with dietary change and prescription of antihypertensive drugs more frequent in App users and diuretics and change of dialysis prescription more frequent in the No-App group. Hospitalizations were significantly higher in the No-App group due to any cause (adjusted IRR 1.58) and volume overload (adjusted IRR 4.07). There was no significant difference in survival analysis and technique failure between the 2 groups.</p><p><strong>Conclusions: </strong>Use of the CKD-PD app improved early detection of overhydration and early treatment interventions, resulting in fewer all-cause and volume overload hospitalizations.</p><p><strong>Trial registration: </strong>ClinicalTrials.gov NCT04797195; https://clinicaltrials.gov/study/NCT04797195.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e70641"},"PeriodicalIF":5.8,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144119955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Disinformation on the Extended Impacts of COVID-19: Methodological Investigation Using a Fuzzy Ranking Ensemble of Natural Language Processing Models.","authors":"Jian-An Chen, Wu-Chun Chung, Che-Lun Hung, Chun-Ying Wu","doi":"10.2196/73601","DOIUrl":"https://doi.org/10.2196/73601","url":null,"abstract":"<p><strong>Background: </strong>During the COVID-19 pandemic, the continuous spread of misinformation on the internet posed an ongoing threat to public trust and understanding of epidemic prevention policies. Although the pandemic is now under control, information regarding the risks of long-term COVID-19 effects and reinfection still needs to be integrated into COVID-19 policies.</p><p><strong>Objective: </strong>This study aims to develop a robust and generalizable deep learning framework for detecting misinformation related to the prolonged impacts of COVID-19 by integrating pretrained language models (PLMs) with an innovative fuzzy rank-based ensemble approach.</p><p><strong>Methods: </strong>A comprehensive dataset comprising 566 genuine and 2361 fake samples was curated from reliable open sources and processed using advanced techniques. The dataset was randomly split using the scikit-learn package to facilitate both training and evaluation. Deep learning models were trained for 20 epochs on a Tesla T4 (for hierarchical attention networks [HANs]) and an RTX A5000 (for the other models). To enhance performance, we implemented an ensemble learning strategy that incorporated a reparameterized Gompertz function, which assigned fuzzy ranks based on each model's prediction confidence for each test case. 
This method effectively fused outputs from state-of-the-art PLMs such as robustly optimized bidirectional encoder representations from transformers pretraining approach (RoBERTa), decoding-enhanced bidirectional encoder representations from transformers with disentangled attention (DeBERTa), and XLNet.</p><p><strong>Results: </strong>After training on the dataset, various classification methods were evaluated on the test set, including the fuzzy rank-based method and state-of-the-art large language models. Experimental results reveal that language models, particularly XLNet, outperform traditional approaches that combine term frequency-inverse document frequency features with support vector machine or utilize deep models like HAN. The evaluation metrics (accuracy, precision, recall, F<sub>1</sub>-score, and area under the curve [AUC]) indicated a clear performance advantage for models that had a larger number of parameters. However, this study also highlights that model architecture, training procedures, and optimization techniques are critical determinants of classification effectiveness. XLNet's permutation language modeling approach enhances bidirectional context understanding, allowing it to surpass even larger models in the bidirectional encoder representations from transformers (BERT) series despite having relatively fewer parameters. 
Notably, the fuzzy rank-based ensemble method, which combines multiple language models, achieved impressive results on the test set, with an accuracy of 93.52%, a precision of 94.65%, an F<sub>1</sub>-score of 96.03%, and an AUC of 97.15%.</p><p><strong>Conclusions: </strong>The fusion of ensemble learning","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e73601"},"PeriodicalIF":5.8,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144119963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing the Impact of Intensive, Longitudinal Sampling on Assessments of Statistical Power and Effect Size Within a Heterogeneous Human Population: Natural Experiment Using Change in Heart Rate on Weekends as a Surrogate Intervention.","authors":"Severine Soltani, Varun K Viswanath, Patrick Kasl, Wendy Hartogensis, Stephan Dilchert, Frederick M Hecht, Ashley E Mason, Benjamin L Smarr","doi":"10.2196/60284","DOIUrl":"https://doi.org/10.2196/60284","url":null,"abstract":"<p><strong>Background: </strong>The recent emergence of wearable devices has made feasible the passive gathering of intensive, longitudinal data from large groups of individuals. This form of data is effective at capturing physiological changes between participants (interindividual variability) and changes within participants over time (intraindividual variability). The emergence of longitudinal datasets provides an opportunity to quantify the contribution of such longitudinal data to the control of these sources of variability for applications such as responder analysis, where traditional, sparser sampling methods may hinder the categorization of individuals into these phenotypes.</p><p><strong>Objective: </strong>This study aimed to quantify the gains made in statistical power and effect size among statistical comparisons when controlling for interindividual variability and intraindividual variability compared with controlling for neither.</p><p><strong>Methods: </strong>Here, we test the gains in statistical power from controlling for interindividual and intraindividual variability of resting heart rate, collected in 2020 for over 40,000 individuals as part of the TemPredict study on COVID-19 detection. We compared heart rate on weekends with that on weekdays because weekends predictably change the behavior of most individuals, though not all, and in different ways. 
Weekends also repeat consistently, making their effects on heart rate feasible to assess with confidence over large populations. We therefore used weekends as a model system to test the impact of different statistical controls on detecting a recurring event with a clear ground truth. We randomly and iteratively sampled heart rate from weekday and weekend nights, controlling for interindividual variability, intraindividual variability, both, or neither.</p><p><strong>Results: </strong>Between-participant variability appeared to be a greater source of structured variability than within-participant fluctuations. Accounting for interindividual variability through within-individual sampling required 40× fewer pairs of samples to achieve statistical significance with 4× to 5× greater effect size at significance. Within-individual sampling revealed differential effects of weekends on heart rate, which were obscured by aggregated sampling methods.</p><p><strong>Conclusions: </strong>This work highlights the leverage provided by longitudinal, within-individual sampling to increase statistical power among populations with heterogeneous effects.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e60284"},"PeriodicalIF":5.8,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144119968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Converging Representations of Attention-Deficit/Hyperactivity Disorder and Autism on Social Media: Linguistic and Topic Analysis of Trends in Reddit Data.","authors":"Jemima Kang, Nick Haslam, Mike Conway","doi":"10.2196/70914","DOIUrl":"https://doi.org/10.2196/70914","url":null,"abstract":"<p><strong>Background: </strong>Social media platforms have witnessed a substantial increase in mental health-related discussions, with particular attention focused on attention-deficit/hyperactivity disorder (ADHD) and autism. This heightened interest coincides with growing neurodiversity advocacy. The impact of these changes in the conceptualization of ADHD and autism, and the relationship between the 2 conditions, remains underexplored.</p><p><strong>Objective: </strong>We aim to characterize and understand how the relationship between ADHD and autism has evolved in public discourse over the past decade and explore reasons for their growing alignment.</p><p><strong>Methods: </strong>Using Reddit data from 2012 to 2022, we investigated the frequency of ADHD mentions in r/autism and autism mentions in r/ADHD, compared to commonly mentioned conditions. We analyzed user overlap between the 2 subreddits to track cross-subreddit discussions. Following this, we assessed changes in semantic similarity between ADHD and autism using Word2Vec embedding models, alongside commonly mentioned conditions. Finally, thematic changes in subreddit discussions were explored using BERT-based topic modeling across 2 time periods.</p><p><strong>Results: </strong>Our analysis revealed that ADHD and autism have become progressively more associated across these multiple dimensions. In r/ADHD, there was a steep rise in the proportion of posts mentioning \"autism\" in 2021, overtaking \"bipolar\" and \"OCD\" (obsessive-compulsive disorder) to become the most frequently mentioned condition. 
Similarly, ADHD mentions increased steadily in r/autism, while the frequency of posts mentioning \"OCD,\" \"PTSD\" (posttraumatic stress disorder), and \"bipolar\" remained stable and low. User overlap between these subreddits grew substantially beginning in 2020. Semantic analysis showed ADHD and autism becoming more closely related from 2019 onward, compared to other conditions. Last, topic modeling indicated growing thematic convergence in ADHD- and autism-related discussions, which reflected an increasing shared emphasis on the experiences of adults with ADHD and autism, challenges in accessing diagnostic assessments, and interpersonal difficulties.</p><p><strong>Conclusions: </strong>Our study clarifies how discourse around these 2 conditions has converged during a period when they have both attracted rising public attention. These findings contribute to wider discussions about the impacts of rising public interest in mental health concepts. They illustrate that public understandings of relationships between conditions are dynamic and changing in ways that diverge from diagnostic frameworks. Future research should continue investigating changing mental health conceptualizations on social media, as these dynamics are becoming increasingly important for the future of psychiatric practice.</p>","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e70914"},"PeriodicalIF":5.8,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144110960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study.","authors":"Kaitlin Hanss, Karthik V Sarma, Anne L Glowinski, Andrew Krystal, Ramotse Saunders, Andrew Halls, Sasha Gorrell, Erin Reilly","doi":"10.2196/69910","DOIUrl":"10.2196/69910","url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs), such as OpenAI's GPT-3.5, GPT-4, and GPT-4o, have garnered early and significant enthusiasm for their potential applications within mental health, ranging from documentation support to chat-bot therapy. Understanding the accuracy and reliability of the psychiatric \"knowledge\" stored within the parameters of these models and developing measures of confidence in their responses (ie, the likelihood that an LLM response is accurate) are crucial for the safe and effective integration of these tools into mental health settings.</p><p><strong>Objective: </strong>This study aimed to assess the accuracy, reliability, and predictors of accuracy of GPT-3.5 (175 billion parameters), GPT-4 (approximately 1.8 trillion parameters), and GPT-4o (an optimized version of GPT-4 with unknown parameters) with standardized psychiatry multiple-choice questions (MCQs).</p><p><strong>Methods: </strong>A cross-sectional study was conducted where 3 commonly available, commercial LLMs (GPT-3.5, GPT-4, and GPT-4o) were tested for their ability to provide answers to single-answer MCQs (N=150) extracted from the Psychiatry Test Preparation and Review Manual. Each model generated answers to every MCQ 10 times. We evaluated the accuracy and reliability of the answers and sought predictors of answer accuracy. Our primary outcome was the proportion of questions answered correctly by each LLM (accuracy). 
Secondary measures were (1) response consistency to MCQs across 10 trials (reliability), (2) the correlation between MCQ answer accuracy and response consistency, and (3) the correlation between MCQ answer accuracy and model self-reported confidence.</p><p><strong>Results: </strong>On the first attempt, GPT-3.5 answered 58.0% (87/150) of MCQs correctly, while GPT-4 and GPT-4o answered 84.0% (126/150) and 87.3% (131/150) correctly, respectively. GPT-4 and GPT-4o showed no difference in performance (P=.51), but they significantly outperformed GPT-3.5 (P<.001). GPT-3.5 exhibited less response consistency on average compared to the other models (P<.001). MCQ response consistency was positively correlated with MCQ accuracy across all models (r=0.340, 0.682, and 0.590 for GPT-3.5, GPT-4, and GPT-4o, respectively; all P<.001), whereas model self-reported confidence showed no correlation with accuracy in the models, except for GPT-3.5, where self-reported confidence was weakly inversely correlated with accuracy (P<.001).</p><p><strong>Conclusions: </strong>To our knowledge, this is the first comprehensive evaluation of the general psychiatric knowledge encoded in commercially available LLMs and the first study to assess their reliability and identify predictors of response accuracy within medical domains. The findings suggest that GPT-4 and GPT-4o encode accurate and reliable general psychiatric knowledge and that methods, such as repeated prompting, may provide a measure of LLM response confidence. 
This work supports the potenti","PeriodicalId":16337,"journal":{"name":"Journal of Medical Internet Research","volume":"27 ","pages":"e69910"},"PeriodicalIF":5.8,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144110996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}