The "Magical Theory" of AI in Medicine: Thematic Narrative Analysis.
Giorgia Lorenzini, Laura Arbelaez Ossa, Stephen Milford, Bernice Simone Elger, David Martin Shaw, Eva De Clercq
JMIR AI. 2024;3:e49795. Published 2024-08-19. doi:10.2196/49795. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11369530/pdf/

Background: The discourse surrounding medical artificial intelligence (AI) often focuses on narratives that either hype the technology's potential or predict dystopian futures. AI narratives have a significant influence on the direction of research, funding, and public opinion and thus shape the future of medicine.

Objective: The paper aims to offer critical reflections on AI narratives, with a specific focus on medical AI, and to raise awareness as to how people working with medical AI talk about AI and discharge their "narrative responsibility."

Methods: Qualitative semistructured interviews were conducted with 41 participants from different disciplines who were exposed to medical AI in their profession. The research represents a secondary analysis of data using a thematic narrative approach. The analysis resulted in 2 main themes, each with 2 subthemes.

Results: Stories about the AI-physician interaction depicted either a competitive or collaborative relationship. Some participants argued that AI might replace physicians, as it performs better than physicians. However, others believed that physicians should not be replaced and that AI should rather assist and support physicians. The idea of excessive technological deferral and automation bias was discussed, highlighting the risk of "losing" decisional power. The possibility that AI could relieve physicians from burnout and allow them to spend more time with patients was also considered. Finally, a few participants reported an extremely optimistic account of medical AI, while the majority criticized this type of story. The latter lamented the existence of a "magical theory" of medical AI, identified with techno-solutionist positions.

Conclusions: Most of the participants reported a nuanced view of technology, recognizing both its benefits and challenges and avoiding polarized narratives. However, some participants did contribute to the hype surrounding medical AI, comparing it to human capabilities and depicting it as superior. Overall, the majority agreed that medical AI should assist rather than replace clinicians. The study concludes that a balanced narrative (one that focuses on the technology's present capabilities and limitations) is necessary to fully realize the potential of medical AI while avoiding unrealistic expectations and hype.
Evaluation of Generative Language Models in Personalizing Medical Information: Instrument Validation Study.
Aidin Spina, Saman Andalib, Daniel Flores, Rishi Vermani, Faris F Halaseh, Ariana M Nelson
JMIR AI. 2024;3:e54371. Published 2024-08-13. doi:10.2196/54371. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11350306/pdf/

Background: Although uncertainties exist regarding implementation, artificial intelligence-driven generative language models (GLMs) have enormous potential in medicine. Deployment of GLMs could improve patient comprehension of clinical texts and help address low health literacy.

Objective: The goal of this study is to evaluate the potential of ChatGPT-3.5 and GPT-4 to tailor the complexity of medical information to a patient-specific input education level, which is crucial if these models are to serve as a tool in addressing low health literacy.

Methods: Input templates related to 2 prevalent chronic diseases (type II diabetes and hypertension) were designed. Each clinical vignette was adjusted for hypothetical patient education levels to evaluate output personalization. To assess the success of a GLM (GPT-3.5 and GPT-4) in tailoring output writing, the readability of pre- and posttransformation outputs was quantified using the Flesch reading ease score (FKRE) and the Flesch-Kincaid grade level (FKGL).

Results: Responses (n=80) were generated using GPT-3.5 and GPT-4 across 2 clinical vignettes. For GPT-3.5, FKRE means were 57.75 (SD 4.75), 51.28 (SD 5.14), 32.28 (SD 4.52), and 28.31 (SD 5.22) for 6th grade, 8th grade, high school, and bachelor's, respectively; FKGL means were 9.08 (SD 0.90), 10.27 (SD 1.06), 13.4 (SD 0.80), and 13.74 (SD 1.18). GPT-3.5 aligned with the prespecified education level only at the bachelor's degree. Conversely, GPT-4's FKRE means were 74.54 (SD 2.6), 71.25 (SD 4.96), 47.61 (SD 6.13), and 13.71 (SD 5.77), with FKGL means of 6.3 (SD 0.73), 6.7 (SD 1.11), 11.09 (SD 1.26), and 17.03 (SD 1.11) for the same respective education levels. GPT-4 met the target readability for all groups except the 6th-grade FKRE average. Both GLMs produced outputs with statistically significant differences between mean FKRE and FKGL across input education levels (FKRE: 6th grade P<.001; 8th grade P<.001; high school P<.001; bachelor's P=.003; FKGL: 6th grade P=.001; 8th grade P<.001; high school P<.001; bachelor's P<.001).

Conclusions: GLMs can change the structure and readability of medical text outputs according to input-specified education. However, GLMs categorize input education designation into 3 broad tiers of output readability: easy (6th and 8th grade), medium (high school), and difficult (bachelor's degree). This is the first result to suggest that there are broader boundaries in the success of GLMs in output text simplification. Future research must establish how GLMs can reliably personalize medical texts to prespecified education levels to enable a broader impact on health care literacy.
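As background for the two readability metrics reported above, the sketch below computes the Flesch reading ease and Flesch-Kincaid grade level from their published formulas, using a rough vowel-group heuristic to count syllables. It is an illustration only, not code from the study, which may have used a dedicated readability tool.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, discounting a silent final 'e'."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch reading ease, Flesch-Kincaid grade level) for a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / max(len(sentences), 1)
    syllables_per_word = syllables / max(len(words), 1)
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return round(fre, 2), round(fkgl, 2)

# Hypothetical patient-facing sentence; easier text scores higher on FRE and lower on FKGL.
print(readability("High blood pressure means your heart works harder than it should."))
```

Because the two scores move in opposite directions (higher reading ease and lower grade level both mean easier text), the pattern in the results above, where GPT-4's FKRE falls and FKGL rises as the target education level increases, is the expected signature of successful tailoring.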
{"title":"Enhancing Clinical Relevance of Pretrained Language Models Through Integration of External Knowledge: Case Study on Cardiovascular Diagnosis From Electronic Health Records.","authors":"Qiuhao Lu, Andrew Wen, Thien Nguyen, Hongfang Liu","doi":"10.2196/56932","DOIUrl":"10.2196/56932","url":null,"abstract":"<p><strong>Background: </strong>Despite their growing use in health care, pretrained language models (PLMs) often lack clinical relevance due to insufficient domain expertise and poor interpretability. A key strategy to overcome these challenges is integrating external knowledge into PLMs, enhancing their adaptability and clinical usefulness. Current biomedical knowledge graphs like UMLS (Unified Medical Language System), SNOMED CT (Systematized Medical Nomenclature for Medicine-Clinical Terminology), and HPO (Human Phenotype Ontology), while comprehensive, fail to effectively connect general biomedical knowledge with physician insights. There is an equally important need for a model that integrates diverse knowledge in a way that is both unified and compartmentalized. This approach not only addresses the heterogeneous nature of domain knowledge but also recognizes the unique data and knowledge repositories of individual health care institutions, necessitating careful and respectful management of proprietary information.</p><p><strong>Objective: </strong>This study aimed to enhance the clinical relevance and interpretability of PLMs by integrating external knowledge in a manner that respects the diversity and proprietary nature of health care data. We hypothesize that domain knowledge, when captured and distributed as stand-alone modules, can be effectively reintegrated into PLMs to significantly improve their adaptability and utility in clinical settings.</p><p><strong>Methods: </strong>We demonstrate that through adapters, small and lightweight neural networks that enable the integration of extra information without full model fine-tuning, we can inject diverse sources of external domain knowledge into language models and improve the overall performance with an increased level of interpretability. As a practical application of this methodology, we introduce a novel task, structured as a case study, that endeavors to capture physician knowledge in assigning cardiovascular diagnoses from clinical narratives, where we extract diagnosis-comment pairs from electronic health records (EHRs) and cast the problem as text classification.</p><p><strong>Results: </strong>The study demonstrates that integrating domain knowledge into PLMs significantly improves their performance. While improvements with ClinicalBERT are more modest, likely due to its pretraining on clinical texts, BERT (bidirectional encoder representations from transformer) equipped with knowledge adapters surprisingly matches or exceeds ClinicalBERT in several metrics. This underscores the effectiveness of knowledge adapters and highlights their potential in settings with strict data privacy constraints. 
This approach also increases the level of interpretability of these models in a clinical context, which enhances our ability to precisely identify and apply the most relevant domain knowledge for specific tasks, thereby optimizing the model's performance and tailoring it to meet specific c","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"3 ","pages":"e56932"},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11336492/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141894950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
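The knowledge adapters described above are, in general form, small bottleneck networks added to a frozen encoder. The study's exact architecture is not reproduced here, but a generic bottleneck adapter with a residual connection, attached to a frozen transformer encoder and a linear diagnosis-classification head, might look like the following sketch; the layer sizes and the stand-in encoder are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, apply a nonlinearity, up-project, and add the result back residually."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

class AdaptedClassifier(nn.Module):
    """Frozen encoder + trainable adapter + linear head for diagnosis-comment classification."""
    def __init__(self, encoder: nn.Module, hidden_size: int, num_labels: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # only the adapter and the head are trained
            p.requires_grad = False
        self.adapter = BottleneckAdapter(hidden_size)
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(token_embeddings)        # (batch, seq_len, hidden_size)
        hidden = self.adapter(hidden)
        return self.head(hidden[:, 0])                 # classify from the first token's state

# Tiny stand-in encoder so the sketch runs; a pretrained BERT would take its place in practice.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True), num_layers=2
)
model = AdaptedClassifier(encoder, hidden_size=768, num_labels=5)
logits = model(torch.randn(2, 32, 768))                # 2 narratives, 32 token embeddings each
print(logits.shape)                                    # torch.Size([2, 5])
```

Because the encoder stays frozen, several such adapters (for example, one per knowledge source or institution) can be trained separately and swapped in, which matches the compartmentalized, stand-alone knowledge modules the abstract describes.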
Comparing the Efficacy and Efficiency of Human and Generative AI: Qualitative Thematic Analyses.
Maximo R Prescott, Samantha Yeager, Lillian Ham, Carlos D Rivera Saldana, Vanessa Serrano, Joey Narez, Dafna Paltin, Jorge Delgado, David J Moore, Jessica Montoya
JMIR AI. 2024;3:e54482. Published 2024-08-02. doi:10.2196/54482. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11329846/pdf/

Background: Qualitative methods are highly beneficial to the dissemination and implementation of new digital health interventions; however, these methods can be time intensive and slow down dissemination when timely knowledge from the data sources is needed in ever-changing health systems. Recent advancements in generative artificial intelligence (GenAI) and their underlying large language models (LLMs) may provide a promising opportunity to expedite the qualitative analysis of textual data, but their efficacy and reliability remain unknown.

Objective: The primary objectives of our study were to evaluate the consistency in themes, reliability of coding, and time needed for inductive and deductive thematic analyses between GenAI (ie, ChatGPT and Bard) and human coders.

Methods: The qualitative data for this study consisted of 40 brief SMS text message reminder prompts used in a digital health intervention for promoting antiretroviral medication adherence among people with HIV who use methamphetamine. Inductive and deductive thematic analyses of these SMS text messages were conducted by 2 independent teams of human coders. An independent human analyst conducted analyses following both approaches using ChatGPT and Bard. The consistency in themes (the extent to which the themes were the same) and reliability (agreement in coding of themes) between methods were compared.

Results: The themes generated by GenAI (both ChatGPT and Bard) were consistent with 71% (5/7) of the themes identified by human analysts following inductive thematic analysis. The consistency in themes was lower between humans and GenAI following a deductive thematic analysis procedure (ChatGPT: 6/12, 50%; Bard: 7/12, 58%). The percentage agreement (intercoder reliability) for these congruent themes between human coders and GenAI ranged from fair to moderate (ChatGPT, inductive: 31/66, 47%; ChatGPT, deductive: 22/59, 37%; Bard, inductive: 20/54, 37%; Bard, deductive: 21/58, 36%). In general, ChatGPT and Bard performed similarly to each other across both types of qualitative analyses in terms of consistency of themes (inductive: 6/6, 100%; deductive: 5/6, 83%) and reliability of coding (inductive: 23/62, 37%; deductive: 22/47, 47%). On average, GenAI required significantly less overall time than human coders when conducting qualitative analysis (20, SD 3.5 min vs 567, SD 106.5 min).

Conclusions: The promising consistency in the themes generated by human coders and GenAI suggests that these technologies hold promise in reducing the resource intensiveness of qualitative thematic analysis; however, the relatively lower reliability in coding between them suggests that hybrid approaches are necessary. Human coders appeared to be better than GenAI at identifying nuanced and interpretative themes. Future studies should consider how these powerful technologies can be best…
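For readers unfamiliar with the reliability statistic reported above, percentage agreement simply counts the share of individual coding decisions (each message crossed with each theme) on which two coders match. The sketch below illustrates that computation on hypothetical theme codes, not the study's actual data.

```python
def percent_agreement(coder_a: list, coder_b: list, themes: list) -> float:
    """Share of (message, theme) coding decisions on which two coders agree.
    Each coder supplies, per message, the set of themes they assigned to it."""
    agree = total = 0
    for a, b in zip(coder_a, coder_b):
        for theme in themes:
            total += 1
            agree += (theme in a) == (theme in b)
    return agree / total

# Hypothetical codes for 3 SMS prompts against 3 themes.
themes = ["reminder", "encouragement", "health benefit"]
human = [{"reminder"}, {"encouragement", "health benefit"}, {"reminder"}]
genai = [{"reminder"}, {"encouragement"}, {"reminder", "health benefit"}]
print(percent_agreement(human, genai, themes))  # 7 of 9 decisions match, about 0.78
```

Chance-corrected statistics such as Cohen κ are often reported alongside raw agreement, because agreement alone is inflated when most theme assignments are negative.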
{"title":"Predicting Workers' Stress: Application of a High-Performance Algorithm Using Working-Style Characteristics.","authors":"Hiroki Iwamoto, Saki Nakano, Ryotaro Tajima, Ryo Kiguchi, Yuki Yoshida, Yoshitake Kitanishi, Yasunori Aoki","doi":"10.2196/55840","DOIUrl":"10.2196/55840","url":null,"abstract":"<p><strong>Background: </strong>Work characteristics, such as teleworking rate, have been studied in relation to stress. However, the use of work-related data to improve a high-performance stress prediction model that suits an individual's lifestyle has not been evaluated.</p><p><strong>Objective: </strong>This study aims to develop a novel, high-performance algorithm to predict an employee's stress among a group of employees with similar working characteristics.</p><p><strong>Methods: </strong>This prospective observational study evaluated participants' responses to web‑based questionnaires, including attendance records and data collected using a wearable device. Data spanning 12 weeks (between January 17, 2022, and April 10, 2022) were collected from 194 Shionogi Group employees. Participants wore the Fitbit Charge 4 wearable device, which collected data on daily sleep, activity, and heart rate. Daily work shift data included details of working hours. Weekly questionnaire responses included the K6 questionnaire for depression/anxiety, a behavioral questionnaire, and the number of days lunch was missed. The proposed prediction model used a neighborhood cluster (N=20) with working-style characteristics similar to those of the prediction target person. Data from the previous week predicted stress levels the following week. Three models were compared by selecting appropriate training data: (1) single model, (2) proposed method 1, and (3) proposed method 2. Shapley Additive Explanations (SHAP) were calculated for the top 10 extracted features from the Extreme Gradient Boosting (XGBoost) model to evaluate the amount and contribution direction categorized by teleworking rates (mean): low: <0.2 (more than 4 days/week in office), middle: 0.2 to <0.6 (2 to 4 days/week in office), and high: ≥0.6 (less than 2 days/week in office).</p><p><strong>Results: </strong>Data from 190 participants were used, with a teleworking rate ranging from 0% to 79%. The area under the curve (AUC) of the proposed method 2 was 0.84 (true positive vs false positive: 0.77 vs 0.26). Among participants with low teleworking rates, most features extracted were related to sleep, followed by activity and work. Among participants with high teleworking rates, most features were related to activity, followed by sleep and work. SHAP analysis showed that for participants with high teleworking rates, skipping lunch, working more/less than scheduled, higher fluctuations in heart rate, and lower mean sleep duration contributed to stress. In participants with low teleworking rates, coming too early or late to work (before/after 9 AM), a higher/lower than mean heart rate, lower fluctuations in heart rate, and burning more/fewer calories than normal contributed to stress.</p><p><strong>Conclusions: </strong>Forming a neighborhood cluster with similar working styles based on teleworking rates and using it as training data improved the prediction performance. 
The validity of the neighborhood cluste","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"3 ","pages":"e55840"},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11329844/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141876895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
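The core idea (train a model only on the 20 workers whose working style most resembles the prediction target, then explain the prediction with SHAP) can be sketched as below. The feature layout, library choices (scikit-learn, XGBoost, and the shap package), and parameter values are assumptions for illustration, not the study's implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from xgboost import XGBClassifier
import shap

def predict_stress_for(target_idx, work_style, weekly_features, stress_labels, k=20):
    """Train on the k workers whose working-style vector is closest to the target's,
    then predict the target's next-week stress and explain it with SHAP values."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(work_style)
    _, idx = nn.kneighbors(work_style[target_idx : target_idx + 1])
    neighbors = [i for i in idx[0] if i != target_idx][:k]

    model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
    model.fit(weekly_features[neighbors], stress_labels[neighbors])

    explainer = shap.TreeExplainer(model)
    target_row = weekly_features[target_idx : target_idx + 1]
    shap_values = explainer.shap_values(target_row)   # per-feature contribution to the prediction
    prob = model.predict_proba(target_row)[0, 1]
    return prob, shap_values

# Synthetic stand-in data: 190 workers, 3 working-style features, 12 weekly sensor/work features.
rng = np.random.default_rng(0)
style = rng.random((190, 3))        # e.g., teleworking rate, overtime share, shift regularity
weekly = rng.random((190, 12))      # e.g., sleep, activity, heart rate, lunches skipped
labels = rng.integers(0, 2, 190)    # stress flag derived from the K6 score
print(predict_stress_for(0, style, weekly, labels)[0])
```

Running this once per target is what allows the extracted features and their SHAP contributions to differ between the low-, middle-, and high-teleworking groups described above.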
{"title":"Regulatory Frameworks for AI-Enabled Medical Device Software in China: Comparative Analysis and Review of Implications for Global Manufacturer.","authors":"Yu Han, Aaron Ceross, Jeroen Bergmann","doi":"10.2196/46871","DOIUrl":"10.2196/46871","url":null,"abstract":"<p><p>The China State Council released the new generation artificial intelligence (AI) development plan, outlining China's ambitious aspiration to assume global leadership in AI by the year 2030. This initiative underscores the extensive applicability of AI across diverse domains, including manufacturing, law, and medicine. With China establishing itself as a major producer and consumer of medical devices, there has been a notable increase in software registrations. This study aims to study the proliferation of health care-related software development within China. This work presents an overview of the Chinese regulatory framework for medical device software. The analysis covers both software as a medical device and software in a medical device. A comparative approach is employed to examine the regulations governing medical devices with AI and machine learning in China, the United States, and Europe. The study highlights the significant proliferation of health care-related software development within China, which has led to an increased demand for comprehensive regulatory guidance, particularly for international manufacturers. The comparative analysis reveals distinct regulatory frameworks and requirements across the three regions. This paper provides a useful outline of the current state of regulations for medical software in China and identifies the regulatory challenges posed by the rapid advancements in AI and machine learning technologies. Understanding these challenges is crucial for international manufacturers and stakeholders aiming to navigate the complex regulatory landscape.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"3 ","pages":"e46871"},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11319888/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141790220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing Clinical Trial Eligibility Design Using Natural Language Processing Models and Real-World Data: Algorithm Development and Validation.
Kyeryoung Lee, Zongzhi Liu, Yun Mai, Tomi Jun, Meng Ma, Tongyu Wang, Lei Ai, Ediz Calay, William Oh, Gustavo Stolovitzky, Eric Schadt, Xiaoyan Wang
JMIR AI. 2024;3:e50800. Published 2024-07-29. doi:10.2196/50800. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11319878/pdf/

Background: Clinical trials are vital for developing new therapies but can also delay drug development. Efficient trial data management, optimized trial protocols, and accurate patient identification are critical for reducing trial timelines. Natural language processing (NLP) has the potential to achieve these objectives.

Objective: This study aims to assess the feasibility of using data-driven approaches to optimize clinical trial protocol design and identify eligible patients. This involves creating a comprehensive eligibility criteria knowledge base integrated within electronic health records using deep learning-based NLP techniques.

Methods: We obtained data on 3281 industry-sponsored phase 2 or 3 interventional clinical trials recruiting patients with non-small cell lung cancer, prostate cancer, breast cancer, multiple myeloma, ulcerative colitis, and Crohn disease from ClinicalTrials.gov, spanning the period between 2013 and 2020. A customized NLP pipeline based on a bidirectional long short-term memory (BiLSTM) network and a conditional random field (CRF) was used to extract all eligibility criteria attributes and convert hypernym concepts into computable hyponyms along with their corresponding values. To illustrate the simulation of clinical trial design for optimization purposes, we selected a subset of patients with non-small cell lung cancer (n=2775), curated from the Mount Sinai Health System, as a pilot study.

Results: We manually annotated the clinical trial eligibility corpus (485/3281, 14.78% of trials) and constructed an eligibility criteria-specific ontology. Our customized NLP pipeline, developed based on the eligibility criteria-specific ontology that we created through manual annotation, achieved high precision (0.91, range 0.67-1.00) and recall (0.79, range 0.50-1.00), as well as a high F1-score (0.83, range 0.67-1.00), enabling the efficient extraction of granular criteria entities and relevant attributes from 3281 clinical trials. A standardized eligibility criteria knowledge base, compatible with electronic health records, was developed by transforming hypernym concepts into machine-interpretable hyponyms along with their corresponding values. In addition, an interface prototype demonstrated the practicality of leveraging real-world data for optimizing clinical trial protocols and identifying eligible patients.

Conclusions: Our customized NLP pipeline successfully generated a standardized eligibility criteria knowledge base by transforming hypernym criteria into machine-readable hyponyms along with their corresponding values. A prototype interface integrating real-world patient information allows us to assess the impact of each eligibility criterion on the number of patients eligible for the trial. Leveraging NLP and real-world data in a data-driven approach holds promise for streamlining the overall clinical trial process, optimizing…
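The extraction step above is a sequence-labeling problem. A minimal BiLSTM-CRF tagger in the spirit of that pipeline, with illustrative dimensions and the third-party pytorch-crf package assumed for the CRF layer, could look like the sketch below; the study's actual model and ontology-specific tag set are not reproduced.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf (assumed here; any CRF layer would do)

class BiLSTMCRFTagger(nn.Module):
    """Assigns a BIO-style tag (e.g., criterion attribute, value, unit) to each token
    of an eligibility criterion sentence."""
    def __init__(self, vocab_size: int, num_tags: int, emb_dim: int = 100, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.emissions = nn.Linear(2 * hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _emit(self, tokens: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(self.embed(tokens))
        return self.emissions(out)

    def loss(self, tokens, tags, mask):
        return -self.crf(self._emit(tokens), tags, mask=mask)   # negative log-likelihood

    def decode(self, tokens, mask):
        return self.crf.decode(self._emit(tokens), mask=mask)   # best tag sequence per sentence

# Toy usage: 2 padded criterion sentences of 6 token IDs each.
model = BiLSTMCRFTagger(vocab_size=5000, num_tags=7)
tokens = torch.randint(1, 5000, (2, 6))
mask = torch.ones(2, 6, dtype=torch.bool)
print(model.decode(tokens, mask))
```

Downstream, tagged hypernym spans (e.g., a hypothetical "adequate organ function" criterion) would be mapped to computable hyponyms and their values against the EHR, which is the knowledge-base construction step the abstract describes.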
Use of Deep Neural Networks to Predict Obesity With Short Audio Recordings: Development and Usability Study.
Jingyi Huang, Peiqi Guo, Sheng Zhang, Mengmeng Ji, Ruopeng An
JMIR AI. 2024;3:e54885. Published 2024-07-25. doi:10.2196/54885. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11310637/pdf/

Background: The escalating global prevalence of obesity has necessitated the exploration of novel diagnostic approaches. Recent scientific inquiries have indicated potential alterations in voice characteristics associated with obesity, suggesting the feasibility of using voice as a noninvasive biomarker for obesity detection.

Objective: This study aims to use deep neural networks to predict obesity status through the analysis of short audio recordings, investigating the relationship between vocal characteristics and obesity.

Methods: A pilot study was conducted with 696 participants, using self-reported BMI to classify individuals into obesity and nonobesity groups. Audio recordings of participants reading a short script were transformed into spectrograms and analyzed using an adapted YOLOv8 model (Ultralytics). The model performance was evaluated using accuracy, recall, precision, and F1-scores.

Results: The adapted YOLOv8 model demonstrated a global accuracy of 0.70 and a macro F1-score of 0.65. It was more effective in identifying nonobesity (F1-score of 0.77) than obesity (F1-score of 0.53). This moderate level of accuracy highlights the potential and challenges in using vocal biomarkers for obesity detection.

Conclusions: While the study shows promise in the field of voice-based medical diagnostics for obesity, it faces limitations such as reliance on self-reported BMI data and a small, homogenous sample size. These factors, coupled with variability in recording quality, necessitate further research with more robust methodologies and diverse samples to enhance the validity of this novel approach. The findings lay a foundational step for future investigations in using voice as a noninvasive biomarker for obesity detection.
{"title":"Enhancing Type 2 Diabetes Treatment Decisions With Interpretable Machine Learning Models for Predicting Hemoglobin A1c Changes: Machine Learning Model Development.","authors":"Hisashi Kurasawa, Kayo Waki, Tomohisa Seki, Akihiro Chiba, Akinori Fujino, Katsuyoshi Hayashi, Eri Nakahara, Tsuneyuki Haga, Takashi Noguchi, Kazuhiko Ohe","doi":"10.2196/56700","DOIUrl":"10.2196/56700","url":null,"abstract":"<p><strong>Background: </strong>Type 2 diabetes (T2D) is a significant global health challenge. Physicians need to assess whether future glycemic control will be poor on the current trajectory of usual care and usual-care treatment intensifications so that they can consider taking extra treatment measures to prevent poor outcomes. Predicting poor glycemic control from trends in hemoglobin A<sub>1c</sub> (HbA<sub>1c</sub>) levels is difficult due to the influence of seasonal fluctuations and other factors.</p><p><strong>Objective: </strong>We sought to develop a model that accurately predicts poor glycemic control among patients with T2D receiving usual care.</p><p><strong>Methods: </strong>Our machine learning model predicts poor glycemic control (HbA<sub>1c</sub>≥8%) using the transformer architecture, incorporating an attention mechanism to process irregularly spaced HbA<sub>1c</sub> time series and quantify temporal relationships of past HbA<sub>1c</sub> levels at each time point. We assessed the model using HbA<sub>1c</sub> levels from 7787 patients with T2D seeing specialist physicians at the University of Tokyo Hospital. The training data include instances of poor glycemic control occurring during usual care with usual-care treatment intensifications. We compared prediction accuracy, assessed with the area under the receiver operating characteristic curve, the area under the precision-recall curve, and the accuracy rate, to that of LightGBM.</p><p><strong>Results: </strong>The area under the receiver operating characteristic curve, the area under the precision-recall curve, and the accuracy rate (95% confidence limits) of the proposed model were 0.925 (95% CI 0.923-0.928), 0.864 (95% CI 0.852-0.875), and 0.864 (95% CI 0.86-0.869), respectively. The proposed model achieved high prediction accuracy comparable to or surpassing LightGBM's performance. The model prioritized the most recent HbA<sub>1c</sub> levels for predictions. Older HbA<sub>1c</sub> levels in patients with poor glycemic control were slightly more influential in predictions compared to patients with good glycemic control.</p><p><strong>Conclusions: </strong>The proposed model accurately predicts poor glycemic control for patients with T2D receiving usual care, including patients receiving usual-care treatment intensifications, allowing physicians to identify cases warranting extraordinary treatment intensifications. If used by a nonspecialist, the model's indication of likely future poor glycemic control may warrant a referral to a specialist. 
Future efforts could incorporate diverse and large-scale clinical data for improved accuracy.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"3 ","pages":"e56700"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11294778/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141636021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
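A minimal sketch of attention over irregularly spaced HbA1c measurements is given below: each observation is embedded from its value plus an encoding of how long before the prediction point it was taken, and the attention weights then quantify how much each past visit contributes. The layer sizes, the time-gap encoding, and the sigmoid head are assumptions for illustration; the study's transformer is not reproduced here.

```python
import torch
import torch.nn as nn

class IrregularSeriesAttention(nn.Module):
    """Self-attention over HbA1c measurements taken at irregular intervals.
    Each observation is encoded from its value plus an embedding of the time elapsed
    before the prediction point, so visit spacing informs the attention weights."""
    def __init__(self, d_model: int = 64, nhead: int = 4):
        super().__init__()
        self.value_proj = nn.Linear(1, d_model)
        self.gap_proj = nn.Linear(1, d_model)
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.head = nn.Linear(d_model, 1)   # probability of HbA1c >= 8% at the next visit

    def forward(self, hba1c: torch.Tensor, days_before_pred: torch.Tensor):
        # hba1c and days_before_pred have shape (batch, seq_len, 1)
        x = self.value_proj(hba1c) + self.gap_proj(torch.log1p(days_before_pred))
        attended, weights = self.attn(x, x, x)   # weights show which past visits matter
        logit = self.head(attended[:, -1])       # predict from the most recent visit's state
        return torch.sigmoid(logit).squeeze(-1), weights

model = IrregularSeriesAttention()
vals = torch.tensor([[[7.2], [7.8], [8.4]]])     # three past HbA1c readings (%)
gaps = torch.tensor([[[180.0], [90.0], [0.0]]])  # days before the prediction point
prob, attn_weights = model(vals, gaps)
print(prob.shape, attn_weights.shape)            # torch.Size([1]) torch.Size([1, 3, 3])
```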
Multiscale Bowel Sound Event Spotting in Highly Imbalanced Wearable Monitoring Data: Algorithm Development and Validation Study.
Annalisa Baronetto, Luisa Graf, Sarah Fischer, Markus F Neurath, Oliver Amft
JMIR AI. 2024;3:e51118. Published 2024-07-10. doi:10.2196/51118. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11269970/pdf/

Background: Abdominal auscultation (i.e., listening to bowel sounds [BSs]) can be used to analyze digestion. An automated retrieval of BSs would be beneficial to assess gastrointestinal disorders noninvasively.

Objective: This study aims to develop a multiscale spotting model to detect BSs in continuous audio data from a wearable monitoring system.

Methods: We designed a spotting model based on the Efficient-U-Net (EffUNet) architecture to analyze 10-second audio segments at a time and spot BSs with a temporal resolution of 25 ms. Evaluation data were collected across different digestive phases from 18 healthy participants and 9 patients with inflammatory bowel disease (IBD). Audio data were recorded in a daytime setting with a smart T-shirt that embeds digital microphones. The data set was annotated by independent raters with substantial agreement (Cohen κ between 0.70 and 0.75), resulting in 136 hours of labeled data. In total, 11,482 BSs were analyzed, with a BS duration ranging between 18 ms and 6.3 seconds. The share of BSs in the data set (BS ratio) was 0.0089. We analyzed the performance depending on noise level, BS duration, and BS event rate. We also report spotting timing errors.

Results: Leave-one-participant-out cross-validation of BS event spotting yielded a median F1-score of 0.73 for both healthy volunteers and patients with IBD. EffUNet detected BSs under different noise conditions with 0.73 recall and 0.72 precision. In particular, for a signal-to-noise ratio over 4 dB, more than 83% of BSs were recognized, with precision of 0.77 or more. EffUNet recall dropped below 0.60 for BS duration of 1.5 seconds or less. At a BS ratio greater than 0.05, the precision of our model was over 0.83. For both healthy participants and patients with IBD, insertion and deletion timing errors were the largest, with a total of 15.54 minutes of insertion errors and 13.08 minutes of deletion errors over the total audio data set. On our data set, EffUNet outperformed existing BS spotting models that provide similar temporal resolution.

Conclusions: The EffUNet spotter is robust against background noise and can retrieve BSs with varying duration. EffUNet outperforms previous BS detection approaches in unmodified audio data, containing highly sparse BS events.