Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie
{"title":"Towards evaluating and building versatile large language models for medicine","authors":"Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie","doi":"10.1038/s41746-024-01390-4","DOIUrl":"https://doi.org/10.1038/s41746-024-01390-4","url":null,"abstract":"<p>In this study, we present <b>MedS-Bench</b>, a comprehensive benchmark spanning 11 high-level clinical tasks to evaluate large language models (LLMs) in clinical contexts. We evaluate nine leading LLMs, <i>e.g</i>., MEDITRON, Llama 3, Mistral, GPT-4, Claude-3.5, <i>etc</i>., and find that most models struggle with these complex tasks. To address these limitations, we developed <b>MedS-Ins</b>, a large-scale instruction-tuning dataset for medicine. <b>MedS-Ins</b> comprises 58 medically oriented language corpora, totaling 5M instances with 19K instructions, across 122 tasks. To demonstrate the dataset’s utility, we conducted a proof-of-concept experiment by performing instruction tuning on a lightweight, open-source medical language model. The resulting model, <b>MMedIns-Llama 3</b>, significantly outperformed existing models on various clinical tasks. To promote further advancements, we have made <b>MedS-Ins</b> fully accessible and invite the research community to contribute to its expansion. Additionally, we have launched a dynamic leaderboard for <b>MedS-Bench</b> to track the development progress of medical LLMs.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"45 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143044104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weiyang Deng, Megan K. O’Brien, Rachel A. Andersen, Richa Rai, Erin Jones, Arun Jayaraman
{"title":"A systematic review of portable technologies for the early assessment of motor development in infants","authors":"Weiyang Deng, Megan K. O’Brien, Rachel A. Andersen, Richa Rai, Erin Jones, Arun Jayaraman","doi":"10.1038/s41746-025-01450-3","DOIUrl":"https://doi.org/10.1038/s41746-025-01450-3","url":null,"abstract":"<p>Early screening and evaluation of infant motor development are crucial for detecting motor deficits and enabling timely interventions. Traditional clinical assessments are often subjective, without fully capturing infants’ “real-world” behavior. This has sparked interest in portable, low-cost technologies to objectively and precisely measure infant motion at home, with a goal of enhancing ecological validity. In this systematic review, we explored the current landscape of portable, technology-based solutions to assess early motor development (within the first year), outlining the prevailing challenges and future directions. We reviewed 66 publications, which utilized video, sensors, or a combination of technologies. There were three key applications of these technologies: (1) automating clinical assessments, (2) illuminating new measures of motor development, and (3) predicting developmental outcomes. There was a promising trend toward earlier and more accurate detection using portable technologies. Additional research and demographic diversity are needed to develop fully automated, robust, and user-friendly tools. Registration & Protocol OSF Registries https://doi.org/10.17605/OSF.IO/R6JAE.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"54 41 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143044107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Siân Bladon, Emily Eisner, Sandra Bucci, Anuoluwapo Oluwatayo, Glen P. Martin, Matthew Sperrin, John Ainsworth, Sophie Faulkner
{"title":"A systematic review of passive data for remote monitoring in psychosis and schizophrenia","authors":"Siân Bladon, Emily Eisner, Sandra Bucci, Anuoluwapo Oluwatayo, Glen P. Martin, Matthew Sperrin, John Ainsworth, Sophie Faulkner","doi":"10.1038/s41746-025-01451-2","DOIUrl":"https://doi.org/10.1038/s41746-025-01451-2","url":null,"abstract":"<p>There is increasing use of digital tools to monitor people with psychosis and schizophrenia remotely, but using this type of data is challenging. This systematic review aimed to summarise how studies processed and analysed data collected through digital devices. In total, 203 articles collecting passive data through smartphones or wearable devices, from participants with psychosis or schizophrenia were included in the review. Accelerometers were the most common device (<i>n</i> = 115 studies), followed by smartphones (<i>n</i> = 46). The most commonly derived features were sleep duration (<i>n</i> = 50) and time spent sedentary (<i>n</i> = 41). Thirty studies assessed data quality and another 69 applied data quantity thresholds. Mixed effects models were used in 21 studies and time-series and machine-learning methods were used in 18 studies. Reporting of methods to process and analyse data was inconsistent, highlighting a need to improve the standardisation of methods and reporting in this area of research.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"58 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143044109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cheng-Jiang Wei, Yan Tang, Yang-Bai Sun, Tie-Long Yang, Cheng Yan, Hui Liu, Jun Liu, Jing-Ning Huang, Ming-Han Wang, Zhen-Wei Yao, Ji-Long Yang, Zhi-Chao Wang, Qing-Feng Li
{"title":"A multicenter study of neurofibromatosis type 1 utilizing deep learning for whole body tumor identification","authors":"Cheng-Jiang Wei, Yan Tang, Yang-Bai Sun, Tie-Long Yang, Cheng Yan, Hui Liu, Jun Liu, Jing-Ning Huang, Ming-Han Wang, Zhen-Wei Yao, Ji-Long Yang, Zhi-Chao Wang, Qing-Feng Li","doi":"10.1038/s41746-025-01454-z","DOIUrl":"https://doi.org/10.1038/s41746-025-01454-z","url":null,"abstract":"<p>Deep-learning models have shown promise in differentiating between benign and malignant lesions. Previous studies have primarily focused on specific anatomical regions, overlooking tumors occurring throughout the body with highly heterogeneous whole-body backgrounds. Using neurofibromatosis type 1 (NF1) as an example, this study developed highly accurate MRI-based deep-learning models for the early automated screening of malignant peripheral nerve sheath tumors (MPNSTs) against a complex whole-body background. In a Chinese seven-center cohort, data from 347 subjects were analyzed. Our one-step model incorporated normal tissue/organ labels to provide contextual information, offering a solution for tumors with complex backgrounds. To address privacy concerns, we utilized a lightweight deep neural network suitable for hospital deployment. The final model achieved an accuracy of 85.71% for MPNST diagnosis in the validation cohort and 84.75% accuracy in the independent test set, outperforming a classic two-step model. This success suggests potential for AI models in screening other whole-body primary/metastatic tumors.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"14 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143034970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gustavo Frigieri, Sérgio Brasil, Danilo Cardim, Marek Czosnyka, Matheus Ferreira, Wellingson S. Paiva, Xiao Hu
{"title":"Machine learning approach for noninvasive intracranial pressure estimation using pulsatile cranial expansion waveforms","authors":"Gustavo Frigieri, Sérgio Brasil, Danilo Cardim, Marek Czosnyka, Matheus Ferreira, Wellingson S. Paiva, Xiao Hu","doi":"10.1038/s41746-025-01463-y","DOIUrl":"https://doi.org/10.1038/s41746-025-01463-y","url":null,"abstract":"<p>Noninvasive methods for intracranial pressure (ICP) monitoring have emerged, but none has successfully replaced invasive techniques. This observational study developed and tested a machine learning (ML) model to estimate ICP using waveforms from a cranial extensometer device (brain4care [B4C] System). The model explored multiple waveform parameters to optimize mean ICP estimation. Data from 112 neurocritical patients with acute brain injuries were used, with 92 patients randomly assigned to training and testing, and 20 reserved for independent validation. The ML model achieved a mean absolute error of 3.00 mmHg, with a 95% confidence interval within ±7.5 mmHg. Approximately 72% of estimates from the validation sample were within 0-4 mmHg of invasive ICP values. This proof-of-concept study demonstrates that noninvasive ICP estimation via the B4C System and ML is feasible. Prospective studies are needed to validate the model’s clinical utility across diverse settings.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"25 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143034968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyojin Lee, You Rim Choi, Hyun Kyung Lee, Jaemin Jeong, Joopyo Hong, Hyun-Woo Shin, Hyung-Sin Kim
{"title":"Explainable vision transformer for automatic visual sleep staging on multimodal PSG signals","authors":"Hyojin Lee, You Rim Choi, Hyun Kyung Lee, Jaemin Jeong, Joopyo Hong, Hyun-Woo Shin, Hyung-Sin Kim","doi":"10.1038/s41746-024-01378-0","DOIUrl":"https://doi.org/10.1038/s41746-024-01378-0","url":null,"abstract":"<p>Polysomnography (PSG) is crucial for diagnosing sleep disorders, but manual scoring of PSG is time-consuming and subjective, leading to high variability. While machine-learning models have improved PSG scoring, their clinical use is hindered by their ‘black-box’ nature. In this study, we present <i>SleepXViT</i>, an automatic sleep staging system using Vision Transformer (ViT) that provides intuitive, consistent explanations by mimicking human ‘visual scoring’. Tested on KISS (a PSG image dataset from 7745 patients across four hospitals), <i>SleepXViT</i> achieved a Macro F1 score of 81.94%, outperforming baseline models and showing robust performance on the public datasets SHHS1 and SHHS2. Furthermore, <i>SleepXViT</i> offers well-calibrated confidence scores, enabling expert review for low-confidence predictions, alongside high-resolution heatmaps highlighting essential features and relevance scores for adjacent epochs’ influence on sleep stage predictions. Together, these explanations reinforce the scoring consistency of <i>SleepXViT</i>, making it both reliable and interpretable, thereby facilitating the synergy between the AI model and human scorers in clinical settings.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"47 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143030921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zheyi Dong, Xiaofei Wang, Sai Pan, Taohan Weng, Xiaoniao Chen, Shuangshuang Jiang, Ying Li, Zonghua Wang, Xueying Cao, Qian Wang, Pu Chen, Lai Jiang, Guangyan Cai, Li Zhang, Yong Wang, Jinkui Yang, Yani He, Hongli Lin, Jie Wu, Li Tang, Jianhui Zhou, Shengxi Li, Zhaohui Li, Yibing Fu, Xinyue Yu, Yanqiu Geng, Yingjie Zhang, Liqiang Wang, Mai Xu, Xiangmei Chen
{"title":"A multimodal transformer system for noninvasive diabetic nephropathy diagnosis via retinal imaging","authors":"Zheyi Dong, Xiaofei Wang, Sai Pan, Taohan Weng, Xiaoniao Chen, Shuangshuang Jiang, Ying Li, Zonghua Wang, Xueying Cao, Qian Wang, Pu Chen, Lai Jiang, Guangyan Cai, Li Zhang, Yong Wang, Jinkui Yang, Yani He, Hongli Lin, Jie Wu, Li Tang, Jianhui Zhou, Shengxi Li, Zhaohui Li, Yibing Fu, Xinyue Yu, Yanqiu Geng, Yingjie Zhang, Liqiang Wang, Mai Xu, Xiangmei Chen","doi":"10.1038/s41746-024-01393-1","DOIUrl":"https://doi.org/10.1038/s41746-024-01393-1","url":null,"abstract":"<p>Differentiating between diabetic nephropathy (DN) and non-diabetic renal disease (NDRD) without a kidney biopsy remains a major challenge, often leading to missed opportunities for targeted treatments that could greatly improve NDRD outcomes. To reform the traditional biopsy-all diagnostic paradigm and avoid unnecessary biopsy, we developed a transformer-based deep learning (DL) system for detecting DN and NDRD from non-invasive multi-modal data consisting of fundus images and clinical characteristics. Our Trans-MUF achieved an AUC of 0.980 (95% CI: 0.979 to 0.980) on the internal retrospective set and also showed superior generalizability on a prospective dataset (AUC: 0.989, 95% CI: 0.987 to 0.990) and a multicenter, cross-machine and multi-operator dataset (AUC: 0.932, 95% CI: 0.931 to 0.939). Moreover, the nephrologists’ diagnostic accuracy can be improved by 21% through the visualization assistance of the DL system. This paper lays a foundation for automatically differentiating DN and NDRD without biopsy. (Registry name: Correlation Study Between Clinical Phenotype and Pathology of Type 2 Diabetic Nephropathy. ID: NCT03865914. Date: 2017-11-30).</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"6 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143027195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yong Whi Jeong, Hayon Michelle Choi, Youhyun Park, Yongjin Lee, Ji Ye Jung, Dae Ryong Kang
{"title":"Association between exposure to particulate matter and heart rate variability in vulnerable and susceptible individuals","authors":"Yong Whi Jeong, Hayon Michelle Choi, Youhyun Park, Yongjin Lee, Ji Ye Jung, Dae Ryong Kang","doi":"10.1038/s41746-024-01373-5","DOIUrl":"https://doi.org/10.1038/s41746-024-01373-5","url":null,"abstract":"<p>Particulate matter (PM) exposure can reduce heart rate variability (HRV), a cardiovascular health marker. This study examines the effects of PM<sub>1.0</sub> (aerodynamic diameters <1 μm), PM<sub>2.5</sub> (≥1 μm and <2.5 μm), and PM<sub>10</sub> (≥2.5 μm and <10 μm) on HRV in patients with environmental diseases (the chronic disease groups) and in vulnerable populations (the control groups). PM levels were measured indoors and outdoors for five days in 97 participants, with 24-h HRV monitoring via wearable devices. PM exposure was assessed by categorizing daily cumulative PM concentrations into higher and lower exposure days, while daily average PM concentrations were used for analysis. Results showed significant negative associations between HRV and exposure to individual PM metrics as well as mixtures of them across all groups, particularly in the chronic airway disease group and the groups exposed to higher air pollution. These findings suggest that even lower PM levels may reduce HRV, indicating a need for stricter standards to protect sensitive individuals.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"1 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143027190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jessie P. Bakker, Roland Barge, Jacob Centra, Bryan Cobb, Chas Cota, Christine C. Guo, Bert Hartog, Nathalie Horowicz-Mehler, Elena S. Izmailova, Nikolay V. Manyakov, Samantha McClenahan, Stéphane Motola, Smit Patel, Oana Paun, Marian Schoone, Emre Sezgin, Thomas Switzer, Animesh Tandon, Willem van den Brink, Srinivasan Vairavan, Benjamin Vandendriessche, Bernard Vrijens, Jennifer C. Goldsack
{"title":"V3+ extends the V3 framework to ensure user-centricity and scalability of sensor-based digital health technologies","authors":"Jessie P. Bakker, Roland Barge, Jacob Centra, Bryan Cobb, Chas Cota, Christine C. Guo, Bert Hartog, Nathalie Horowicz-Mehler, Elena S. Izmailova, Nikolay V. Manyakov, Samantha McClenahan, Stéphane Motola, Smit Patel, Oana Paun, Marian Schoone, Emre Sezgin, Thomas Switzer, Animesh Tandon, Willem van den Brink, Srinivasan Vairavan, Benjamin Vandendriessche, Bernard Vrijens, Jennifer C. Goldsack","doi":"10.1038/s41746-024-01322-2","DOIUrl":"https://doi.org/10.1038/s41746-024-01322-2","url":null,"abstract":"<p>We propose the addition of <i>usability validation</i> to the extended V3 framework, now “V3+”, and describe a pragmatic approach to ensuring that sensor-based digital health technologies can be used optimally at scale by diverse users. Alongside the original V3 components (verification; analytical validation; clinical validation), usability validation will ensure user-centricity of digital measurement tools, paving the way for more inclusive, reliable, and trustworthy digital measures within clinical research and clinical care.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"27 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143027192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gayeon Ryu, Jae Moon Choi, Hyeon Seok Seok, Jaehyung Lee, Eun-Kyung Lee, Hangsik Shin, Byung-Moon Choi
{"title":"Machine learning based quantitative pain assessment for the perioperative period","authors":"Gayeon Ryu, Jae Moon Choi, Hyeon Seok Seok, Jaehyung Lee, Eun-Kyung Lee, Hangsik Shin, Byung-Moon Choi","doi":"10.1038/s41746-024-01362-8","DOIUrl":"https://doi.org/10.1038/s41746-024-01362-8","url":null,"abstract":"<p>This study developed and evaluated a model for assessing pain during the surgical period using photoplethysmogram data from 242 patients. Pain levels were measured at 2 min intervals using a numerical rating scale or clinical criteria: preoperative, before and after intubation, before and after skin incision, and postoperative. Key features from the photoplethysmography waveform were extracted to build XGBoost-based models for intraoperative and postoperative pain assessment. The combined perioperative model was compared with a commercial surgical pain index, yielding area under the receiver operating characteristic curve scores of 0.819 and 0.927 for the intraoperative and postoperative periods, respectively, compared with the commercial index’s scores of 0.829 and 0.577. These results highlight the models’ effectiveness in pain assessment throughout the surgical process, identifying waveform skewness and diastolic phase rate decrease as critical for intraoperative pain assessment and systolic phase area or baseline fluctuation as significant for postoperative pain assessment.</p><p><b>Clinical trial registration</b>: Registration name: Clinical Research Information Service (CRIS). Registration site: http://cris.nih.go.kr. Number: KCT0005840. Principal Investigator: Dr. Byung-Moon Choi. Date of registration: January 28, 2021</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"19 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143027193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}