Livia Lilli, Stefano Patarnello, Carlotta Masciocchi, Antonio Marchetti, Giovanni Arcuri
{"title":"Benchmarking Large Language Models for Italian Medical Text Classification: Are Generative Models the Best Choice?","authors":"Livia Lilli, Stefano Patarnello, Carlotta Masciocchi, Antonio Marchetti, Giovanni Arcuri","doi":"10.3233/SHTI251486","DOIUrl":"https://doi.org/10.3233/SHTI251486","url":null,"abstract":"<p><p>The extraction of meaningful information from clinical reports has been an area of growing interest, with a variety of studies leveraging natural language processing (NLP) techniques based on BERT architectures and generative large language models (LLMs). However, identifying the most effective approach remains challenging, especially for text classification, where model architecture, data availability, domain-specific nuances and language play a crucial role in performance. In this study, we present a benchmark analysis of generative LLMs and BERT-based models for the classification of metastasis in Italian clinical reports of breast cancer patients. Our methodology compares the performance of generative LLMs implemented within a structured generation framework, versus BERT-based models fine-tuned on the metastasis classification task, and also applied in a zero-shot learning setting. In our experiments, fine-tuned BERT models achieved the most balanced results (F1 = 0.884, AUC = 0.720). Generative LLMs showed promising performance, with potential for improvement through further adaptation. 
Finally, our study suggests that both BERT-based models and generative LLMs are potential solutions also in low computational settings, making them accessible for real-world clinical applications, particularly in medical text classification.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"332 ","pages":"12-16"},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trust in Artificial Intelligence in Wound Care: Perspectives of Healthcare Professionals and Patients in Germany.","authors":"Birgit Babitsch, Niels Hannemann, Ursula Hübner","doi":"10.3233/SHTI251514","DOIUrl":"https://doi.org/10.3233/SHTI251514","url":null,"abstract":"<p><p>Artificial intelligence (AI) is increasingly integrated into healthcare, changing processes and structures, and thus the practice of healthcare professionals and potentially the role of patients and the healthcare professional-patient relationship. Beyond high-precision AI algorithms, knowledge of how to evaluate and use AI-based results in everyday healthcare is crucial for high-quality and safe care, and a prerequisite for trust. Therefore, this qualitative study aims to explore 1) the general perception of trust in AI used in healthcare and specifically in wound care, 2) the prerequisites for building trust in AI, and 3) the impact of AI on treatment and healthcare professional-patient relationship, all from the perspective of healthcare professionals and patients. Interviews were conducted in 2022/2023 with healthcare professionals specializing in wound care (N = 12) and in 2023 with patients with chronic wounds (N = 10). The interview guide included questions about digitalization in general and AI in particular, as well as trust and the healthcare professional-patient relationship. Our data revealed a limited understanding of AI principles and evaluation of AI-generated outcomes in both groups. Healthcare professionals recognized the potential of AI to provide data-driven suggestions for diagnosis and therapy, acting as a supportive \"second opinion\". Patients, on the contrary, expressed a preference for their physicians to incorporate AI-generated results into their care, thereby placing their trust in the physician's ability to apply them correctly. Neither group expected significant changes in the healthcare professional-patient relationship. 
Trust in AI was linked to general trust in digitalization, and healthcare professionals showed greater trust in AI results that were aligned with their existing expertise and were transparently explained. These findings suggest that AI can be a valuable tool for high-quality healthcare, but informed use requires meeting key prerequisites, including Explainable AI (XAI) principles and ongoing training.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"332 ","pages":"144-148"},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Morgan Vaterkowski, Nadir Ammour, Christel Daniel, Emmanuelle Kempf
{"title":"How to (Semi)-Automatically Spot Prescreening Oriented Eligibility Criteria.","authors":"Morgan Vaterkowski, Nadir Ammour, Christel Daniel, Emmanuelle Kempf","doi":"10.3233/SHTI251546","DOIUrl":"https://doi.org/10.3233/SHTI251546","url":null,"abstract":"<p><p>Clinical Trial (CT) Recruitment Support Systems (CTRSS) querying Electronic Health Records (EHR) for patient-trial matching during CT execution have been expanding. Since free text CT eligibility criteria (EC) are not readily suitable for the automation of EHR querying, the configuration of EHR-based CTRSS requires time-consuming and usually manual processing of EC focusing on those that are the most relevant at the pre-inclusion (prescreening) step. The aim of this study is to provide a methodological approach to semi-automatically detect Prescreening-Oriented Eligibility Criteria (POEC) and build a library of POEC usable in the context of the development and evaluation of EHR-based Clinical Trial Recruitment Support Systems (CTRSS). We proposed an approach for decomposing free text EC into standardized elements and developing a rule-based algorithm to semi-automatically detect POEC. In addition, this paper describes the characteristics of a publicly available POEC library usable for CTRSS evaluation. An annotation framework consisting of 96 patterns of elementary EC categorized into 17 domains was used to annotate 381 free text EC from 20 CT dedicated to various cancer types. This training dataset was used to develop a rule-based algorithm detecting POEC. This study provides a methodological approach to (semi)-automatically spot POEC and store them in a library considering advances in the field of CTRSS. 
The PENELOPE-C2Q pipeline is designed to feed the PENELOPE POEC library, both having the potential to facilitate the reuse of EHR data for better participation of patients in research.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"332 ","pages":"288-292"},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Luís Carlos Afonso, João Rafael Almeida, José Luís Oliveira
{"title":"Combining Statistical and Deep Learning Models for Insomnia Detection.","authors":"Luís Carlos Afonso, João Rafael Almeida, José Luís Oliveira","doi":"10.3233/SHTI251525","DOIUrl":"https://doi.org/10.3233/SHTI251525","url":null,"abstract":"<p><p>Insomnia is a common but often underdiagnosed condition in clinical settings, where relevant information is typically buried in unstructured free-text notes. Automated tools that can identify both the presence of insomnia and the supporting evidence are essential to improve diagnosis and enable large-scale studies. However, existing models often prioritize accuracy at the cost of interpretability, which is critical for clinical adoption. To address this, we explore a hybrid approach that balances performance with explainability. Our method combines Finite Context Models (FCMs) for character-level classification of insomnia status with a BERT-based token classification model for extracting textual evidence, using structured annotations from the MIMIC-III dataset. This complementary setup enables both accurate prediction and transparent decision-making in clinical text analysis.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"332 ","pages":"195-199"},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Core Competencies for Psychiatric Hospital at Home (HaH) Treatment: A Qualitative Exploration.","authors":"Kerstin Denecke, Denis Sumin Moser, Friederike J S Thilo, Manuela Grieser","doi":"10.3233/SHTI251548","DOIUrl":"https://doi.org/10.3233/SHTI251548","url":null,"abstract":"<p><p>Hospital at home (HaH) applied in psychiatric settings is becoming an important component of mental health care, yet the core competencies required by health professionals for successful implementations remain understudied. This exploratory qualitative study aims to identify the core competencies health professionals perceive as essential for effective home treatment and proposes considerations for future training frameworks. Semi-structured interviews were conducted with five health professionals leading or developing psychiatric HaH treatment services. Data were analysed using thematic analysis, focusing on recurrent themes related to skills and challenges. Seven categories of skills and competencies were identified: clinical and diagnostic competencies, home and family-oriented skills, treatment planning and process management skills, communication and interprofessional collaboration skills, teamwork and leadership, professional attitudes and adaptive skills, data-driven practice and evaluation. 
Future work should concretize these core competencies to make the next step towards educational programs for successful realisation of HaH care services.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"332 ","pages":"294-298"},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Panos Bonotis, Pantelis Angelidis, Katerina D Tzimourta, Stamatia Bibi
{"title":"Enabling Dynamic Consent Through AI and Blockchain: The CONSENT Platform.","authors":"Panos Bonotis, Pantelis Angelidis, Katerina D Tzimourta, Stamatia Bibi","doi":"10.3233/SHTI251556","DOIUrl":"https://doi.org/10.3233/SHTI251556","url":null,"abstract":"<p><p>The increasing complexity of data ecosystems, especially in healthcare, highlights the urgent need for dynamic, user-centric consent management solutions. Traditional static consent models struggle to adapt to evolving privacy regulations, organizational needs, and user expectations. The CONSENT project introduces an innovative Consent Management Platform (CMP) that leverages Artificial Intelligence (AI) and Blockchain technologies to enable secure, transparent, and flexible management of consent in complex data workflows. By combining intelligent consent recommendation mechanisms with tamper-proof decentralized storage, the CONSENT platform aims to empower users with greater control over their data while facilitating organizational compliance with frameworks such as GDPR and CCPA. This paper presents the platform's vision, core technological pillars, and the planned evaluation strategy, including the anticipated implementation in healthcare and digital services, providing insights into how AI and Blockchain can reshape consent management for the healthcare digital age and beyond.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"332 ","pages":"330-334"},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Al Rahrooh, Anders O Garlid, Panayiotis Petousis, Arthur Fumnell, Alex A T Bui
{"title":"MedPromptEval: A Comprehensive Framework for Systematic Evaluation of Clinical Question Answering Systems.","authors":"Al Rahrooh, Anders O Garlid, Panayiotis Petousis, Arthur Fumnell, Alex A T Bui","doi":"10.3233/SHTI251540","DOIUrl":"https://doi.org/10.3233/SHTI251540","url":null,"abstract":"<p><p>Clinical deployment of large language models (LLMs) faces critical challenges, including inconsistent prompt performance, variable model behavior, and a lack of standardized evaluation methodologies. We present MedPromptEval, a framework that systematically evaluates LLM-prompt combinations across clinically relevant dimensions. This framework automatically generates diverse prompt types, orchestrates response generation across multiple LLMs, and quantifies performance through multiple metrics measuring factual accuracy, semantic relevance, entailment consistency, and linguistic appropriateness. We demonstrate MedPromptEval's utility across publicly available clinical question answering (QA) datasets - MedQuAD, PubMedQA, and HealthCareMagic - in distinct evaluation modes: 1) model comparison using standardized prompts; 2) prompt strategy optimization using a controlled model; and 3) extensive assessment of prompt-model configurations. 
By enabling reproducible benchmarking of clinical LLM and QA applications, MedPromptEval provides insights for optimizing prompt engineering and model selection, advancing the reliable and effective integration of language models in health care settings.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"332 ","pages":"262-266"},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Human-Robot Collaboration in Hospital Practice: Introducing HERO.","authors":"Kristina Tornbjerg Eriksen, Jeppe Eriksen","doi":"10.3233/SHTI251534","DOIUrl":"https://doi.org/10.3233/SHTI251534","url":null,"abstract":"<p><p>As robots are increasingly integrated into hospital environments, evaluating human-robot collaboration (HRC) requires more than assessing technical performance or user acceptance. This paper introduces the HERO framework - a socio-technical evaluation model for assessing human-robot collaboration (HRC) in hospital environments. Unlike existing models that center on technical performance or user acceptance, HERO integrates ethnographic insight and systemic analysis to foreground the interdependencies between social practices, spatial dynamics, robotic behavior, and institutional structures. Developed through fieldwork, interviews, and a scoping review, HERO consists of four interrelated dimensions: Humans, Environment, Robots, and Organisation. Each is operationalised through guiding questions to support both empirical inquiry and practical application. HERO offers a novel, practice-oriented contribution to the evaluation of HRC in complex real-world settings.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"332 ","pages":"232-236"},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vinícius Lima, Rute Almeida, Filipe Bernardi, Francisco Bischoff, Luís Conceição, Daniel Rodrigues, Ricardo Correia, Goreti Marreiros, Alberto Freitas
{"title":"Integrating Large Language Models into Obstetric EHRs: The MedGPT Use Case in Portugal.","authors":"Vinícius Lima, Rute Almeida, Filipe Bernardi, Francisco Bischoff, Luís Conceição, Daniel Rodrigues, Ricardo Correia, Goreti Marreiros, Alberto Freitas","doi":"10.3233/SHTI251559","DOIUrl":"https://doi.org/10.3233/SHTI251559","url":null,"abstract":"<p><p>ObsCare, an obstetric electronic health record system used in 20 Portuguese hospitals, has enabled the collection of longitudinal maternal health data. Within the MedGPT project, we aim to integrate fine-tuned large language models into ObsCare to summarize a pregnant patient's health journey, provide culturally sensitive decision support, and generate personalized educational content. The project addresses critical gaps in obstetric care caused by workforce shortages, increasing C-section rates, and rising maternal age in Portugal. We describe the planned architecture, data sources, and early implementation strategies, highlighting how MedGPT will reduce clinicians' documentation burden and enhance patient engagement. This work-in-progress reflects the preliminary results of a European collaboration to develop ethically aligned Generative AI solutions in healthcare.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"332 ","pages":"345-349"},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corine Oldhoff-Nuijsink, Isabel Engelen, Thomas Engelsma, Marloes Derksen
{"title":"Expert Evaluation of an Online Digital Health Literacy Tool.","authors":"Corine Oldhoff-Nuijsink, Isabel Engelen, Thomas Engelsma, Marloes Derksen","doi":"10.3233/SHTI251508","DOIUrl":"https://doi.org/10.3233/SHTI251508","url":null,"abstract":"<p><p>This study aims to gain first insights into developing and using an interactive assessment tool with an automatic scoring method to measure digital health literacy with performance-based tasks. We developed a prototype assessment tool based on an informal rapid literature review. The prototype was then evaluated using a heuristic evaluation (n = 4) and qualitative interviews (n = 5) with domain experts. The expert evaluation of the developed prototype identified 27 usability problems and resulted in recommendations for future development and research. The use of a digital interactive assessment instrument with performance-based tasks has potential and could support digital health literacy assessment. However, further research is needed into the definition of digital health literacy and end-user evaluation of its usability.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"332 ","pages":"118-122"},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}