Jonathan Kambire, Seydou Golo Barro, Pascal Staccini
Input System for a GPT Model Simulating Doctor-Patient Interactions During Medical Consultation
Studies in Health Technology and Informatics, vol. 332, pp. 360-364. Published 2025-10-02. DOI: 10.3233/SHTI251562 (https://doi.org/10.3233/SHTI251562)
Citations: 0
Abstract
The introduction of the Licence-Master-Doctorate (LMD) system in African higher education has significantly reshaped university organization, particularly in health-related fields, and has exacerbated structural challenges such as faculty shortages and inadequate infrastructure. In this context, this work aims to construct a structured dialogue corpus for training a customized GPT-2 model that simulates medical consultations and supports the training of medical students. The methodology combines reliable medical sources, controlled generation of dialogues using existing artificial intelligence systems, and role-playing exercises involving medical students, with detailed annotation of clinical, emotional, and behavioral metadata. The final corpus comprises over 36 million tokens for pre-training and more than 8,326 simulated dialogues for fine-tuning, covering the most prevalent pathologies in Burkina Faso. This multilingual and culturally contextualized approach departs significantly from the dominant Western corpora and lays the groundwork for a medical conversational model adapted to African realities. The model is still in training, and complete results will be presented at a later stage. Nevertheless, the collected data already constitute a valuable resource for developing realistic, diverse, and reusable educational simulators across various medical training contexts.
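The abstract does not specify how annotated dialogues are stored or serialized for fine-tuning. The following is a minimal illustrative sketch, not the authors' format: the `Turn` and `Dialogue` classes, the field names, the tag syntax, and the sample content (pathology, language code, utterances) are all assumptions made for illustration. Only `<|endoftext|>` is an actual GPT-2 token.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """One utterance in a simulated consultation (hypothetical schema)."""
    speaker: str              # assumed roles: "doctor" or "patient"
    text: str
    emotion: str = "neutral"  # assumed emotional annotation label


@dataclass
class Dialogue:
    """A simulated consultation with clinical metadata (hypothetical schema)."""
    pathology: str            # assumed clinical annotation
    language: str             # assumed language code for the multilingual corpus
    turns: List[Turn] = field(default_factory=list)


def to_training_text(dialogue: Dialogue) -> str:
    """Flatten a dialogue into one text sample for causal LM fine-tuning.

    The tag syntax is invented for this sketch; "<|endoftext|>" is the
    end-of-text token used by GPT-2.
    """
    header = f"<|pathology:{dialogue.pathology}|><|lang:{dialogue.language}|>"
    lines = [f"<|{t.speaker}:{t.emotion}|> {t.text}" for t in dialogue.turns]
    return "\n".join([header, *lines, "<|endoftext|>"])


# Illustrative sample only; not taken from the corpus described above.
example = Dialogue(
    pathology="malaria",
    language="fr",
    turns=[
        Turn("patient", "I have had a fever for three days.", "anxious"),
        Turn("doctor", "Have you also had chills or headaches?"),
    ],
)

if __name__ == "__main__":
    print(to_training_text(example))
```

Keeping metadata in explicit tags at the start of each sample is one way to let a model condition on pathology, language, or emotional state during generation; other encodings (separate JSON fields, control codes) would serve equally well.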