Finn Syryca, Christian Gräßer, Teresa Trenkwalder, Philipp Nicol
Published in The International Journal of Cardiovascular Imaging (journal article), 2025-03-31. DOI: 10.1007/s10554-025-03382-1 (https://doi.org/10.1007/s10554-025-03382-1)
Automated generation of echocardiography reports using artificial intelligence: a novel approach to streamlining cardiovascular diagnostics.
Accurate interpretation of echocardiography measurements is essential for diagnosing cardiovascular diseases and guiding clinical management. The emergence of large language models (LLMs) like ChatGPT presents a novel opportunity to automate the generation of echocardiography reports and provide clinical recommendations. This study aimed to evaluate the ability of an LLM (ChatGPT) to 1) generate comprehensive echocardiography reports based solely on provided echocardiographic measurements, and 2) when enriched with clinical information, formulate accurate diagnoses, along with appropriate recommendations for further tests, treatment, and follow-up. Echocardiographic data from n = 13 fictional cases (Group 1) and n = 8 clinical cases (Group 2) were input into the LLM. The model's outputs were compared against standard clinical assessments conducted by experienced cardiologists. Using a dedicated scoring system, the LLM's performance was evaluated and stratified based on its accuracy in report generation, diagnostic precision, and the appropriateness of its recommendations. Patterns, frequency, and examples of misinterpretations by the LLM were analysed. Across all cases, the mean total score was 6.86 (SD = 1.12). Group 1 had a mean total score of 6.54 (SD = 1.13) and an accuracy score of 3.92 (SD = 0.86), while Group 2 scored 7.38 (SD = 0.92) and 4.38 (SD = 0.92), respectively. Recommendation scores were 2.62 (SD = 0.51) for Group 1 and 3.00 (SD = 0.00) for Group 2, with no significant differences (p = 0.096). Of all reports, 85.7% were fully acceptable, 14.3% were borderline acceptable, and none were unacceptable. Of 299 parameters, 5.3% were misinterpreted. The LLM demonstrated a high level of accuracy in generating detailed echocardiography reports, mostly correctly identifying normal and abnormal findings, and making accurate diagnoses across a range of cardiovascular conditions.
ChatGPT, as an LLM, shows significant potential in automating the interpretation of echocardiographic data, offering accurate diagnostic insights and clinical recommendations. These findings suggest that LLMs could serve as valuable tools in clinical practice, assisting and streamlining clinical workflow.
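The summary statistics in the abstract can be cross-checked with simple arithmetic. The sketch below is not part of the study: it only recombines the figures quoted above, and it assumes that the overall mean is the case-weighted average of the two group means and that the percentages refer to all 21 reports and all 299 parameters. The function name `implied_count` and the variable names are illustrative, and the counts it derives are implied by the percentages, not reported by the authors.

```python
# Figures quoted in the abstract.
n_group1, n_group2 = 13, 8                    # fictional and clinical cases
mean_total_g1, mean_total_g2 = 6.54, 7.38     # mean total score per group
n_parameters = 299

n_total = n_group1 + n_group2

# Assumption: the overall mean is the case-weighted average of the group means.
pooled_mean = (n_group1 * mean_total_g1 + n_group2 * mean_total_g2) / n_total


def implied_count(percent: float, total: int) -> int:
    """Nearest whole count consistent with a reported percentage (illustrative helper)."""
    return round(percent / 100 * total)


fully_acceptable = implied_count(85.7, n_total)
borderline_acceptable = implied_count(14.3, n_total)
misinterpreted = implied_count(5.3, n_parameters)

print(f"pooled mean total score: {pooled_mean:.2f}")
print(f"fully acceptable reports (implied): {fully_acceptable} of {n_total}")
print(f"borderline acceptable reports (implied): {borderline_acceptable} of {n_total}")
print(f"misinterpreted parameters (implied): {misinterpreted} of {n_parameters}")
```

Under these assumptions the weighted mean comes to 6.86, matching the reported overall score, and the percentages correspond to roughly 18 fully acceptable reports, 3 borderline reports, and about 16 misinterpreted parameters.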