{"title":"中文语境下三种大型语言模型对人工流产后护理咨询的响应性能评价:比较分析","authors":"Danyue Xue, Sha Liao","doi":"10.2147/RMHP.S531777","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>This study aimed to evaluate the response performances of three large language models (LLMs) (ChatGPT, Kimi, and Ernie Bot) to inquiries regarding post-abortion care (PAC) in the context of the Chinese language.</p><p><strong>Methods: </strong>The data was collected in October 2024. Twenty questions concerning the necessity of contraception after induced abortion, the best time for contraception, choice of a contraceptive method, contraceptive effectiveness, and the potential impact of contraception on fertility were used in this study. Each question was asked three times in Chinese for each LLM. Three PAC consultants conducted the evaluations. A Likert scale was used to score the responses based on accuracy, relevance, completeness, clarity, and reliability.</p><p><strong>Results: </strong>The number of responses received \"good\" (a mean score > 4), \"average\" (3 < mean score ≤ 4), and \"poor\" (a mean score ≤ 3) in overall evaluation was 159 (88.30%), 19 (10.57%), and 2 (1.10%). No statistically significant differences were identified in the overall evaluation among the three LLMs (<i>P</i> = 0.352). The number of the responses evaluated as good for accuracy, relevance, completeness, clarity, and reliability were 87 (48.33%), 154 (85.53%), 136 (75.57%), 133 (73.87%), and 128 (71.10%), respectively. No statistically significant differences were identified in accuracy, relevance, completeness or clarity between the three LLMs. A statistically significant difference was identified in reliability (<i>P</i> < 0.001).</p><p><strong>Conclusion: </strong>The three LLMs performed well overall and showed great potential for application in PAC consultations. The accuracy of the LLMs' responses should be improved through continuous training and evaluation.</p>","PeriodicalId":56009,"journal":{"name":"Risk Management and Healthcare Policy","volume":"18 ","pages":"2731-2741"},"PeriodicalIF":2.0000,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12372831/pdf/","citationCount":"0","resultStr":"{\"title\":\"Evaluation of Three Large Language Models' Response Performances to Inquiries Regarding Post-Abortion Care in the Context of Chinese Language: A Comparative Analysis.\",\"authors\":\"Danyue Xue, Sha Liao\",\"doi\":\"10.2147/RMHP.S531777\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>This study aimed to evaluate the response performances of three large language models (LLMs) (ChatGPT, Kimi, and Ernie Bot) to inquiries regarding post-abortion care (PAC) in the context of the Chinese language.</p><p><strong>Methods: </strong>The data was collected in October 2024. Twenty questions concerning the necessity of contraception after induced abortion, the best time for contraception, choice of a contraceptive method, contraceptive effectiveness, and the potential impact of contraception on fertility were used in this study. Each question was asked three times in Chinese for each LLM. Three PAC consultants conducted the evaluations. 
A Likert scale was used to score the responses based on accuracy, relevance, completeness, clarity, and reliability.</p><p><strong>Results: </strong>The number of responses received \\\"good\\\" (a mean score > 4), \\\"average\\\" (3 < mean score ≤ 4), and \\\"poor\\\" (a mean score ≤ 3) in overall evaluation was 159 (88.30%), 19 (10.57%), and 2 (1.10%). No statistically significant differences were identified in the overall evaluation among the three LLMs (<i>P</i> = 0.352). The number of the responses evaluated as good for accuracy, relevance, completeness, clarity, and reliability were 87 (48.33%), 154 (85.53%), 136 (75.57%), 133 (73.87%), and 128 (71.10%), respectively. No statistically significant differences were identified in accuracy, relevance, completeness or clarity between the three LLMs. A statistically significant difference was identified in reliability (<i>P</i> < 0.001).</p><p><strong>Conclusion: </strong>The three LLMs performed well overall and showed great potential for application in PAC consultations. The accuracy of the LLMs' responses should be improved through continuous training and evaluation.</p>\",\"PeriodicalId\":56009,\"journal\":{\"name\":\"Risk Management and Healthcare Policy\",\"volume\":\"18 \",\"pages\":\"2731-2741\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-08-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12372831/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Risk Management and Healthcare Policy\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.2147/RMHP.S531777\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Risk Management and Healthcare Policy","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2147/RMHP.S531777","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Evaluation of Three Large Language Models' Response Performances to Inquiries Regarding Post-Abortion Care in the Context of Chinese Language: A Comparative Analysis.
Background: This study aimed to evaluate the response performances of three large language models (LLMs) (ChatGPT, Kimi, and Ernie Bot) to inquiries regarding post-abortion care (PAC) in the context of the Chinese language.
Methods: Data were collected in October 2024. Twenty questions were used, covering the necessity of contraception after induced abortion, the optimal timing of contraception, the choice of a contraceptive method, contraceptive effectiveness, and the potential impact of contraception on fertility. Each question was asked three times in Chinese to each LLM. Three PAC consultants conducted the evaluations. A Likert scale was used to score the responses on accuracy, relevance, completeness, clarity, and reliability.
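The abstract does not publish the authors' analysis pipeline. As a minimal illustrative sketch only, the Python snippet below shows how the three consultants' Likert ratings for one response might be averaged per dimension and mapped to the "good"/"average"/"poor" categories defined in the Results; all function names, data structures, and example scores are hypothetical, not the authors' code.

```python
# Illustrative sketch only: the abstract does not describe the authors' code.
# Three PAC consultants rate each LLM response on a 5-point Likert scale for
# five dimensions; the mean score per dimension is mapped to a category.
from statistics import mean

def categorize(mean_score: float) -> str:
    """Thresholds taken from the Results section of the abstract."""
    if mean_score > 4:
        return "good"
    if mean_score > 3:
        return "average"
    return "poor"

def evaluate_response(ratings: dict[str, list[int]]) -> dict[str, str]:
    """ratings maps each dimension to the three consultants' 1-5 scores."""
    return {dim: categorize(mean(scores)) for dim, scores in ratings.items()}

# Hypothetical ratings for one response to one question.
example = {
    "accuracy":     [4, 4, 3],
    "relevance":    [5, 5, 4],
    "completeness": [4, 3, 4],
    "clarity":      [5, 4, 4],
    "reliability":  [3, 3, 4],
}
print(evaluate_response(example))
# -> accuracy/completeness/reliability "average", relevance/clarity "good"
```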
Results: The numbers of responses rated "good" (mean score > 4), "average" (3 < mean score ≤ 4), and "poor" (mean score ≤ 3) in the overall evaluation were 159 (88.30%), 19 (10.57%), and 2 (1.10%), respectively. No statistically significant difference was identified in the overall evaluation among the three LLMs (P = 0.352). The numbers of responses rated good for accuracy, relevance, completeness, clarity, and reliability were 87 (48.33%), 154 (85.53%), 136 (75.57%), 133 (73.87%), and 128 (71.10%), respectively. No statistically significant differences were identified in accuracy, relevance, completeness, or clarity among the three LLMs, whereas a statistically significant difference was identified in reliability (P < 0.001).
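The abstract reports P-values but does not name the statistical tests used. The sketch below is a hedged illustration of two plausible ways such a three-group comparison could be run with ordinal Likert data: a chi-square test on the good/average/poor counts per model and a Kruskal-Wallis test on per-response scores. The per-model breakdown and the simulated scores are invented; only the column totals (159, 19, 2) come from the abstract.

```python
# Illustrative only: the study's actual tests and per-model data are not
# reported in the abstract.
import numpy as np
from scipy.stats import chi2_contingency, kruskal

# Contingency table: rows = LLMs, columns = good / average / poor counts for
# the overall evaluation. The per-model split below is hypothetical; only the
# column totals (159, 19, 2) match the abstract.
counts = np.array([
    [54, 5, 1],   # ChatGPT (hypothetical)
    [52, 7, 1],   # Kimi (hypothetical)
    [53, 7, 0],   # Ernie Bot (hypothetical)
])
chi2, p_chi2, dof, _ = chi2_contingency(counts)
print(f"chi-square: chi2={chi2:.2f}, dof={dof}, p={p_chi2:.3f}")

# Kruskal-Wallis on per-response mean scores (simulated data, 60 responses
# per model = 20 questions x 3 repetitions).
rng = np.random.default_rng(0)
scores = {name: rng.normal(4.5, 0.4, size=60)
          for name in ("ChatGPT", "Kimi", "Ernie Bot")}
h, p_kw = kruskal(*scores.values())
print(f"Kruskal-Wallis: H={h:.2f}, p={p_kw:.3f}")
```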
Conclusion: The three LLMs performed well overall and showed great potential for application in PAC consultations. The accuracy of the LLMs' responses should be improved through continuous training and evaluation.
About the journal:
Risk Management and Healthcare Policy is an international, peer-reviewed, open access journal focusing on all aspects of public health, policy, and preventative measures to promote good health and reduce morbidity and mortality in the population. Specific topics covered in the journal include:
Public and community health
Policy and law
Preventative and predictive healthcare
Risk and hazard management
Epidemiology, detection and screening
Lifestyle and diet modification
Vaccination and disease transmission/modification programs
Health and safety and occupational health
Healthcare services provision
Health literacy and education
Advertising and promotion of health issues
Health economic evaluations and resource management
Risk Management and Healthcare Policy focuses on human interventional and observational research. The journal welcomes submitted papers covering original research, clinical and epidemiological studies, reviews and evaluations, guidelines, expert opinion and commentary, and extended reports. Case reports will only be considered if they make a valuable and original contribution to the literature. The journal does not accept study protocols, animal-based studies, or cell line-based studies.