会话AI中处理医疗查询的风险分级安全性

Q3 Environmental Science
Gavin Abercrombie, Verena Rieser
{"title":"会话AI中处理医疗查询的风险分级安全性","authors":"Gavin Abercrombie, Verena Rieser","doi":"10.48550/arXiv.2210.00572","DOIUrl":null,"url":null,"abstract":"Conversational AI systems can engage in unsafe behaviour when handling users’ medical queries that may have severe consequences and could even lead to deaths. Systems therefore need to be capable of both recognising the seriousness of medical inputs and producing responses with appropriate levels of risk. We create a corpus of human written English language medical queries and the responses of different types of systems. We label these with both crowdsourced and expert annotations. While individual crowdworkers may be unreliable at grading the seriousness of the prompts, their aggregated labels tend to agree with professional opinion to a greater extent on identifying the medical queries and recognising the risk types posed by the responses. Results of classification experiments suggest that, while these tasks can be automated, caution should be exercised, as errors can potentially be very serious.","PeriodicalId":39298,"journal":{"name":"AACL Bioflux","volume":"100 1","pages":"234-243"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Risk-graded Safety for Handling Medical Queries in Conversational AI\",\"authors\":\"Gavin Abercrombie, Verena Rieser\",\"doi\":\"10.48550/arXiv.2210.00572\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Conversational AI systems can engage in unsafe behaviour when handling users’ medical queries that may have severe consequences and could even lead to deaths. Systems therefore need to be capable of both recognising the seriousness of medical inputs and producing responses with appropriate levels of risk. We create a corpus of human written English language medical queries and the responses of different types of systems. We label these with both crowdsourced and expert annotations. While individual crowdworkers may be unreliable at grading the seriousness of the prompts, their aggregated labels tend to agree with professional opinion to a greater extent on identifying the medical queries and recognising the risk types posed by the responses. Results of classification experiments suggest that, while these tasks can be automated, caution should be exercised, as errors can potentially be very serious.\",\"PeriodicalId\":39298,\"journal\":{\"name\":\"AACL Bioflux\",\"volume\":\"100 1\",\"pages\":\"234-243\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AACL Bioflux\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2210.00572\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Environmental Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AACL Bioflux","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2210.00572","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Environmental Science","Score":null,"Total":0}
引用次数: 6

摘要

会话式人工智能系统在处理用户的医疗查询时可能会采取不安全的行为,这可能会产生严重的后果,甚至可能导致死亡。因此,系统需要既能认识到医疗投入的严重性,又能以适当的风险水平作出反应。我们创建了一个人类书面英语医学查询和不同类型系统响应的语料库。我们用众包注释和专家注释来标记它们。虽然个别众包工作者在对提示的严重性进行分级方面可能不可靠,但他们的综合标签在识别医疗问题和识别回答所带来的风险类型方面往往更符合专业意见。分类实验的结果表明,虽然这些任务可以自动化,但应该谨慎行事,因为错误可能非常严重。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Risk-graded Safety for Handling Medical Queries in Conversational AI
Conversational AI systems can engage in unsafe behaviour when handling users’ medical queries that may have severe consequences and could even lead to deaths. Systems therefore need to be capable of both recognising the seriousness of medical inputs and producing responses with appropriate levels of risk. We create a corpus of human written English language medical queries and the responses of different types of systems. We label these with both crowdsourced and expert annotations. While individual crowdworkers may be unreliable at grading the seriousness of the prompts, their aggregated labels tend to agree with professional opinion to a greater extent on identifying the medical queries and recognising the risk types posed by the responses. Results of classification experiments suggest that, while these tasks can be automated, caution should be exercised, as errors can potentially be very serious.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
AACL Bioflux
AACL Bioflux Environmental Science-Management, Monitoring, Policy and Law
CiteScore
1.40
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信