Risk-graded Safety for Handling Medical Queries in Conversational AI

Q3 Environmental Science

AACL Bioflux · Pub Date: 2022-10-02 · DOI: 10.48550/arXiv.2210.00572

Gavin Abercrombie, Verena Rieser

Citations: 6

Abstract

Conversational AI systems can behave unsafely when handling users’ medical queries, with potentially severe consequences that could even include death. Systems therefore need to be able both to recognise the seriousness of medical inputs and to produce responses with appropriate levels of risk. We create a corpus of human-written English-language medical queries and the responses of different types of systems, and label it with both crowdsourced and expert annotations. While individual crowdworkers may be unreliable at grading the seriousness of the prompts, their aggregated labels tend to agree more closely with professional opinion when identifying medical queries and recognising the risk types posed by the responses. Results of classification experiments suggest that, while these tasks can be automated, caution should be exercised, as errors can potentially be very serious.
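The abstract reports that aggregated crowdworker labels agree with expert opinion better than individual ones do. The snippet below is a minimal sketch of one common way to aggregate and compare such labels. It assumes simple majority voting and raw percent agreement; neither is confirmed here as the paper's actual method, and the helper names (`majority_vote`, `agreement`), label names, and data are all hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label for one item.

    Assumption: plain majority voting, which may differ from the paper's
    aggregation. Counter.most_common orders equal counts by first
    occurrence, so ties go to the label seen first.
    """
    if not labels:
        raise ValueError("need at least one label to aggregate")
    return Counter(labels).most_common(1)[0][0]


def agreement(aggregated, expert):
    """Fraction of items where the aggregated crowd label matches the expert label."""
    if len(aggregated) != len(expert):
        raise ValueError("label lists must be the same length")
    matches = sum(a == e for a, e in zip(aggregated, expert))
    return matches / len(expert)


# Hypothetical toy data: three crowdworkers label each of four prompts
# as "medical" or "non-medical"; one expert label per prompt.
crowd = [
    ["medical", "medical", "non-medical"],
    ["non-medical", "non-medical", "non-medical"],
    ["medical", "non-medical", "medical"],
    ["non-medical", "medical", "medical"],
]
expert = ["medical", "non-medical", "medical", "non-medical"]

aggregated = [majority_vote(item) for item in crowd]
print(aggregated)                     # aggregated crowd label per prompt
print(agreement(aggregated, expert))  # share of prompts matching the expert
```

Percent agreement is the simplest comparison; a chance-corrected statistic such as Cohen's kappa would be a natural alternative when label distributions are skewed.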




