Detecting Foodborne Illness Complaints in Multiple Languages Using English Annotations Only

Ziyi Liu, Giannis Karamanolakis, Daniel J. Hsu, L. Gravano
International Workshop on Health Text Mining and Information Analysis, published 2020-10-11
DOI: 10.18653/v1/2020.louhi-1.15

Abstract

Health departments have been deploying text classification systems for the early detection of foodborne illness complaints in social media documents such as Yelp restaurant reviews. Current systems have been successfully applied to documents in English, so a promising direction is to increase coverage and recall by also considering documents in other languages, such as Spanish or Chinese. Training previous systems for more languages, however, would be expensive, as it would require manually annotating many documents for each new target language. To address this challenge, we consider cross-lingual learning and train multilingual classifiers using only the annotations for English-language reviews. Recent zero-shot approaches based on pre-trained multilingual BERT (mBERT) have been shown to effectively align languages for aspects such as sentiment. Interestingly, we show that those approaches are less effective at capturing the nuances of foodborne illness, our public health application of interest. To improve performance without extra annotations, we create artificial training documents in the target language through machine translation and train mBERT jointly on the source (English) and target languages. Furthermore, we show that translating labeled documents into multiple languages leads to additional performance improvements for some target languages. We demonstrate the benefits of our approach through extensive experiments with Yelp restaurant reviews in seven languages. Our classifiers identify foodborne illness complaints in multilingual reviews from the Yelp Challenge dataset, which highlights the potential of our general approach for deployment in health departments.
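The translation-based training described above is essentially label-preserving data augmentation: each labeled English review is machine-translated into one or more target languages, the translated copy inherits the English label, and the classifier is trained on the union. The Python sketch below builds only that joint training set, under stated assumptions: the `Example` record, the `translate` callable, and `fake_translate` are hypothetical stand-ins (no particular machine translation system is assumed), and the mBERT fine-tuning step that would consume the result is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass(frozen=True)
class Example:
    text: str
    label: int  # 1 = foodborne illness complaint, 0 = not a complaint
    lang: str   # language code, e.g. "en", "es", "zh"


# Hypothetical MT interface: (source_text, target_lang) -> translated text.
Translator = Callable[[str, str], str]


def build_translate_train_set(
    english: Sequence[Example],
    target_langs: Iterable[str],
    translate: Translator,
) -> List[Example]:
    """Return the English examples plus one translated copy per target language.

    Labels are copied unchanged from each English source review, so no
    annotation in any target language is required.
    """
    if any(ex.lang != "en" for ex in english):
        raise ValueError("all source examples must be English")
    joint = list(english)  # keep the original English training data
    for lang in target_langs:
        # Artificial target-language documents that reuse the English labels.
        joint.extend(Example(translate(ex.text, lang), ex.label, lang) for ex in english)
    return joint


def fake_translate(text: str, lang: str) -> str:
    # Placeholder for a real MT system: only tags the text with the target language.
    return f"[{lang}] {text}"


train = [Example("Got sick after eating here", 1, "en"), Example("Great tacos", 0, "en")]
joint = build_translate_train_set(train, ["es", "zh"], fake_translate)
print(len(joint))  # 6
print(joint[2])    # Example(text='[es] Got sick after eating here', label=1, lang='es')
```

Passing a single target language corresponds to training jointly on English and one target language; passing several corresponds to translating the labeled documents into multiple languages, which the abstract reports helps for some target languages.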