Classifying Drug Ratings Using User Reviews with Transformer-Based Language Models.

Akhil Shiju, Zhe He
DOI: 10.1109/ichi54592.2022.00035 (https://doi.org/10.1109/ichi54592.2022.00035)
Venue: IEEE International Conference on Healthcare Informatics
Published: 2022-06-01
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9744636/pdf/nihms-1855900.pdf
Citations: 7

Abstract

Drug review websites such as Drugs.com provide users' textual reviews and numeric ratings of drugs. Consumers use these reviews, along with the ratings, to choose a drug. However, the numeric ratings are not always consistent with the text reviews, so relying purely on the rating score to find positive or negative reviews may not be reliable. Automatically classifying user ratings from the review text can produce a more reliable rating for drugs. In this project, we built models that classify drug review ratings from textual reviews, using both traditional machine learning and deep learning. Traditional machine learning models, including Random Forest and Naive Bayes classifiers, were built using TF-IDF features as input. Transformer-based neural network models, including BERT, Bio_ClinicalBERT, RoBERTa, XLNet, ELECTRA, and ALBERT, were built using the raw text as input. The Bio_ClinicalBERT model outperformed the other models, with an overall accuracy of 87%. We further identified Unified Medical Language System (UMLS) concepts in the postings and analyzed their semantic types, stratified by class. This research demonstrated that transformer-based models can classify drug reviews based solely on their text.
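To make the traditional baseline concrete, the sketch below shows one way a TF-IDF representation can feed a Naive Bayes classifier. It is an illustration only, not the authors' implementation: the abstract does not name the tooling, the label scheme, or any hyperparameters, so the toy reviews, the "positive"/"negative" labels, and the helper names (tokenize, fit_idf, tfidf, NaiveBayes) are all assumptions. The smoothed IDF formula and add-one smoothing are common choices, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one document; terms unseen in training are ignored."""
    tf = Counter(term for term in tokenize(doc) if term in idf)
    return {term: count * idf[term] for term, count in tf.items()}


class NaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with add-one smoothing."""

    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        self.class_counts = Counter(labels)
        self.weights = defaultdict(Counter)  # class -> summed TF-IDF per term
        for doc, label in zip(docs, labels):
            self.weights[label].update(tfidf(doc, self.idf))
        return self

    def predict(self, doc):
        vec = tfidf(doc, self.idf)
        vocab_size = len(self.idf)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.weights[label].values())
            score = math.log(count / n_docs)  # log prior
            for term, weight in vec.items():
                likelihood = (self.weights[label][term] + 1.0) / (total + vocab_size)
                score += weight * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data invented for illustration; not drawn from Drugs.com.
train_reviews = [
    "this medication worked great and my pain is gone",
    "great relief with no side effects",
    "terrible nausea and headache stopped taking it",
    "made my symptoms worse terrible side effects",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_reviews, train_labels)
print(model.predict("great relief from pain"))
print(model.predict("terrible headache and nausea"))
```

In practice a library such as scikit-learn would replace the hand-written pieces. The transformer models named in the abstract would instead consume the raw review text through their own tokenizers rather than TF-IDF features.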

