Classifying Drug Ratings Using User Reviews with Transformer-Based Language Models.

IEEE International Conference on Healthcare Informatics. Published 2022-06-01. DOI: 10.1109/ichi54592.2022.00035

Akhil Shiju, Zhe He

Volume 2022, pages 163-169. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9744636/pdf/nihms-1855900.pdf

Citations: 7

Abstract

Drug review websites such as Drugs.com provide users' textual reviews and numeric ratings of drugs. Consumers use these reviews, along with the ratings, when choosing a drug. However, the numeric ratings may not always be consistent with the text reviews, so relying purely on the rating score to find positive or negative reviews may not be reliable. Automatically classifying user ratings from the review text can produce more reliable drug ratings. In this project, we built models that classify drug review ratings from textual reviews, using both traditional machine learning and deep learning. The traditional machine learning models, including Random Forest and Naive Bayes classifiers, used TF-IDF features as input. The transformer-based neural network models, including BERT, Bio_ClinicalBERT, RoBERTa, XLNet, ELECTRA, and ALBERT, used the raw text as input. Overall, the Bio_ClinicalBERT model outperformed the other models, with an overall accuracy of 87%. We further identified Unified Medical Language System (UMLS) concepts in the postings and analyzed their semantic types, stratified by class. This research demonstrated that transformer-based models can classify drug reviews based solely on the review text.
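The sketch below illustrates the traditional baseline described above: TF-IDF features fed to a Naive Bayes classifier. It is not the authors' code. It is written in plain Python so that it runs without dependencies, and it implements only the Naive Bayes variant, not Random Forest.

Everything in it beyond what the abstract states is an assumption:

- The helper `rating_to_label` and its 1-10 rating cut-offs are hypothetical, since the abstract does not give the class boundaries.
- The toy reviews are invented.
- All class and function names are illustrative.

The IDF formula, ln((1 + n) / (1 + df)) + 1, with L2-normalized vectors, mirrors scikit-learn's default `TfidfVectorizer` settings. In practice, scikit-learn's `TfidfVectorizer`, `MultinomialNB`, and `RandomForestClassifier` would replace this hand-written code.

```python
import math
import re
from collections import Counter, defaultdict


def rating_to_label(rating):
    # HYPOTHETICAL cut-offs mapping a 1-10 rating to a class; not from the paper.
    if rating <= 4:
        return "negative"
    if rating <= 6:
        return "neutral"
    return "positive"


def tokenize(text):
    # Lowercase and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfVectorizer:
    """Minimal TF-IDF: smoothed IDF, L2-normalized sparse vectors (dicts)."""

    def fit(self, docs):
        n = len(docs)
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))  # document frequency per term
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        return self

    def transform(self, doc):
        tf = Counter(t for t in tokenize(doc) if t in self.idf)  # drop unseen terms
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace smoothing, using TF-IDF weights as fractional counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        counts = Counter(labels)
        self.log_prior = {c: math.log(counts[c] / len(labels)) for c in self.classes}
        weight = {c: defaultdict(float) for c in self.classes}
        for vec, y in zip(vectors, labels):
            for t, v in vec.items():
                weight[y][t] += v  # accumulate per-class term weight
        self.vocab = {t for vec in vectors for t in vec}
        self.log_lik = {}
        for c in self.classes:
            total = sum(weight[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((weight[c][t] + self.alpha) / total) for t in self.vocab
            }
        return self

    def predict(self, vec):
        def score(c):
            return self.log_prior[c] + sum(
                v * self.log_lik[c][t] for t, v in vec.items() if t in self.vocab
            )
        return max(self.classes, key=score)


def accuracy(predictions, gold):
    # Fraction of predictions that match the gold labels.
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Toy usage with invented reviews and ratings.
train = [
    ("This drug worked great, no side effects at all", 9),
    ("Amazing relief, I feel so much better", 10),
    ("Terrible nausea and headaches, stopped taking it", 2),
    ("Awful side effects, made my pain worse", 1),
]
texts = [text for text, _ in train]
labels = [rating_to_label(r) for _, r in train]
vectorizer = TfidfVectorizer().fit(texts)
X = [vectorizer.transform(t) for t in texts]
model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform("great relief, no side effects")))
print(accuracy([model.predict(x) for x in X], labels))
```

Scoring a review's text this way yields a class that can be compared against the user's own numeric rating. That comparison is the inconsistency the abstract describes. Any real evaluation would use a held-out test set rather than the training reviews shown here.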




