Towards a disease prediction system: BioBERT-based medical profile representation

IAES International Journal of Artificial Intelligence (IJ-AI). Published: 2024-06-01. DOI: 10.11591/ijai.v13.i2.pp2314-2322

Rima Hatoum, Ali Alkhazraji, Z. Ibrahim, Houssein Dhayni, Ihab Sbeity


Citations: 0

Abstract

Healthcare professionals are increasingly interested in predicting diseases before they manifest, since early prediction can prevent more serious health conditions and even save lives. Machine learning techniques now play an important role in healthcare, including the early prediction of diseases from prior medical knowledge. However, a major challenge is representing medical information in a form that machine learning algorithms can process. Medical histories are often recorded in formats that algorithms cannot use directly, so filtering this information and converting it into numerical representations is a crucial step, one made easier by advances in natural language processing. In this paper, we propose three representations of medical information, two of which are based on BioBERT, a recent text representation model for the biomedical domain. We evaluate these representations on the MIMIC-III database, which contains information on 46,520 patients, focusing on the prediction of coronary artery disease (CAD); the results demonstrate the effectiveness of the proposed approach. The study highlights the importance of medical history in disease prediction and the potential of machine learning techniques to advance healthcare.
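The abstract does not say how the non-BioBERT representation is built. As a minimal sketch under that caveat (an illustrative assumption, not the authors' method), the step of converting a structured medical history into a numerical vector can be shown with a multi-hot encoding over diagnosis codes. The helper names `build_vocabulary` and `multi_hot` and the toy code lists are hypothetical and are not MIMIC-III records.

```python
from typing import Dict, List, Sequence

# Illustrative multi-hot baseline (assumption): NOT the representation used in the paper.


def build_vocabulary(histories: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map every distinct code seen in any history to a column index (sorted for determinism)."""
    codes = sorted({code for history in histories for code in history})
    return {code: index for index, code in enumerate(codes)}


def multi_hot(history: Sequence[str], vocabulary: Dict[str, int]) -> List[int]:
    """Encode one patient's history as a 0/1 vector; codes outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for code in history:
        index = vocabulary.get(code)
        if index is not None:
            vector[index] = 1  # presence only: repeated codes still give 1
    return vector


# Hypothetical toy histories (ICD-9-style code strings), not real patient data.
histories = [
    ["4019", "25000"],
    ["4019", "2724", "4019"],
    ["42731"],
]
vocabulary = build_vocabulary(histories)
matrix = [multi_hot(history, vocabulary) for history in histories]
print(matrix)  # → [[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
```

A multi-hot vector records only which codes appear, discarding their order, their frequency (the repeated "4019" is counted once) and any free-text context; text-based representations such as those built on BioBERT can, in principle, capture some of that context.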

