Information Retrieval for Early Detection of Disease Using Semantic Similarity

Aszani Aszani, Hayyu Ilham Wicaksono, Uffi Nadzima, Lukman Heryawan
Journal: IJCCS Indonesian Journal of Computing and Cybernetics Systems
Published: 2023-02-23
DOI: 10.22146/ijccs.80077 (https://doi.org/10.22146/ijccs.80077)

Abstract

The volume of medical records continues to grow, and these records should be used to help doctors diagnose diseases. The proposed retrieval method returns diagnostic recommendations based on symptoms in a medical record dataset by applying the TF-IDF and cosine similarity methods. The main challenge in this study was that the symptoms in the medical record dataset were dirty data, obtained from patients who were not familiar with biological terms. The symptoms in the medical record data were therefore matched with the symptom terms used in the system, and based on the results, data augmentation was carried out to increase the amount of data by up to about three times. With the TF-IDF method, the highest accuracy with [...] is only [...], while after augmentation of the test data the accuracy becomes [...]. With the cosine similarity method and the same [...] value, the highest accuracy is [...], increasing to [...] with the augmented test data. The study concludes that a system given sufficient and relevant symptom input provides more accurate disease predictions, and that predictions using the TF-IDF method with [...] are more accurate than predictions using the cosine similarity method.
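The retrieval idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so everything below is an assumption: the toy records, the function names, the smoothed IDF formula, and the parameter `k` (number of results returned) are all hypothetical. The abstract also reports TF-IDF and cosine similarity as two separate methods without saying how they differ in practice; this sketch simply combines them in the common way, weighting symptom terms with TF-IDF and ranking records by cosine similarity.

```python
import math
from collections import Counter

# Hypothetical toy records (symptom text, diagnosis); not data from the study.
RECORDS = [
    ("fever cough sore throat", "influenza"),
    ("fever headache joint pain rash", "dengue"),
    ("cough chest pain shortness of breath", "pneumonia"),
    ("frequent urination thirst fatigue", "diabetes"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms outside the vocabulary are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, records, k=3):
    """Return the k (score, diagnosis) pairs most similar to the query."""
    docs = [tokenize(text) for text, _ in records]
    idf = build_idf(docs)
    q = tfidf_vector(tokenize(query), idf)
    scored = [
        (cosine(q, tfidf_vector(tokens, idf)), label)
        for tokens, (_, label) in zip(docs, records)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for score, label in retrieve("fever and cough with sore throat", RECORDS, k=2):
        print(f"{label}: {score:.3f}")
```

Query words that do not appear in any record (such as "and" or "with" here) are dropped, which loosely mirrors the abstract's point that patient-supplied symptom wording has to be matched to the terms the system knows before retrieval can work well.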