Lukman Heryawan, Dian Novitaningrum, Kartika Rizqi Nastiti, Salsabila Nurulfarah Mahmudah
{"title":"利用 TF-IDF 和矢量空间模型 (VSM) 搜索病历文档","authors":"Lukman Heryawan, Dian Novitaningrum, Kartika Rizqi Nastiti, Salsabila Nurulfarah Mahmudah","doi":"10.18517/ijaseit.14.3.19606","DOIUrl":null,"url":null,"abstract":"The growth of medical record documents is increasing over time, and the various types of diseases and therapies needed are increasing. However, this has not been followed by an effective and efficient search process. This study aims to deal with search problems that often take a long time with search results that are not necessarily as expected by building a search model for medical record documents using the vector space model (VSM) and TF-IDF methods. The VSM method allows retrieval of results that are not the same as the search queries entered by the user but are expected to provide still results relevant to the user's desired needs. The model development process was taken based on the data in the FS_ANAMNESA and FS_DIAGNOSA columns, followed by preprocessing, which consists of deleting blank lines, lowercase, removing punctuation marks, HTML tags, stop words, excess spaces between words, and normalizing typo words, then forming a TF-IDF matrix based on the frequency of occurrence of each word feature, and followed by the calculation of the similarity value of the search query compared to medical record documents based on the cosine similarity formula. The retrieval results were all columns of each existing medical record document and were sorted based on 10 rows with the highest similarity value. The model evaluation results were based on 1000 medical record documents and tested with 20 search queries in this study, which gave an average precision value of 0.548 and an average recall value of 0.796.","PeriodicalId":14471,"journal":{"name":"International Journal on Advanced Science, Engineering and Information Technology","volume":"105 44","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Medical Record Document Search with TF-IDF and Vector Space Model (VSM)\",\"authors\":\"Lukman Heryawan, Dian Novitaningrum, Kartika Rizqi Nastiti, Salsabila Nurulfarah Mahmudah\",\"doi\":\"10.18517/ijaseit.14.3.19606\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The growth of medical record documents is increasing over time, and the various types of diseases and therapies needed are increasing. However, this has not been followed by an effective and efficient search process. This study aims to deal with search problems that often take a long time with search results that are not necessarily as expected by building a search model for medical record documents using the vector space model (VSM) and TF-IDF methods. The VSM method allows retrieval of results that are not the same as the search queries entered by the user but are expected to provide still results relevant to the user's desired needs. The model development process was taken based on the data in the FS_ANAMNESA and FS_DIAGNOSA columns, followed by preprocessing, which consists of deleting blank lines, lowercase, removing punctuation marks, HTML tags, stop words, excess spaces between words, and normalizing typo words, then forming a TF-IDF matrix based on the frequency of occurrence of each word feature, and followed by the calculation of the similarity value of the search query compared to medical record documents based on the cosine similarity formula. The retrieval results were all columns of each existing medical record document and were sorted based on 10 rows with the highest similarity value. The model evaluation results were based on 1000 medical record documents and tested with 20 search queries in this study, which gave an average precision value of 0.548 and an average recall value of 0.796.\",\"PeriodicalId\":14471,\"journal\":{\"name\":\"International Journal on Advanced Science, Engineering and Information Technology\",\"volume\":\"105 44\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal on Advanced Science, Engineering and Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18517/ijaseit.14.3.19606\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Agricultural and Biological Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal on Advanced Science, Engineering and Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18517/ijaseit.14.3.19606","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Agricultural and Biological Sciences","Score":null,"Total":0}
Medical Record Document Search with TF-IDF and Vector Space Model (VSM)
The growth of medical record documents is increasing over time, and the various types of diseases and therapies needed are increasing. However, this has not been followed by an effective and efficient search process. This study aims to deal with search problems that often take a long time with search results that are not necessarily as expected by building a search model for medical record documents using the vector space model (VSM) and TF-IDF methods. The VSM method allows retrieval of results that are not the same as the search queries entered by the user but are expected to provide still results relevant to the user's desired needs. The model development process was taken based on the data in the FS_ANAMNESA and FS_DIAGNOSA columns, followed by preprocessing, which consists of deleting blank lines, lowercase, removing punctuation marks, HTML tags, stop words, excess spaces between words, and normalizing typo words, then forming a TF-IDF matrix based on the frequency of occurrence of each word feature, and followed by the calculation of the similarity value of the search query compared to medical record documents based on the cosine similarity formula. The retrieval results were all columns of each existing medical record document and were sorted based on 10 rows with the highest similarity value. The model evaluation results were based on 1000 medical record documents and tested with 20 search queries in this study, which gave an average precision value of 0.548 and an average recall value of 0.796.
期刊介绍:
International Journal on Advanced Science, Engineering and Information Technology (IJASEIT) is an international peer-reviewed journal dedicated to interchange for the results of high quality research in all aspect of science, engineering and information technology. The journal publishes state-of-art papers in fundamental theory, experiments and simulation, as well as applications, with a systematic proposed method, sufficient review on previous works, expanded discussion and concise conclusion. As our commitment to the advancement of science and technology, the IJASEIT follows the open access policy that allows the published articles freely available online without any subscription. The journal scopes include (but not limited to) the followings: -Science: Bioscience & Biotechnology. Chemistry & Food Technology, Environmental, Health Science, Mathematics & Statistics, Applied Physics -Engineering: Architecture, Chemical & Process, Civil & structural, Electrical, Electronic & Systems, Geological & Mining Engineering, Mechanical & Materials -Information Science & Technology: Artificial Intelligence, Computer Science, E-Learning & Multimedia, Information System, Internet & Mobile Computing