An accurate and customizable text classification algorithm: Two applications in healthcare
Mohammed D. Aldhoayan, Leming Zhou. 2016 IEEE 6th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), October 2016. DOI: 10.1109/ICCABS.2016.7802778
Text classification is an important step in many data analysis procedures. Demand for text classification algorithms is growing rapidly as the volume of digital data increases, especially in healthcare. A customizable and accurate algorithm is expected to improve the efficiency of many data analysis procedures. In this work, we propose a novel algorithm that accurately classifies data entries in large text files into several predetermined categories. The algorithm is built from multiple rules based on text similarity, frequency, and weight, and it can be conveniently adjusted to process the data sets of different classification tasks. We evaluated it on data sets related to healthcare cost analysis (hospital discharge summaries) and to a medical classification system (ICD-9). On the ICD-9 data, the algorithm achieved an overall accuracy of 100%. On 7,480 healthcare cost entries, its results were compared with those produced manually by a physician, and accuracy ranged from 86% to 91.6%. The disagreements came from ambiguous entries that had been documented improperly, so their correct category is hard to determine even by manual review. On the same data set, the algorithm was 3 to 5 times faster than manual classification. This customizable and accurate text classification algorithm is therefore effective in saving time compared with manual classification methods.
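The abstract states only that the classifier combines rules based on text similarity, frequency, and weight; it does not give the rules themselves. The Python sketch below is a hypothetical illustration of how such rules could be combined, not the authors' implementation. Everything in it is an assumption chosen for illustration: the `CategoryRule` structure, the category names, keywords, weights and exemplar phrases, the use of the standard library's `difflib.SequenceMatcher` ratio as the similarity measure, and the score threshold below which an entry is flagged as ambiguous (`"UNCLASSIFIED"`) rather than forced into a category.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class CategoryRule:
    """Hypothetical per-category rule set (illustrative, not from the paper)."""
    name: str
    keyword_weights: dict[str, float]  # single lowercase token -> weight
    exemplars: list[str] = field(default_factory=list)  # reference phrases


def tokenize(text: str) -> list[str]:
    # Lowercase the entry and keep only alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def score(entry: str, rule: CategoryRule, sim_weight: float = 1.0) -> float:
    counts = Counter(tokenize(entry))
    # Frequency x weight: each keyword adds weight * number of occurrences.
    keyword_score = sum(w * counts[k] for k, w in rule.keyword_weights.items())
    # Text similarity: best SequenceMatcher ratio (0.0 to 1.0) to any exemplar.
    similarity = max(
        (SequenceMatcher(None, entry.lower(), ex.lower()).ratio()
         for ex in rule.exemplars),
        default=0.0,
    )
    return keyword_score + sim_weight * similarity


def classify(entry: str, rules: list[CategoryRule], threshold: float = 1.0) -> str:
    # Pick the highest-scoring category; weak matches are flagged as ambiguous.
    best_score, best_name = max((score(entry, r), r.name) for r in rules)
    return best_name if best_score >= threshold else "UNCLASSIFIED"


# Hypothetical rule set for cost-entry categories; not taken from the paper.
rules = [
    CategoryRule("Pharmacy", {"pharmacy": 3.0, "drug": 2.0, "drugs": 2.0},
                 ["pharmacy general drugs"]),
    CategoryRule("Laboratory", {"lab": 2.0, "blood": 1.5, "panel": 1.5},
                 ["laboratory blood panel"]),
    CategoryRule("Room and board", {"room": 2.0, "board": 2.0},
                 ["room and board semi private"]),
]

if __name__ == "__main__":
    print(classify("PHARMACY - GENERAL DRUGS", rules))  # Pharmacy
    print(classify("LAB CBC BLOOD PANEL", rules))       # Laboratory
    print(classify("MISC CHARGE 0042", rules))          # UNCLASSIFIED
```

Because the rules are plain data, adapting this sketch to a different task (for example, a different set of cost categories) would mean supplying a new rule list rather than changing the code, which is one way to read the abstract's claim that the algorithm can be conveniently adjusted per data set.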