Authors: Lee Kah Win, Gan Keng Hoon
Venue: 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET)
Publication date: 2022-09-13
DOI: https://doi.org/10.1109/IICAIET55139.2022.9936867
Text Classification of Medical Transcriptions using N-Gram Machine Learning Approach
The medical domain is a data-rich environment from which a variety of knowledge can be extracted for positive outcomes. This work demonstrates multiclass classification of medical transcriptions using a real dataset. The objective is to classify medical transcriptions by their medical specialty labels, namely Discharge Summary, Neurosurgery, and ENT. Text normalisation was performed, followed by the extraction of five different n-gram feature representations. Three supervised learning classifiers, namely K-Nearest Neighbours, Decision Tree, and Random Forest, were then trained on each of the n-gram feature representations. Classification performance was evaluated using the macro F1 score. The best score achieved was 0.93 macro F1 on the testing set, using a tuned Random Forest with unigram feature vectors.
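
As a rough illustration of two ingredients named in the abstract, the sketch below extracts word n-gram counts from a normalised text and computes a macro F1 score in plain Python. It is not the authors' implementation. The abstract does not say which normalisation steps or which five n-gram ranges were used, so the `normalise` helper, the `(n_min, n_max)` range, and the toy labels and predictions are assumptions made for illustration only.

```python
from collections import Counter


def normalise(text):
    """Lowercase the text and split on non-alphanumeric characters.

    Assumption: a simple stand-in for the paper's unspecified normalisation.
    """
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def ngram_counts(tokens, n_min, n_max):
    """Count every word n-gram with n_min <= n <= n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (made-up text and predictions, not data from the paper).
tokens = normalise("Patient admitted; patient discharged.")
features = ngram_counts(tokens, 1, 2)  # unigrams plus bigrams

y_true = ["ENT", "ENT", "Neurosurgery", "Discharge Summary"]
y_pred = ["ENT", "Neurosurgery", "Neurosurgery", "Discharge Summary"]
score = macro_f1(y_true, y_pred)
```

Because macro F1 averages the per-class scores without weighting by class size, each of the three specialty labels contributes equally to the final score, however many transcriptions it has.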