{"title":"基于预训练词嵌入和预训练变形的深度学习方法用于极端多标签文本分类","authors":"Necdet Eren Erciyes, A. K. Görür","doi":"10.1109/UBMK52708.2021.9558977","DOIUrl":null,"url":null,"abstract":"In recent years, there has been a considerable increase in textual documents online. This increase requires the creation of highly improved machine learning methods to classify text in many different domains. The effectiveness of these machine learning methods depends on the model capacity to understand the complex nature of the unstructured data and the relations of features that exist. Many different machine learning methods were proposed for a long time to solve text classification problems, such as SVM, kNN, and Rocchio classification. These shallow learning methods have achieved doubtless success in many different domains. For big and unstructured data like text, deep learning methods which can learn representations and features from the input data wtihout using any feature extraction methods have shown to be one of the major solutions. In this study, we explore the accuracy of recent recommended deep learning methods for multi-label text classification starting with simple RNN, CNN models to pretrained transformer models. We evaluated these methods’ performances by computing multi-label evaluation metrics and compared the results with the previous studies.","PeriodicalId":106516,"journal":{"name":"2021 6th International Conference on Computer Science and Engineering (UBMK)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Deep Learning Methods with Pre-Trained Word Embeddings and Pre-Trained Transformers for Extreme Multi-Label Text Classification\",\"authors\":\"Necdet Eren Erciyes, A. K. Görür\",\"doi\":\"10.1109/UBMK52708.2021.9558977\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, there has been a considerable increase in textual documents online. This increase requires the creation of highly improved machine learning methods to classify text in many different domains. The effectiveness of these machine learning methods depends on the model capacity to understand the complex nature of the unstructured data and the relations of features that exist. Many different machine learning methods were proposed for a long time to solve text classification problems, such as SVM, kNN, and Rocchio classification. These shallow learning methods have achieved doubtless success in many different domains. For big and unstructured data like text, deep learning methods which can learn representations and features from the input data wtihout using any feature extraction methods have shown to be one of the major solutions. In this study, we explore the accuracy of recent recommended deep learning methods for multi-label text classification starting with simple RNN, CNN models to pretrained transformer models. We evaluated these methods’ performances by computing multi-label evaluation metrics and compared the results with the previous studies.\",\"PeriodicalId\":106516,\"journal\":{\"name\":\"2021 6th International Conference on Computer Science and Engineering (UBMK)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-09-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 6th International Conference on Computer Science and Engineering (UBMK)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/UBMK52708.2021.9558977\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 6th International Conference on Computer Science and Engineering (UBMK)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UBMK52708.2021.9558977","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Deep Learning Methods with Pre-Trained Word Embeddings and Pre-Trained Transformers for Extreme Multi-Label Text Classification
In recent years, there has been a considerable increase in textual documents online. This increase requires the creation of highly improved machine learning methods to classify text in many different domains. The effectiveness of these machine learning methods depends on the model capacity to understand the complex nature of the unstructured data and the relations of features that exist. Many different machine learning methods were proposed for a long time to solve text classification problems, such as SVM, kNN, and Rocchio classification. These shallow learning methods have achieved doubtless success in many different domains. For big and unstructured data like text, deep learning methods which can learn representations and features from the input data wtihout using any feature extraction methods have shown to be one of the major solutions. In this study, we explore the accuracy of recent recommended deep learning methods for multi-label text classification starting with simple RNN, CNN models to pretrained transformer models. We evaluated these methods’ performances by computing multi-label evaluation metrics and compared the results with the previous studies.