Research on Text Classification Method Based on PTF-IDF and Cosine Similarity
Y. Liu, Qi Xu, Zeshen Tang
2019 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), published 2019-11-01
DOI: https://doi.org/10.1109/ICIIBMS46890.2019.8991542
Citations: 4
Abstract
Text classification is a foundational task in many NLP applications, and in the era of big data it faces new challenges. We propose a text classification method that combines Promoted TF-IDF (PTF-IDF) with cosine similarity. After segmenting the text with a pre-trained word segmentation tool, we apply PTF-IDF to identify the words that play key roles in classification, capturing the key components of each category. We then apply cosine similarity to measure the similarity between a text and each category. We conduct experiments on commonly used datasets, and the results show that the proposed method outperforms state-of-the-art methods on several of them.
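The pipeline described in the abstract (weight terms per category, then assign a text to the category with the highest cosine similarity) can be illustrated with a minimal sketch. The abstract does not define the PTF-IDF weighting, so the code below substitutes standard smoothed TF-IDF and should not be read as the authors' method. Whitespace tokenization stands in for the pre-trained word segmentation tool, and all function names and the toy training data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for the paper's
    # pre-trained word segmentation tool.
    return text.lower().split()


def build_idf(token_lists):
    # Smoothed inverse document frequency; each category counts as one document.
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens, idf):
    # Standard TF-IDF (NOT the paper's PTF-IDF, which the abstract leaves undefined).
    # Tokens absent from the training vocabulary are ignored.
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fit(labeled_texts):
    # Pool the tokens of each category, then build one TF-IDF vector per category.
    by_label = {}
    for text, label in labeled_texts:
        by_label.setdefault(label, []).extend(tokenize(text))
    idf = build_idf(list(by_label.values()))
    category_vectors = {
        label: tfidf_vector(tokens, idf) for label, tokens in by_label.items()
    }
    return idf, category_vectors


def classify(text, idf, category_vectors):
    # Score the text against every category and return the best label with all scores.
    vec = tfidf_vector(tokenize(text), idf)
    scores = {label: cosine(vec, cv) for label, cv in category_vectors.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data, for illustration only.
TRAIN = [
    ("the team won the football match", "sports"),
    ("the player scored a goal in the match", "sports"),
    ("the new phone has a fast processor", "tech"),
    ("the laptop runs the software update", "tech"),
]

if __name__ == "__main__":
    idf, category_vectors = fit(TRAIN)
    label, scores = classify("the player scored in the football match", idf, category_vectors)
    print(label, scores)
```

Pooling each category into a single vector keeps classification to one cosine computation per category. A promoted weighting scheme such as the one the paper proposes would replace only `tfidf_vector`, leaving the similarity step unchanged.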