Improved feature selection approach TFIDF in text mining

Proceedings. International Conference on Machine Learning and Cybernetics. Pub Date: 2002-11-04. DOI: 10.1109/ICMLC.2002.1174522

L. Jing, Houkuan Huang, Hong-bo Shi


Citations: 207

Abstract

This paper describes the feature selection method TFIDF (term frequency, inverse document frequency). With it, we process the data resource and set up the vector space model in order to provide a convenient data structure for text categorization. We calculate the precision of this method with the help of categorization results. According to the empirical results, we analyze its advantages and disadvantages and present a new TFIDF-based feature selection approach to improve its accuracy.
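As a rough illustration of the baseline the abstract refers to, the sketch below builds a vector space model with the common textbook TFIDF weighting (relative term frequency times the log of inverse document frequency) and then keeps the highest-weighted terms as features. It is not the improved approach the paper proposes, which the abstract does not specify. The function names `tfidf_vectors` and `top_terms`, the pre-tokenized input, and the max-weight ranking rule are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build a vector space model from tokenized documents.

    Assumed weighting (standard textbook form, not the paper's variant):
    weight(t, d) = (count of t in d / length of d) * ln(N / df(t)).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vec = [
            (counts[t] / total) * math.log(n_docs / df[t]) if total else 0.0
            for t in vocab
        ]
        vectors.append(vec)
    return vocab, vectors


def top_terms(vocab, vectors, k):
    """Keep the k terms with the highest maximum weight in any document.

    This ranking rule is a hypothetical choice for illustration only.
    """
    best = {t: max(vec[i] for vec in vectors) for i, t in enumerate(vocab)}
    return sorted(vocab, key=lambda t: (-best[t], t))[:k]


if __name__ == "__main__":
    docs = [
        ["text", "mining", "text"],
        ["feature", "selection"],
        ["text", "feature"],
    ]
    vocab, vectors = tfidf_vectors(docs)
    print(vocab)
    print(top_terms(vocab, vectors, 2))
```

Under this weighting, a term that occurs in every document gets an inverse document frequency of ln(1) = 0 and therefore carries no weight, which is the usual reason TFIDF is used to discard uninformative terms before categorization.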
