Rough set based feature selection approach for text mining
N. Sailaja, L. P. Sree, N. Mangathayaru
2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), 2016-12-01
DOI: 10.1109/IC3I.2016.7917932 (https://doi.org/10.1109/IC3I.2016.7917932)
Citations: 2
Abstract
Text can be viewed as a sequence of characters. When the volume of unstructured text data is very large, processing it by computer is a challenging task. Extracting meaningful and useful patterns from text requires pre-processing methods and algorithms. Feature selection, or reduct generation, aims to find the smallest subset of attributes that represents the same knowledge as the original set of features (attributes). Rough set theory (RST) is a mathematical tool that has been applied to this problem with great success. In this paper, we propose a rough set based approach to feature selection for text data sets, in support of text mining. As input we take a variety of sample text documents (biographical text, sample research articles from various domains, and news articles from several sources); the files may be in .txt, .pdf, or any other format. We also present a complexity analysis of the proposed algorithm and experimental results on sample text data sets.
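
To make the idea of a reduct concrete, the following is a minimal sketch of rough set feature selection on a toy term-presence table. It is not the algorithm proposed in the paper, whose details the abstract does not give. It is a greedy forward selection in the style of the well-known QuickReduct heuristic. The function names (`partition`, `dependency`, `quick_reduct`), the terms, and the document labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Fraction of rows in the positive region, i.e. rows lying in
    equivalence classes whose members all carry the same label."""
    if not rows:
        return 0.0
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def quick_reduct(rows, labels, all_attrs):
    """Greedily add the attribute that raises the dependency degree most,
    until it equals the dependency degree of the full attribute set."""
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in all_attrs if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return reduct


# Hypothetical toy data: each row marks whether a term occurs in a document.
rows = [
    {"goal": 1, "election": 0, "match": 1},
    {"goal": 1, "election": 0, "match": 0},
    {"goal": 0, "election": 1, "match": 0},
    {"goal": 0, "election": 1, "match": 1},
]
labels = ["sports", "sports", "politics", "politics"]
all_attrs = ["goal", "election", "match"]

print(quick_reduct(rows, labels, all_attrs))  # ['goal']
```

Here the single term `goal` separates the two classes as well as all three terms together, so it preserves the same knowledge as the full attribute set. A greedy search of this kind returns an attribute subset with the same dependency degree as the full set, but it is not guaranteed to find the smallest such subset.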