Rough set based feature selection approach for text mining

2016 2nd International Conference on Contemporary Computing and Informatics (IC3I). Pub Date: 2016-12-01. DOI: 10.1109/IC3I.2016.7917932

N. Sailaja, L. P. Sree, N. Mangathayaru

Citations: 2

Abstract

Text can be thought of as a combination of characters. When unstructured text data is available in very large volumes, processing it by computer is a challenging task. Extracting meaningful and useful patterns from text requires pre-processing methods and algorithms. Feature selection, or reduct generation, aims to find a minimal subset of attributes that represents the same knowledge as the original set of features (attributes). Rough set theory (RST) is one such mathematical tool and has been applied with considerable success. In this paper, we propose a rough set based approach for feature selection on text data sets, serving the goals of text mining. We take different sample text documents as input (for example, biographical text, sample research articles from various domains, and news articles from several sources); these files can be in .txt, .pdf, or any other format. We also present a complexity analysis of the proposed algorithm and experimental results on sample text data sets.
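The abstract does not spell out the authors' algorithm, so the following is only a point of reference: a minimal Python sketch of a standard rough set reduct heuristic, QuickReduct, which greedily adds attributes to maximise the dependency degree gamma_B(D) = |POS_B(D)| / |U|. It is not necessarily the method proposed in the paper. It assumes that each document is described by binary term-presence attributes obtained by naive lowercase/whitespace tokenisation, with a class label (e.g. research vs. news) as the decision attribute. The function names `build_decision_table`, `partition`, `dependency`, and `quick_reduct`, and the toy documents, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]


def build_decision_table(
    docs: Sequence[str], labels: Sequence[str]
) -> Tuple[List[Row], List[str], List[str]]:
    """Binary term-presence decision table (hypothetical pre-processing).

    Assumption: lowercase + whitespace tokenisation only; stop-word
    removal, stemming and .pdf text extraction are deliberately omitted.
    """
    token_sets = [set(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*token_sets))
    rows = [{term: int(term in toks) for term in vocab} for toks in token_sets]
    return rows, list(labels), vocab


def partition(rows: List[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Indiscernibility classes U/IND(B): objects with equal values on attrs."""
    blocks: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows: List[Row], labels: List[str], attrs: Sequence[str]) -> float:
    """gamma_B(D) = |POS_B(D)| / |U|.

    POS_B(D) is the union of the B-classes whose members all share one
    decision label (i.e. classes contained in a single decision class).
    """
    pos = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return pos / len(rows)


def quick_reduct(rows: List[Row], labels: List[str], attrs: List[str]) -> List[str]:
    """Greedy QuickReduct: repeatedly add the attribute that yields the
    highest dependency until it matches that of the full attribute set.
    The result preserves gamma but is not guaranteed to be minimal."""
    target = dependency(rows, labels, attrs)
    reduct: List[str] = []
    current = dependency(rows, labels, reduct)
    while current < target:
        candidates = [a for a in attrs if a not in reduct]
        # max() keeps the first attribute among ties, so results are deterministic.
        best = max(candidates, key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
        current = dependency(rows, labels, reduct)
    return reduct


if __name__ == "__main__":
    # Hypothetical toy corpus with two decision classes.
    docs = [
        "rough set theory reduct",
        "rough set feature selection",
        "stock market news today",
        "market news sports today",
    ]
    labels = ["research", "research", "news", "news"]
    rows, y, vocab = build_decision_table(docs, labels)
    print(quick_reduct(rows, y, vocab))  # ['market']
```

Because the search is greedy, the returned subset keeps the dependency degree of the full vocabulary but may contain redundant terms. For this sketch, each `dependency` call is linear in |U|·|B|, so the loop is roughly O(|C|^3 · |U|) in the worst case for |C| terms and |U| documents; this says nothing about the complexity of the authors' own algorithm.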
