基于粗糙集的文本挖掘特征选择方法

N. Sailaja, L. P. Sree, N. Mangathayaru
{"title":"基于粗糙集的文本挖掘特征选择方法","authors":"N. Sailaja, L. P. Sree, N. Mangathayaru","doi":"10.1109/IC3I.2016.7917932","DOIUrl":null,"url":null,"abstract":"Text can be thought as the combination of characters. In the environment where the size of unstructured text data is hugely more, to process such data by computers is a challenging task. To extract meaningful and useful patterns from the text, some pre-processing methods and algorithms are required. Feature selection or Reduct generation intends to determine a smallest attributes subset which can represent the same knowledge as the original features(attributes) represented it. Rough set theory (RST) is such a mathematical tool, which can be used with tremendous success. Here, In the paper, we proposed a Rough set based approach for feature selection in the Text data set, which fulfil the aim of Text mining. We have taken different sample Text case documents (like biography text data, sample research articles of various domains, news articles from some sources) as input, these files can be in the form of .txt, .pdf etc. or any other format. We have also presented complexity analysis of our proposed algorithm and experimental results on a sample text data sets.","PeriodicalId":305971,"journal":{"name":"2016 2nd International Conference on Contemporary Computing and Informatics (IC3I)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Rough set based feature selection approach for text mining\",\"authors\":\"N. Sailaja, L. P. Sree, N. Mangathayaru\",\"doi\":\"10.1109/IC3I.2016.7917932\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text can be thought as the combination of characters. In the environment where the size of unstructured text data is hugely more, to process such data by computers is a challenging task. To extract meaningful and useful patterns from the text, some pre-processing methods and algorithms are required. Feature selection or Reduct generation intends to determine a smallest attributes subset which can represent the same knowledge as the original features(attributes) represented it. Rough set theory (RST) is such a mathematical tool, which can be used with tremendous success. Here, In the paper, we proposed a Rough set based approach for feature selection in the Text data set, which fulfil the aim of Text mining. We have taken different sample Text case documents (like biography text data, sample research articles of various domains, news articles from some sources) as input, these files can be in the form of .txt, .pdf etc. or any other format. We have also presented complexity analysis of our proposed algorithm and experimental results on a sample text data sets.\",\"PeriodicalId\":305971,\"journal\":{\"name\":\"2016 2nd International Conference on Contemporary Computing and Informatics (IC3I)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 2nd International Conference on Contemporary Computing and Informatics (IC3I)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IC3I.2016.7917932\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 2nd International Conference on Contemporary Computing and Informatics (IC3I)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IC3I.2016.7917932","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

文本可以被认为是字符的组合。在非结构化文本数据规模巨大的环境中,用计算机处理此类数据是一项具有挑战性的任务。为了从文本中提取有意义和有用的模式,需要一些预处理方法和算法。特征选择或约简生成旨在确定一个最小的属性子集,该子集可以表示与原始特征(属性)表示的相同的知识。粗糙集理论(RST)就是这样一个数学工具,它的应用可以取得巨大的成功。本文提出了一种基于粗糙集的文本数据集特征选择方法,实现了文本挖掘的目的。我们采取了不同的样本文本案例文件(如传记文本数据,不同领域的样本研究文章,来自某些来源的新闻文章)作为输入,这些文件可以是。txt,。pdf等形式或任何其他格式。我们还给出了我们提出的算法的复杂性分析和样本文本数据集的实验结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Rough set based feature selection approach for text mining
Text can be thought as the combination of characters. In the environment where the size of unstructured text data is hugely more, to process such data by computers is a challenging task. To extract meaningful and useful patterns from the text, some pre-processing methods and algorithms are required. Feature selection or Reduct generation intends to determine a smallest attributes subset which can represent the same knowledge as the original features(attributes) represented it. Rough set theory (RST) is such a mathematical tool, which can be used with tremendous success. Here, In the paper, we proposed a Rough set based approach for feature selection in the Text data set, which fulfil the aim of Text mining. We have taken different sample Text case documents (like biography text data, sample research articles of various domains, news articles from some sources) as input, these files can be in the form of .txt, .pdf etc. or any other format. We have also presented complexity analysis of our proposed algorithm and experimental results on a sample text data sets.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信