Functional words removal techniques: A review

S. Gandotra, B. Arora
2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC), published 2018-12-01. DOI: 10.1109/PDGC.2018.8745904 (https://doi.org/10.1109/PDGC.2018.8745904)
Citations: 1

Abstract

With the growth of internet activity, electronic documents have become a key source of data, and optimizing this data is important for research tasks such as Information Retrieval, Natural Language Processing, Web mining, and Text mining. Data on the web is a combination of structured and unstructured data, most of it textual. Text processing is therefore required to extract useful information from such data for use in further processing. Preprocessing plays a vital role in all text processing activities. Stop-word removal is one of the most important preprocessing techniques; it eliminates functional words from the document and thereby helps improve system performance. This paper discusses the stop-word removal techniques used for Indian text and presents an analysis of the results obtained by applying those techniques to various Indian languages.
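
As an illustration of the basic idea, the following is a minimal sketch of dictionary-based stop-word removal, one common family of techniques; it is not taken from the paper. The stop list `STOP_WORDS` and the function name `remove_stop_words` are hypothetical, and the tiny English list is used only for readability. For Indian-language text, a curated list for the target language and a language-appropriate tokenizer would be assumed instead.

```python
import string

# Hypothetical, illustrative stop list (assumption, not from the paper).
# A real system would load a curated list for the target language.
STOP_WORDS = frozenset({"a", "an", "the", "is", "of", "in", "and", "to", "from"})


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lowercased tokens of `text` that are not in `stop_words`."""
    tokens = []
    for raw in text.split():
        # Strip leading/trailing ASCII punctuation and normalize case.
        token = raw.strip(string.punctuation).lower()
        if token and token not in stop_words:
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    sentence = "The removal of stop-words is a step in text preprocessing."
    print(remove_stop_words(sentence))
```

Whitespace splitting is used here for simplicity; it leaves hyphenated words such as "stop-words" intact and makes no attempt at stemming or at handling scripts that do not separate words with spaces.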