Title: Functional words removal techniques: A review
Authors: S. Gandotra, B. Arora
Venue: 2018 Fifth International Conference on Parallel, Distributed and Grid Computing (PDGC)
Publication date: 2018-12-01
DOI: 10.1109/PDGC.2018.8745904 (https://doi.org/10.1109/PDGC.2018.8745904)
Citation count: 1
Abstract
With the growth of internet activity, electronic documents have become a key source of data, and optimizing this data is important for research tasks such as Information Retrieval, Natural Language Processing, Web mining, and Text mining. Data on the web is a combination of structured and unstructured data, most of it textual. Text processing is therefore required to extract useful information from such data for use in further processing. Preprocessing plays a vital role in all text processing activities. Stop-word removal is one of the most important preprocessing techniques: it eliminates functional words from the document and thereby helps improve the performance of the system. This paper discusses the stop-word removal techniques used for Indian text and presents an analysis of the results these techniques produce for various Indian languages.
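As a rough illustration of the list-based form of stop-word removal described in the abstract, the sketch below filters tokens against a fixed stop list. It is not taken from the paper: the `STOP_WORDS` set, the `remove_stop_words` function, and the whitespace tokenization are all assumptions chosen for brevity, and a real system for an Indian language would substitute a language-specific list and tokenizer.

```python
# Hypothetical, tiny stop-word list for illustration only; real lists are
# language-specific and considerably larger.
STOP_WORDS = {"the", "is", "of", "and", "a", "in", "to"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the tokens of `text` that do not appear in `stop_words`."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    return [token for token in tokens if token not in stop_words]


filtered = remove_stop_words("The growth of the internet is rapid")
print(filtered)  # ['growth', 'internet', 'rapid']
```

Only the content-bearing tokens remain, which is the sense in which removing functional words can shrink the data that later processing stages have to handle.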