Title: Punjabi document classification using vector evaluation method
Authors: Mehak Katnoria, Varinderpal Singh, Rajiv Kumar
Venue: 2017 International Conference on Computing Methodologies and Communication (ICCMC)
Published: 2017-07-01
DOI: 10.1109/ICCMC.2017.8282606
Citations: 5
Abstract
Most information is stored as text, so managing vast numbers of digital documents is vital in text mining applications. Text mining is a field that extracts hidden, useful information from text documents according to a user's query. Text categorization is the most important part of text mining. Also known as text classification or topic spotting, it is defined as the classification of text documents under predefined categories. Although Punjabi text categorization is a promising field, little work has been done on it compared with English text categorization. This paper proposes a method that uses a corpus of categorized Punjabi documents: the weights of the words in a test document are calculated to determine the document's keywords, which are then compared with the keywords of the corpus to determine the best category for the test document.
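The keyword-matching idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme or the comparison measure, so the sketch assumes raw term frequency as the word weight and keyword-set overlap as the score. The function names `top_keywords` and `classify`, the parameter `k`, and the toy tokens are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def top_keywords(tokens, k=3):
    """Return the k highest-weighted tokens.

    Assumption: the weight of a word is its raw term frequency.
    """
    counts = Counter(tokens)
    # Sort by descending weight, then alphabetically so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:k]}


def classify(tokens, category_keywords, k=3):
    """Pick the category whose keyword set overlaps most with the document's keywords.

    Assumption: similarity is the size of the keyword-set intersection.
    """
    doc_keywords = top_keywords(tokens, k)
    scores = {
        category: len(doc_keywords & keywords)
        for category, keywords in category_keywords.items()
    }
    # On ties, max() keeps the first category in dictionary order.
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: placeholder tokens, not drawn from the paper's corpus.
category_keywords = {
    "sports": {"khed", "match", "team"},
    "politics": {"sarkar", "chon", "mantri"},
}
document = ["team", "match", "team", "khed", "sarkar", "match", "team"]

best, scores = classify(document, category_keywords)
print(best, scores)
```

With this toy data the document's three keywords all fall in the "sports" set, so that category is selected. A real system would also need Punjabi-specific tokenization and stop-word removal, which this sketch leaves out.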