Categorized Text Document Summarization in the Kannada Language by sentence ranking
Authors: R. Jayashree, K. S. Murthy, B. Anami
Published in: 2012 12th International Conference on Intelligent Systems Design and Applications (ISDA), 2012-11-01
DOI: 10.1109/ISDA.2012.6416635
Citation count: 23
Abstract
The growth of the Internet has created a need for better Information Retrieval (IR) techniques that deliver relevant information faster. Text summarization is one such technique: it aims to produce a quick, concise summary of a text. Keyword-based summarization has recently drawn wide attention from researchers in the Natural Language Processing community. The algorithm we have developed extracts keywords from Kannada text documents by combining GSS (Galavotti, Sebastiani, Simi) [13] coefficients and IDF (Inverse Document Frequency) with TF (Term Frequency), and then uses these keywords for summarization. An important objective of our work is to assign a weight to each word in a sentence; the weight of a sentence is the sum of the weights of all its words, and based on these sentence scores we choose the top m sentences. A document from a given category is selected from a database we custom-built for this purpose. The files are obtained from Kannada Webdunia, a Kannada portal that offers political news, cinema news, sports news, shopping and jokes. The length of the generated summary depends on the number of sentences requested by the user. Finally, we compare the machine-generated summary with a human-written summary. A further objective of this work is to perform feature extraction through the removal of stop words, for which we present a novel technique that finds structurally similar words in a document.
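The keyword weighting and sentence ranking described in the abstract can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The GSS coefficient uses the standard Galavotti, Sebastiani and Simi definition, P(t,c)·P(¬t,¬c) − P(t,¬c)·P(¬t,c), with probabilities estimated as fractions of corpus documents. The abstract does not give the formula for combining TF, IDF and GSS, so the simple product used here is a hypothetical choice. TF is taken over the whole document being summarized, selected sentences are returned in their original document order, and tokenization and stop-word removal are assumed to have already been done. The function names `gss`, `idf`, `word_weights` and `summarize` are hypothetical, and the tokens are English placeholders rather than Kannada words.

```python
import math
from collections import Counter


def gss(term, category, corpus):
    """GSS coefficient (Galavotti, Sebastiani, Simi):
    P(t,c)*P(~t,~c) - P(t,~c)*P(~t,c), each probability estimated
    as a fraction of the documents in `corpus`."""
    n = len(corpus)
    tc = t_notc = notT_c = 0
    for cat, tokens in corpus:
        has_t = term in tokens
        if cat == category and has_t:
            tc += 1
        elif cat != category and has_t:
            t_notc += 1
        elif cat == category:
            notT_c += 1
    notT_notc = n - tc - t_notc - notT_c  # docs outside c that lack t
    return (tc * notT_notc - t_notc * notT_c) / (n * n)


def idf(term, corpus):
    """Inverse document frequency log(N / df); 0.0 for unseen terms."""
    df = sum(1 for _, tokens in corpus if term in tokens)
    return math.log(len(corpus) / df) if df else 0.0


def word_weights(doc_tokens, category, corpus):
    """ASSUMED combination (not given in the source):
    weight(w) = TF(w) * IDF(w) * GSS(w, category),
    where TF is w's relative frequency in the document."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return {
        w: (k / total) * idf(w, corpus) * gss(w, category, corpus)
        for w, k in counts.items()
    }


def summarize(sentences, category, corpus, m):
    """Score each sentence as the sum of its word weights and return the
    indices of the top-m sentences, restored to document order."""
    doc_tokens = [w for s in sentences for w in s]
    weights = word_weights(doc_tokens, category, corpus)
    scores = [sum(weights[w] for w in s) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:m])


# Toy categorized corpus: (category, set of tokens) per document.
corpus = [
    ("sports", {"match", "team", "won"}),
    ("sports", {"team", "player"}),
    ("politics", {"minister", "vote"}),
    ("politics", {"vote", "party", "won"}),
]
# A "sports" document, already split into tokenized sentences.
sentences = [["team", "won", "match"], ["vote", "won"], ["player"]]
print(summarize(sentences, "sports", corpus, m=2))  # [0, 2]
```

Because GSS is negative for words more strongly associated with other categories (here, "vote" for the sports category), the product lets such words lower a sentence's score; a variant that clips GSS at zero would only reward category-typical words.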