A summarizer system based on a semantic analysis of web documents
Angelin Florence, V. Padmadas
2015 International Conference on Technologies for Sustainable Development (ICTSD), 2015-04-30
DOI: 10.1109/ICTSD.2015.7095851
Citations: 3
Abstract
The availability of the web and search engines has made searching easier. Information overload remains a major problem that calls for algorithms and tools for faster access. Electronic documents are a major source of business and academic information. To use these online documents effectively, it is crucial to be able to extract their summaries, and a summarization system is one solution to this problem. This project proposes a summarizer system that can summarize multiple documents. The input text documents are passed through a parser, which generates a parse tree for each sentence. RDF triples of the form (subject, verb, object) are extracted from each sentence by analyzing the typed dependencies generated by the parser. The semantic distance between each pair of sentences is computed and stored in a matrix. The measure adopted to compute semantic distance is the Wu and Palmer distance. A clustering algorithm is applied to the extracted subject, verb, and object space, grouping the extracted RDF triples into clusters. The important sentences for the final summary are then extracted using a sentence selection algorithm.
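The distance-matrix step can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which lexical resource or aggregation rule is used, so the toy taxonomy `PARENT`, the sample `TRIPLES`, and the choice to average the slot-wise distances of two triples are all assumptions made for illustration. The sketch uses the standard Wu and Palmer similarity, 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is the deepest common ancestor, and takes the distance to be one minus that similarity.

```python
# Hypothetical is-a taxonomy mapping each concept to its parent.
# A real system would more likely draw on a resource such as WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "move": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
    "chase": "move",
    "follow": "move",
}


def path_to_root(concept):
    """Return the chain of concepts from `concept` up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer_similarity(a, b):
    """2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first concept shared with a is the deepest
    # common ancestor (the least common subsumer).
    lcs = next(c for c in path_to_root(b) if c in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def wu_palmer_distance(a, b):
    """Distance defined here as one minus the similarity."""
    return 1.0 - wu_palmer_similarity(a, b)


def triple_distance(t1, t2):
    """Assumed aggregation: mean distance over subject, verb, and object."""
    slots = list(zip(t1, t2))
    return sum(wu_palmer_distance(a, b) for a, b in slots) / len(slots)


def distance_matrix(triples):
    """Pairwise distances between sentences, one triple per sentence."""
    n = len(triples)
    return [[triple_distance(triples[i], triples[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical (subject, verb, object) triples, one per sentence.
TRIPLES = [
    ("dog", "chase", "cat"),
    ("cat", "follow", "dog"),
    ("dog", "chase", "car"),
]

if __name__ == "__main__":
    for row in distance_matrix(TRIPLES):
        print(["%.3f" % value for value in row])
```

A matrix of this kind is symmetric with a zero diagonal, which is the form that distance-based clustering algorithms typically expect as input.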