A summarizer system based on a semantic analysis of web documents

2015 International Conference on Technologies for Sustainable Development (ICTSD) Pub Date : 2015-04-30 DOI:10.1109/ICTSD.2015.7095851

Angelin Florence, V. Padmadas


Citations: 3

Abstract

The availability of the web and search engines has made searching easier. Information overload remains one of the major problems, requiring algorithms and tools for faster access. Electronic documents are one of the major sources of business and academic information. To use these online documents effectively, it is crucial to be able to extract summaries of them. A summarization system is one solution to this problem. This project proposes a summarizer system that can summarize multiple documents. The input text documents are passed through a parser, which generates a parse tree for each sentence. RDF triples in the form of subject, verb, and object are extracted from each sentence by analyzing the typed dependencies generated by the parser. The semantic distance between each pair of sentences is computed and stored in a matrix. The measure adopted to compute semantic distance is the Wu and Palmer distance. A clustering algorithm is applied to the extracted subject, verb, and object space, and the extracted RDF triples are grouped into clusters. The important sentences for the final summary are then extracted using a sentence selection algorithm.
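The pipeline described in the abstract (triples, pairwise Wu and Palmer distance, clustering, sentence selection) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the toy taxonomy `PARENT` stands in for a lexical hierarchy such as WordNet, the triples are written by hand instead of being extracted from typed dependencies, each sentence contributes exactly one triple, the triple distance is a plain mean over subject, verb, and object, the clustering is a simple greedy threshold scheme, and selection picks the medoid of each cluster. The abstract names none of these choices, so the authors' actual algorithms may differ.

```python
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent); a real system would
# likely use a lexical database such as WordNet instead.
PARENT = {
    "entity": None,
    "animal": "entity", "action": "entity", "food": "entity",
    "dog": "animal", "cat": "animal",
    "eat": "action", "chase": "action",
    "meat": "food", "fish": "food",
}

def path_to_root(word):
    """Return [word, parent, ..., root]."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path

def depth(word):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(word))

def wup_similarity(a, b):
    """Wu and Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node that is also above a is the
    # least common subsumer (lcs).
    lcs = next(w for w in path_to_root(b) if w in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

def triple_distance(t1, t2):
    """Assumed measure: mean of (1 - similarity) over subject, verb, object."""
    return sum(1 - wup_similarity(x, y) for x, y in zip(t1, t2)) / 3

def distance_matrix(triples):
    """Symmetric matrix of pairwise triple distances."""
    n = len(triples)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = triple_distance(triples[i], triples[j])
    return dist

def cluster(dist, threshold):
    """Greedy stand-in clustering: join the first cluster whose members
    are all within `threshold`, otherwise start a new cluster."""
    clusters = []
    for i in range(len(dist)):
        for members in clusters:
            if all(dist[i][j] <= threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def select(dist, clusters):
    """Pick each cluster's medoid: the member closest to all the others."""
    return [min(c, key=lambda i: sum(dist[i][j] for j in c)) for c in clusters]

# Hand-written (subject, verb, object) triples, one per sentence.
triples = [
    ("dog", "eat", "meat"),
    ("cat", "eat", "fish"),
    ("dog", "chase", "cat"),
]
dist = distance_matrix(triples)
clusters = cluster(dist, threshold=0.25)
summary_ids = select(dist, clusters)
```

With this toy data, the first two triples share a verb and have closely related subjects and objects, so they fall into one cluster, while the third forms its own. `summary_ids` then holds one representative sentence index per cluster, which is the role the abstract assigns to the sentence selection step.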

