A Formal Technique for Text Summarization from Web Pages by using Latent Semantic Analysis

J. G. Ramos, Isela Navarro-Alatorre, G. Becerra, Omar Flores-Sánchez
Res. Comput. Sci. Journal article, published 2019-12-31. DOI: 10.13053/rcs-148-3-1 (https://doi.org/10.13053/rcs-148-3-1)
Citations: 2

Abstract

The Web is the most attractive medium for consulting information on practically any topic; in practice, it has become the standard source of information. However, as content grows, so does the effort required to discriminate and filter it. At the same time, users browse the Web on increasingly small devices with reduced screens. Both considerations point to the need for software tools that acquire and condense information, i.e., text summarization. Several text summarization methods exist; however, most of them rely on techniques that assume plain documents rather than the tree-like structure of web pages, while others rely on the presence of keywords and ignore the relations among words. In this work we present a formal method for preparing text summaries based on latent semantic analysis (LSA), which exploits the implicit relationships between words that appear in a common context. In this way, the summaries are enriched with semantic information contributed by LSA. Furthermore, the summary is driven by a user's query: we retrieve the text excerpts that are semantically closest to the user's interest. Additionally, we define a formula, called semantic similarity, which encapsulates the properties of LSA and determines the best text node of a web page for producing summaries.
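The abstract does not give the semantic similarity formula itself, so the following is only a minimal sketch of the general idea, under stated assumptions: the web page is reduced to a flat list of text-node strings, LSA is a truncated SVD of a term-by-node count matrix, and cosine similarity in the latent space stands in for the paper's formula. The names `tokenize`, `term_node_matrix`, `lsa_scores`, and `EXAMPLE_NODES` are hypothetical and do not come from the paper.

```python
import re
import numpy as np

# Hypothetical text nodes, as if extracted from a web page's tree.
EXAMPLE_NODES = [
    "cats and dogs are popular pets",
    "dogs and puppies need daily walks",
    "stock markets fell as investors sold shares",
    "investors buy shares when markets rise",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_node_matrix(nodes):
    """Build a term-by-node count matrix (rows: terms, columns: text nodes)."""
    vocab = sorted({tok for node in nodes for tok in tokenize(node)})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(nodes)))
    for j, node in enumerate(nodes):
        for tok in tokenize(node):
            matrix[index[tok], j] += 1.0
    return matrix, index


def lsa_scores(nodes, query, k=2):
    """Score each text node against the query in a rank-k latent space.

    Assumption: cosine similarity stands in for the paper's own formula.
    """
    matrix, index = term_node_matrix(nodes)
    # Truncated SVD: keep the k largest singular directions (the LSA step).
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    u_k = u[:, :min(k, len(s))]
    # Project the node columns and the query vector into the latent space.
    node_vecs = (u_k.T @ matrix).T          # shape: (n_nodes, k)
    q = np.zeros(matrix.shape[0])
    for tok in tokenize(query):
        if tok in index:                    # unseen query terms are ignored
            q[index[tok]] += 1.0
    q_vec = u_k.T @ q
    # Cosine similarity; pairs involving a zero-length vector score 0.
    norms = np.linalg.norm(node_vecs, axis=1) * np.linalg.norm(q_vec)
    return np.divide(node_vecs @ q_vec, norms,
                     out=np.zeros(len(nodes)), where=norms > 0)


if __name__ == "__main__":
    for node, score in zip(EXAMPLE_NODES, lsa_scores(EXAMPLE_NODES, "puppies")):
        print(f"{score:6.3f}  {node}")
```

On this toy input, the first node never mentions "puppies", yet it scores about as high as the second because the two share context words such as "dogs"; the two finance nodes score near zero. A keyword-only match would have found the second node alone, which is the kind of implicit relationship the abstract attributes to LSA. A summarizer along these lines would then draw its excerpt from the top-scoring node or nodes.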