J. G. Ramos, Isela Navarro-Alatorre, G. Becerra, Omar Flores-Sánchez
A Formal Technique for Text Summarization from Web Pages by using Latent Semantic Analysis
Res. Comput. Sci., published 2019-12-31. DOI: 10.13053/rcs-148-3-1 (https://doi.org/10.13053/rcs-148-3-1)
Citations: 2
Abstract
The Web is the most attractive medium for consulting information on practically any topic; in practice, it has become the standard source of information. However, as content grows, so does the effort needed to discriminate and filter it. At the same time, users browse the Web on ever smaller devices with reduced screens. Both considerations point to the need for software tools that acquire and condense information, i.e., text summarization. Several methods for text summarization exist; however, most of them rely on techniques that assume plain documents rather than the tree-like structure of web pages, while others depend on the presence of keywords and ignore the relations among words. In this work we present a formal method for preparing text summaries based on latent semantic analysis (LSA), which exploits the implicit relationships between words that appear in a common context. In this way, the summaries are enriched with a degree of semantics contributed by LSA. Furthermore, the summary is driven by a user's query: we retrieve the text excerpts that are semantically most similar to the user's interest. Additionally, we define a formula called semantic similarity, which encapsulates the properties of LSA and determines the best text node of a web page for producing summaries.
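To make the idea of query-driven node selection concrete, here is a minimal sketch of LSA-based ranking. It is not the authors' method: the abstract does not state the semantic similarity formula, so plain cosine similarity in a truncated SVD space stands in for it. The function names (`build_term_node_matrix`, `rank_nodes`), the whitespace tokenizer, the raw term counts, and the rank `k` are all assumptions chosen for illustration, and the text nodes are taken to be already extracted from the page tree.

```python
import numpy as np


def build_term_node_matrix(nodes):
    """Build a raw term-count matrix: one row per term, one column per text node."""
    vocab = sorted({w for text in nodes for w in text.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(nodes)))
    for j, text in enumerate(nodes):
        for w in text.lower().split():
            A[index[w], j] += 1.0
    return A, index


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0


def rank_nodes(nodes, query, k=2):
    """Rank text nodes by cosine similarity to the query in a rank-k LSA space.

    Cosine similarity is an assumed stand-in for the paper's
    'semantic similarity' formula, which the abstract does not give.
    Returns a list of (node_index, score), best first.
    """
    A, index = build_term_node_matrix(nodes)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    Uk = U[:, :k]
    # Node j in latent space: U_k^T a_j, which equals S_k times column j of V_k^T.
    node_vecs = Vt[:k, :].T * s[:k]
    # Project the query's term-count vector into the same space: U_k^T q.
    q = np.zeros(A.shape[0])
    for w in query.lower().split():
        if w in index:
            q[index[w]] += 1.0
    q_vec = q @ Uk
    scores = [cosine(q_vec, node_vecs[j]) for j in range(len(nodes))]
    return sorted(enumerate(scores), key=lambda pair: -pair[1])


# Hypothetical text nodes, as if extracted from a web page's tree.
nodes = [
    "cats dogs pets",
    "dogs chase cats",
    "stock markets fell",
    "markets stock prices",
]
for node_index, score in rank_nodes(nodes, "pets", k=2):
    print(node_index, round(score, 3), nodes[node_index])
```

In this toy input the second node never mentions "pets", yet it shares context ("cats", "dogs") with the first node, so the latent space places both on the same topic and ranks them above the two finance nodes. A keyword match alone would have missed it, which is the motivation the abstract gives for using LSA.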