{"title":"Clustering the Chilean Web","authors":"Satu Virtanen","doi":"10.1109/LAWEB.2003.1250307","DOIUrl":"https://doi.org/10.1109/LAWEB.2003.1250307","url":null,"abstract":"We perform a clustering of the Chilean Web graph using a local fitness measure, optimized by simulated annealing, and compare the obtained cluster distribution to that of two models of the Web graph. Information on Web clusters can be employed both to validate generation models and to study the properties of the graph. Clusters can also be used in semantics-based grouping of Websites or pages e.g. for indexing and browsing.","PeriodicalId":376743,"journal":{"name":"Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128824223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cooperation schemes between a Web server and a Web search engine","authors":"C. Castillo","doi":"10.1109/LAWEB.2003.1250301","DOIUrl":"https://doi.org/10.1109/LAWEB.2003.1250301","url":null,"abstract":"Search engines provide search results based on a large repository of pages downloaded by a Web crawler from several servers. To provide best results, this repository must be kept as fresh as possible, but this can be difficult due to the large volume of pages involved and to the fact that polling is the only method for detecting changes. We explore and compare several alternatives for keeping fresh repositories of cooperation from servers.","PeriodicalId":376743,"journal":{"name":"Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132965382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structuring information on the Web from below: the case of educational organizations in Chile","authors":"Ernesto Krsulovic-Morales, Claudio Gutiérrez","doi":"10.1109/LAWEB.2003.1250303","DOIUrl":"https://doi.org/10.1109/LAWEB.2003.1250303","url":null,"abstract":"We present an ongoing work to help populate the Web with metadata by structuring and integrating information of organizations at a small scale. This is a natural complement to big scale projects to build the semantic Web infrastructure. We show an implementation for Computer Science departments in Chile, and present current work on educational organizations generalizing previous experience.","PeriodicalId":376743,"journal":{"name":"Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114382337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the evolution of clusters of near-duplicate Web pages","authors":"Dennis Fetterly, M. Manasse, Marc Najork","doi":"10.1109/LAWEB.2003.1250280","DOIUrl":"https://doi.org/10.1109/LAWEB.2003.1250280","url":null,"abstract":"We expand on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million Web pages on a weekly basis over the span of 11 weeks. We then determined which of these pages are near-duplicates of one another, and tracked how clusters of near-duplicate documents evolved over time. We found that 29.2% of all Web pages are very similar to other pages, and that 22.2% are virtually identical to other pages. We also found that clusters of near-duplicate documents are fairly stable: Two documents that are near-duplicates of one another are very likely to still be near-duplicates 10 weeks later. This result is of significant relevance to search engines: Web crawlers can be fairly confident that two pages that have been found to be near-duplicates of one another will continue to be so for the foreseeable future, and may thus decide to recrawl only one version of that page, or at least to lower the download priority of the other versions, thereby freeing up crawling resources that can be brought to bear more productively somewhere else.","PeriodicalId":376743,"journal":{"name":"Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129714100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The best trail algorithm for assisted navigation of Web sites","authors":"R. Wheeldon, M. Levene","doi":"10.1109/LAWEB.2003.1250294","DOIUrl":"https://doi.org/10.1109/LAWEB.2003.1250294","url":null,"abstract":"We present an algorithm called the best trail algorithm, which helps solve the hypertext navigation problem by automating the construction of memex-like trails through the corpus. The algorithm performs a probabilistic best-first expansion of a set of navigation trees to find relevant and compact trails. We describe the implementation of the algorithm, scoring methods for trails, filtering algorithms and a new metric called potential gain which measures the potential of a page for future navigation opportunities.","PeriodicalId":376743,"journal":{"name":"Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129461829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metadata in web information services and systems","authors":"E.M. Mendez Rodriguez","doi":"10.1109/LAWEB.2003.1250310","DOIUrl":"https://doi.org/10.1109/LAWEB.2003.1250310","url":null,"abstract":"Metadata, specifically DCMI (Dublin Core Metadata Initiative) and its element set ISO 15836-2003, establishes one of the operational infrastructures of the Semantic Web and one of the interoperability main keys for electronic information management and retrieval. This metadata schemas, beside XML/RDF syntactic code, the content description schemes (ontologies, topic maps, thesaurus, etc.) and a set of protocols for information interchange, will be the protagonists of the Second Generation of the Web. This landscape will be reflected by this tutorial. Its specific aims are: To present an introduction to the metadata concept and application. To define the metadata role in Web systems and services, versus its role in global information retrieval by all the web search engines. To present the general principles for any metadata project, mainly metadata schemas assessment and selection. To reflect the metadata schemas, models and standards possibilities and their syntax and coding (HTML/XML/RDF), and their heterogeneity problems and the need of planning interoperability between different web information objects. To explain paradigmatic cases of metadata management in subject gateways, their particularities and prospective use.","PeriodicalId":376743,"journal":{"name":"Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726)","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127569097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings. First Latin American Web Congress","authors":"","doi":"10.1109/LAWEB.2003.1250274","DOIUrl":"https://doi.org/10.1109/LAWEB.2003.1250274","url":null,"abstract":"The following topics are dealt with: World Wide Web; Internet; search engines; Web sites; information retrieval; collaborative work; Web pages; semantic Web; Web documents; Web clustering.","PeriodicalId":376743,"journal":{"name":"Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134125285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}