{"title":"Arc-Community Detection via Triangular Random Walks","authors":"P. Boldi, M. Rosa","doi":"10.1109/LA-WEB.2012.19","DOIUrl":"https://doi.org/10.1109/LA-WEB.2012.19","url":null,"abstract":"Community detection in social networks is a topic of central importance in modern graph mining, and the existence of overlapping communities has recently given rise to new interest in arc clustering. In this paper, we propose the notion of triangular random walk as a way to unveil arc-community structure in social graphs: a triangular walk is a random process that insists differently on arcs that close a triangle. We prove that triangular walks can be used effectively by translating them into a standard weighted random walk on the line graph; our experiments show that the weights so defined are in fact very helpful in determining the similarity between arcs and yield high-quality clustering. Although our technique gives a weighting scheme on the line graph and can be combined with any node-clustering method in the final phase, to make our approach more scalable we also propose an algorithm (ALP) that produces the clustering directly without the need to build the weighted line graph explicitly. Our experiments show that ALP, besides providing the highest accuracy, is also the fastest and most scalable among all arc-clustering algorithms we are aware of.","PeriodicalId":333389,"journal":{"name":"2012 Eighth Latin American Web Congress","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116809520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Genetic Niching Algorithm with Self-Adaptating Operator Rates for Document Clustering","authors":"Elizabeth León Guzman, Jonatan Gómez, O. Nasraoui","doi":"10.1109/LA-WEB.2012.22","DOIUrl":"https://doi.org/10.1109/LA-WEB.2012.22","url":null,"abstract":"We propose a Genetic algorithm for document clustering, where an evolutionary multimodal optimization algorithm evolves candidate cluster representative solutions to search for dense regions in the sparse high dimensional vector space of text documents. The evolution affects not only the document cluster representatives but also the genetic operator rates which are evolved simultaneously with the document cluster representative solutions. The evolving population consists of candidate document cluster representatives that are encoded in the form of a sparse index and sparse index/frequency variable length vectors. In addition, specialized sparse genetic operators are defined for this special representation. The proposed specialized genetic operators achieve different degrees of exploitation and exploration in searching for the optimal document cluster prototypes, in particular the most specialized operator for the document clustering problem is the Sparse Top-K-Addition operator, which can be seen as an incentive towards a more aggressive exploitation of the local context in a small subset of documents, whereas the simple Sparse Real Addition operator works more in an exploratory manner. As shown in our experiments on two well-known document data sets, taking into account associated terms within a local context adds the benefit of an explicit latent semantic consideration in the search for optimal term lists to describe the cluster prototypes.","PeriodicalId":333389,"journal":{"name":"2012 Eighth Latin American Web Congress","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129378036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Visual Tool for Querying and Exploring XML Data","authors":"R. Baeza-Yates, C. Barrera, Valeria Herskovic","doi":"10.1109/LA-WEB.2012.20","DOIUrl":"https://doi.org/10.1109/LA-WEB.2012.20","url":null,"abstract":"We present a visual tool based on XQuery for querying and exploring XML documents. The tool is based on a simple but effective visual metaphor and a results visualization technique based on fisheye trees. The application fulfills the expectations of facilitating the manipulation of XML documents for inexperienced users, offering original interaction features, improving upon similar work. As XML is one of the main metalanguages of the Web, this tool may have a broad impact.","PeriodicalId":333389,"journal":{"name":"2012 Eighth Latin American Web Congress","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114455239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hyperlink of Men","authors":"Ricardo Kawase, Patrick Siehndel, E. Herder, W. Nejdl","doi":"10.1109/LA-WEB.2012.12","DOIUrl":"https://doi.org/10.1109/LA-WEB.2012.12","url":null,"abstract":"Hand-made hyperlinks are increasingly outnumbered by automatically generated links, which are usually based on text similarity or some sort of recommendation algorithm. In this paper we explore the current linking and appreciation of automatically generated links. To what extent do they prevail on the Web, in what forms do they appear, and do users think those generated links are just as good as human-created links? To answer these questions we first propose a model for extracting contextual information of a hyperlink. Second, we developed a hyperlink ranker to assign relevance to each existing human generated link. With the outcomes of the hyperlink ranker, together with two other recommendation strategies, we performed a user study with over 100 participants. Results indicate that automated links are \"good enough\", and even preferred in some user contexts. Still, they do not provide the deeper knowledge as expressed by human authors.","PeriodicalId":333389,"journal":{"name":"2012 Eighth Latin American Web Congress","volume":"11 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114719615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing a Network of SPAMs Recipients","authors":"Thaína Amélia de Oliveira Alves, H. T. Marques-Neto","doi":"10.1109/LA-WEB.2012.14","DOIUrl":"https://doi.org/10.1109/LA-WEB.2012.14","url":null,"abstract":"The email exchange among people is one of the most common communication activities done in the Internet. Consequently, this produces a very large volume of electronic messages over the network, which needs to be well managed by email service providers. Besides, many of these emails are unwanted messages and even malicious. Thus, filtering unwanted mail and ensuring the security of users' mailboxes should be considered as high priority in the set of management tasks performed by email providers. This paper characterizes one network formed from a real dataset of received spams by users of a corporative email provider, which were identified, blocked and registered by a spam filter. The results show that some typical metrics of complex networks such as popularity and connectivity could be applied for improving the efficiency of the identification of email addresses used by spammers and also the email accounts which receive a huge number of unwanted and/or malicious messages. We observed that few of these email addresses have high popularity and high connectivity in the network. Moreover, this work also shows that if the users who receive a large number of spams are renamed or removed, i.e. ignored, the amount of spams could decrease substantially. This points out that the monitoring of a small number of users (or few nodes of a complex network) could positively affect the management of email providers. By knowing additional characteristics about recipients of a large amount of spams and the popularity of email addresses used by spammers, we could improve some techniques used to block unwanted and malicious messages on the network of the service provider.","PeriodicalId":333389,"journal":{"name":"2012 Eighth Latin American Web Congress","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129652007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Heterogeneous Data Sets","authors":"A. Abdullin, O. Nasraoui","doi":"10.1109/LA-WEB.2012.27","DOIUrl":"https://doi.org/10.1109/LA-WEB.2012.27","url":null,"abstract":"Recent years have seen an increasing interest in clustering data comprising multiple domains or modalities, such as categorical, numerical and transactional, etc. This kind of data is sometimes found within the context of clustering multiview, heterogeneous, or multimodal data. Traditionally, different types of attributes or domains have been handled by first combining them into one format (possibly using some type of conversion) and then following with a traditional clustering algorithm, or computing a combined distance matrix that takes into account the distance values for each domain, then following with a relational or graph clustering approach. In other cases where data consists of multiple views, multiview clustering has been used to cluster the data. In this paper, we review the existing approaches such as multiview clustering and discuss several additional approaches that can be harnessed for the purpose of clustering heterogeneous data once they are adapted for this purpose. The additional approaches include ensemble clustering, collaborative clustering and semi-supervised clustering.","PeriodicalId":333389,"journal":{"name":"2012 Eighth Latin American Web Congress","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121300673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating 3D Sound in Different Virtual Worlds","authors":"Marva Angélica Mora Lumbreras, L. Flores-Pulido, B. González-Contreras, Edgar Alberto Portilla-Flores","doi":"10.1109/LA-WEB.2012.17","DOIUrl":"https://doi.org/10.1109/LA-WEB.2012.17","url":null,"abstract":"The use of virtual tools in a university level results in a series of benefits. A virtual world allows the students to interact with modern technology, to learn with different tools, etc. This project was developed to be used in the Interaction Computer-Human and Computer Graphics courses, due to different manipulations kind, and the combination 3D graphics and sound, at the Bachelor in Computer Engineering at the Autonomous University of Tlaxcala. Specifically, this paper focuses on the behavior of sound on combining: \"Navigation, 3D objects and 3D sound\", in different virtual worlds. Our project allows building different virtual environment multi-screen, with a full navigation system, four ways of manipulating: keyboard, head tracker, remote control and input files with different predefined tours. Furthermore, with this project we can also build a Wheatstone-type stereoscope. We have implemented different virtual worlds for the different physic environments, which use different sound sources.","PeriodicalId":333389,"journal":{"name":"2012 Eighth Latin American Web Congress","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122241509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing Landing Pages in Sponsored Search","authors":"Haibin Liu, Woo-Cheol Kim, Dongwon Lee","doi":"10.1109/LA-WEB.2012.10","DOIUrl":"https://doi.org/10.1109/LA-WEB.2012.10","url":null,"abstract":"Using a total of 60,419 ad links collected from three search engines (i.e., Bing, Google, and Yahoo), we characterize the \"mobile-friendliness\" of landing pages in sponsored search. In particular, we analyze the common and different characteristics between landing pages made for desktop vs. mobile device users, measure/validate the quantitative scores for their mobile-friendliness, and classify the results with respect to types of queries and landing pages. Based on our findings, we articulate that: (1) current landing pages (regardless of search engines or platforms) are not mobile-friendly enough, and (2) better data-driven methods (as opposed to current static methods) to help advertisers build mobile-friendly landing pages are needed.","PeriodicalId":333389,"journal":{"name":"2012 Eighth Latin American Web Congress","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125499227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Efficiency of a Genre-Aware Approach to Focused Crawling Based on Link Context","authors":"Vítor Mangaravite, G. T. Assis, Anderson A. Ferreira","doi":"10.1109/LA-WEB.2012.24","DOIUrl":"https://doi.org/10.1109/LA-WEB.2012.24","url":null,"abstract":"Focused crawlers attempt to crawl web pages that are relevant to a specific topic or user interest. Although these kinds of crawlers have been proven to be effective, they need to improve their efficiency. Focused crawlers usually use a Frontier of non-visited URLs to visit the web pages and gather relevant ones. In this work, we define and evaluate a queueing policy of non-visited URLs, based on link context, to improve the efficiency of a genre-aware focused crawler. Our experimental evaluation shows, in some situations, an improvement around 100% in efficiency terms.","PeriodicalId":333389,"journal":{"name":"2012 Eighth Latin American Web Congress","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128349944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-source Conflating Index Construction for Local Search in a Low-Coverage Country","authors":"Dirk Ahlers","doi":"10.1109/LA-WEB.2012.21","DOIUrl":"https://doi.org/10.1109/LA-WEB.2012.21","url":null,"abstract":"Local search is a well-established mode of Web search engines today. For most developed countries, a huge amount of data is directly available on the Web and can be extracted and processed by search engines. In many developing countries, the Web coverage is much lower and only little information is directly available. To develop a Local Search that still can provide sufficient coverage, a hybrid mode of index construction has to be followed that identifies and integrates other sources of geospatial information to increase the coverage. The location part can be exceptionally difficult, as imprecise addresses and low-coverage geocoders do not allow precise coordinates to be used in mapping. We present an approach that is designed to increase coverage and precision. We use the example of Honduras, a country in Latin America, to describe the approach and potential data sources.","PeriodicalId":333389,"journal":{"name":"2012 Eighth Latin American Web Congress","volume":"235 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124321973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}