Shawn M. Jones, Martin Klein, M. Weigle, Michael L. Nelson
{"title":"Summarizing Web Archive Corpora Via Social Media Storytelling By Automatically Selecting and Visualizing Exemplars","authors":"Shawn M. Jones, Martin Klein, M. Weigle, Michael L. Nelson","doi":"10.1145/3606030","DOIUrl":"https://doi.org/10.1145/3606030","url":null,"abstract":"People often create themed collections to make sense of an ever-increasing number of archived web pages. Some of these collections contain hundreds of thousands of documents. Thousands of collections exist, many covering the same topic. Few collections include standardized metadata. This scale makes understanding a collection an expensive proposition. Our Dark and Stormy Archives (DSA) five-process model implements a novel summarization method to help users understand a collection by combining web archives and social media storytelling. The five processes of the DSA model are: select exemplars, generate story metadata, generate document metadata, visualize the story, and distribute the story. Selecting exemplars produces a set of k documents from the N documents in the collection, where k ≪ N, thus reducing the number of documents visitors need to review to understand a collection. Generating story and document metadata selects images, titles, descriptions, and other content from these exemplars. Visualizing the story ties this metadata together in a format the visitor can consume. Without distributing the story, it is not shared for others to consume. We present a research study demonstrating that our algorithmic primitives can be combined to select relevant exemplars that are otherwise undiscoverable using a conventional search engine and query generation methods. Having demonstrated improved methods for selecting exemplars, we visualize the story. Previous work established that the social card is the best format for visitors to consume surrogates. The social card combines metadata fields, including the document’s title, a brief description, and a striking image. 
Social cards are commonly found on social media platforms. We discovered that these platforms perform poorly for mementos and rely on web page authors to supply the necessary values for these metadata fields. With web archives, we often encounter archived web pages that predate the existence of this metadata. To generate this missing metadata and ensure that storytelling is available for these documents, we apply machine learning to generate the images needed for social cards with a Precision@1 of 0.8314. We also provide the length values needed for executing automatic summarization algorithms to generate document descriptions. Applying these concepts helps us create the visualizations needed to fulfill the final processes of story generation. We close this work with examples and applications of this technology.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":" ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44113678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ke Sun, Tieyun Qian, Chenliang Li, Xuan Ma, Qing Li, Ming Zhong, Yuanyuan Zhu, Mengchi Liu
{"title":"Pre-Training Across Different Cities for Next POI Recommendation","authors":"Ke Sun, Tieyun Qian, Chenliang Li, Xuan Ma, Qing Li, Ming Zhong, Yuanyuan Zhu, Mengchi Liu","doi":"10.1145/3605554","DOIUrl":"https://doi.org/10.1145/3605554","url":null,"abstract":"The Point-of-Interest (POI) transition behaviors could hold absolute sparsity and relative sparsity very differently for different cities. Hence, it is intuitive to transfer knowledge across cities to alleviate those data sparsity and imbalance problems for next POI recommendation. Recently, pre-training over a large-scale dataset has achieved great success in many relevant fields, like computer vision and natural language processing. By devising various self-supervised objectives, pre-training models can produce more robust representations for downstream tasks. However, it is not trivial to directly adopt such existing pre-training techniques for next POI recommendation, due to the lacking of common semantic objects (users or items) across different cities. Thus in this paper, we tackle such a new research problem of pre-training across different cities for next POI recommendation. Specifically, to overcome the key challenge that different cities do not share any common object, we propose a novel pre-training model named CATUS, by transferring the category-level universal transition knowledge over different cities. Firstly, we build two self-supervised objectives in CATUS: next category prediction and next POI prediction, to obtain the universal transition-knowledge across different cities and POIs. Then, we design a category-transition oriented sampler on the data level and an implicit and explicit transfer strategy on the encoder level to enhance this transfer process. 
At the fine-tuning stage, we propose a distance oriented sampler to better align the POI representations into the local context of each city. Extensive experiments on two large datasets consisting of four cities demonstrate the superiority of our proposed CATUS over the state-of-the-art alternatives. The code and datasets are available at https://github.com/NLPWM-WHU/CATUS.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"12 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138516937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy Scoring Over OSNs: Shared Data Granularity as a Latent Dimension","authors":"Yasir Kilic, Ali Inan","doi":"10.1145/3604909","DOIUrl":"https://doi.org/10.1145/3604909","url":null,"abstract":"Privacy scoring aims at measuring the privacy violation risk of a user over an online social network (OSN) based on attribute values shared in the user’s OSN profile page and the user’s position in the network. Existing studies on privacy scoring rely on possibly biased or emotional survey data. In this study, we work with real-world data collected from the professional LinkedIn OSN and show that probabilistic scoring models derived from the item response theory (IRT) fit real-world data better than naive approaches. We also introduce the granularity of the data an OSN user shares on her profile as a latent dimension of the OSN privacy scoring problem. Incorporating data granularity into our model, we build the most comprehensive solution to the OSN privacy scoring problem. Extensive experimental evaluation of various scoring models indicate the effectiveness of the proposed solution.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":" ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49275997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Willian Massami Watanabe, Danilo Alves dos Santos, Claiton de Oliveira
{"title":"Layout Cross-Browser Failure Classification for Mobile Responsive Design Web Applications: Combining Classification Models Using Feature Selection","authors":"Willian Massami Watanabe, Danilo Alves dos Santos, Claiton de Oliveira","doi":"10.1145/3580518","DOIUrl":"https://doi.org/10.1145/3580518","url":null,"abstract":"Cross-Browser Incompatibilities - XBIs are defined as inconsistencies that can be observed in Web applications when they are rendered in a specific browser compared to others. These inconsistencies are associated with differences in the way each browser implements their capabilities and render Web applications. The inconsistencies range from minor layout differences to lack of core functionalities of Web applications when rendered in specific browsers. The state-of-the-art proposes different approaches for detecting XBIs and many of them are based on classification models, using features extracted from the DOM-structure (DOM-based approaches) and screenshots (computer vision approaches) of Web applications. A comparison between both DOM-based and computer vision classification models has not been previously reported in the literature and a combination between both approaches could possibly lead to increased accuracy of classification models. In this paper, we extend the use of these classification models for detecting Layout XBIs in Responsive Design Web applications, rendered on different browser viewport widths and devices (iPhone 12 mini, iPhone 12, iPhone 12 PRO MAX and Pixel XL). We investigate the use of state-of-the-art classification models (Browserbite, Crosscheck and our previous work) for detecting Layout Cross-Browser Failures, which consist of Layout XBIs which negatively affect the layout of Responsive Design Web applications. Furthermore, we propose an enhanced classification model which combines features from different state-of-the-art classification models (DOM-based and computer vision), using Feature Selection. 
We built two datasets for evaluating the efficacy of classification models in separately detecting External and Internal Layout failures, using data from 72 Responsive design Web applications. The proposed classification model reported the highest F1-Score for detecting External Layout Failures (0.65) and Internal Layout Failures (0.35), and these results reported significant differences compared to Browserbite and Crosscheck classification models. Nevertheless, the experiment showed a lower accuracy in the classification of Internal Layout Failures and suggest the use of other image similarity metrics or Deep Learning models for increasing the efficacy of classification models.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":" ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45335033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Closeness Centrality on Uncertain Graphs","authors":"Zhenfang Liu, Jianxiong Ye, Zhaonian Zou","doi":"10.1145/3604912","DOIUrl":"https://doi.org/10.1145/3604912","url":null,"abstract":"Centrality is a family of metrics for characterizing the importance of a vertex in a graph. Although a large number of centrality metrics have been proposed, a majority of them ignores uncertainty in graph data. In this paper, we formulate closeness centrality on uncertain graphs and define the batch closeness centrality evaluation problem that computes the closeness centrality of a subset of vertices in an uncertain graph. We develop three algorithms, MS-BCC, MG-BCC and MGMS-BCC, based on sampling to approximate the closeness centrality of the specified vertices. All these algorithms require to perform breadth-first searches (BFS) starting from the specified vertices on a large number of sampled possible worlds of the uncertain graph. To improve the efficiency of the algorithms, we exploit operation-level parallelism of the BFS traversals and simultaneously execute the shared sequences of operations in the breadth-first searches. Parallelization is realized at different levels in these algorithms. The experimental results show that the proposed algorithms can efficiently and accurately approximate the closeness centrality of the given vertices. 
MGMS-BCC is faster than both MS-BCC and MG-BCC because it avoids more repeated executions of the shared operation sequences in the BFS traversals.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":" ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48938056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causality and Correlation Graph Modeling for Effective and Explainable Session-based Recommendation","authors":"Huizi Wu, Cong Geng, Hui Fang","doi":"10.1145/3593313","DOIUrl":"https://doi.org/10.1145/3593313","url":null,"abstract":"Session-based recommendation which has been witnessed a booming interest recently, focuses on predicting a user’s next interested item(s) based on an anonymous session. Most existing studies adopt complex deep learning techniques (e.g., graph neural networks) for effective session-based recommendation. However, they merely address co-occurrence between items, but fail to well distinguish causality and correlation relationship. Considering the varied interpretations and characteristics of causality and correlation relationship between items, in this study, we propose a novel method denoted as CGSR by jointly modeling causality and correlation relationship between items. In particular, we construct cause, effect and correlation graphs from sessions by simultaneously considering the false causality problem. We further design a graph neural network-based method for session-based recommendation. To conclude, we strive to explore the relationship between items from specific “causality” (directed) and “correlation” (undirected) perspectives. Extensive experiments on three datasets show that our model outperforms other state-of-the-art methods in terms of recommendation accuracy. 
Moreover, we further propose an explainable framework on CGSR, and demonstrate the explainability of our model via case studies on Amazon dataset.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"43 12","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138495148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guixiang Zhu, Jie Cao, Lei Chen, Youquan Wang, Zhan Bu, Shuxin Yang, Jianqing Wu, Zhiping Wang
{"title":"A Multi-Task Graph Neural Network with Variational Graph Auto-Encoders for Session-Based Travel Packages Recommendation","authors":"Guixiang Zhu, Jie Cao, Lei Chen, Youquan Wang, Zhan Bu, Shuxin Yang, Jianqing Wu, Zhiping Wang","doi":"10.1145/3577032","DOIUrl":"https://doi.org/10.1145/3577032","url":null,"abstract":"Session-based travel packages recommendation aims to predict users’ next click based on their current and historical sessions recorded by Online Travel Agencies (OTAs). Recently, an increasing number of studies attempted to apply Graph Neural Networks (GNNs) to the session-based recommendation and obtained promising results. However, most of them do not take full advantage of the explicit latent structure from attributes of items, making learned representations of items less effective and difficult to interpret. Moreover, they only combine historical sessions (long-term preferences) with a current session (short-term preference) to learn a unified representation of users, ignoring the effects of historical sessions for the current session. To this end, this article proposes a novel session-based model named STR-VGAE, which fills subtasks of the travel packages recommendation and variational graph auto-encoders simultaneously. STR-VGAE mainly consists of three components: travel packages encoder, users behaviors encoder, and interaction modeling. Specifically, the travel packages encoder module is used to learn a unified travel package representation from co-occurrence attribute graphs by using multi-view variational graph auto-encoders and a multi-view attention network. 
The users behaviors encoder module is used to encode user’ historical and current sessions with a personalized GNN, which considers the effects of historical sessions on the current session, and coalesce these two kinds of session representations to learn the high-quality users’ representations by exploiting a gated fusion approach. The interaction modeling module is used to calculate recommendation scores over all candidate travel packages. Extensive experiments on a real-life tourism e-commerce dataset from China show that STR-VGAE yields significant performance advantages over several competitive methods, meanwhile provides an interpretation for the generated recommendation list.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"43 13","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138495147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}