John A. Berlin, Mat Kelly, Michael L. Nelson, M. Weigle
{"title":"To Re-experience the Web: A Framework for the Transformation and Replay of Archived Web Pages","authors":"John A. Berlin, Mat Kelly, Michael L. Nelson, M. Weigle","doi":"10.1145/3589206","DOIUrl":"https://doi.org/10.1145/3589206","url":null,"abstract":"When replaying an archived web page, or memento, the fundamental expectation is that the page should be viewable and function exactly as it did at archival time. However, this expectation requires web archives upon replay to modify the page and its embedded resources so that all resources and links reference the archive rather than the original server. Although these modifications necessarily change the state of the representation, it is understood that without them the replay of mementos from the archive would not be possible. The process of replaying mementos and the modifications made to the representations by web archives varies between archives. Because of this, there is no standard terminology for describing the replay and needed modifications. In this paper, we propose terminology for describing the existing styles of replay and the modifications made on the part of web archives to mementos to facilitate replay. Because of issues discovered with server-side only modifications, we propose a general framework for the auto-generation of client-side rewriting libraries. Finally, we evaluate the effectiveness of using a generated client-side rewriting library to augment the existing replay systems of web archives by crawling mementos replayed from the Internet Archive’s Wayback Machine with and without the generated client-side rewriter. By using the generated client-side rewriter, we were able to decrease the cumulative number of requests blocked by the content security policy of the Wayback Machine for 577 mementos by 87.5% and increased the cumulative number of requests made by 32.8%. We were also able to replay mementos that were previously not replayable from the Internet Archive. 
Many of the client-side rewriting ideas described in this work have been implemented into Wombat, a client-side URL rewriting system that is used by the Webrecorder, Pywb, and Wayback Machine playback systems.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":" ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47117652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chang-Ai Sun, An Fu, Jingting Jia, Meng Li, Jun Han
{"title":"Improving Conformance of Web Services: A Constraint-based Model-driven Approach","authors":"Chang-Ai Sun, An Fu, Jingting Jia, Meng Li, Jun Han","doi":"10.1145/3580515","DOIUrl":"https://doi.org/10.1145/3580515","url":null,"abstract":"Web services have been widely used to develop complex distributed software systems in the context of Service Oriented Architecture (SOA). As a standard for describing Web services, the Web Service Description Language (WSDL) provides a universal mechanism to describe the service’s functionalities for the service consumers. However, the current WSDL only provides the description of the interfaces to a Web Service without any restrictions or assumptions on how to properly invoke the service, resulting in divergent understanding of the Web service’s behavior between the service developer and service consumer. A particular challenge is how to make explicit the various behavior assumptions and restrictions of a service (for the user), and make sure that the service implementation conforms to them (for the developer). In this article, we propose a constraint-based model-driven approach to improving the behavior conformance of Web services. In our approach, constraints are introduced in an extended WSDL, called CxWSDL, to formally and explicitly express the implicit restrictions and assumptions on the behavior of a Web service, and then the predefined constraints are used to derive test cases in a model-driven manner to test the service implementation’s conformance to its behavior constraints from the user’s perspective. 
An empirical study involving four real-life Web services was conducted to evaluate the effectiveness of our approach, and four actual inconsistencies were discovered.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"43 33","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138495125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FinTech on the Web: An Overview","authors":"Chung-Chi Chen, Hen-Hsen Huang, Hiroya Takamura, Makoto P. Kato, Yu-Lieh Huang","doi":"10.1145/3572404","DOIUrl":"https://doi.org/10.1145/3572404","url":null,"abstract":"In this article, we provide an overview of ACM TWEB’s special issue, Financial Technology on the Web. This special issue covers diverse topics: (1) a new architecture for leveraging online news to investment and risk management, (2) a cross-platform analysis of the post quality and users’ behaviors, and (3) an empirical study on disentangling decentralized finance compositions. In addition to a guide for the special issue, we also share a brief opinion on the future of financial technology on the Web.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"43 29","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138495129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A User-Centric Analysis of Social Media for Stock Market Prediction","authors":"Mohamed Reda Bouadjenek, Scott Sanner, Ga Wu","doi":"10.1145/3532856","DOIUrl":"https://doi.org/10.1145/3532856","url":null,"abstract":"Social media platforms such as Twitter or StockTwits are widely used for sharing stock market opinions between investors, traders, and entrepreneurs. Empirically, previous work has shown that the content posted on these social media platforms can be leveraged to predict various aspects of stock market performance. Nonetheless, actors on these social media platforms may not always have altruistic motivations and may instead seek to influence stock trading behavior through the (potentially misleading) information they post. While a lot of previous work has sought to analyze how social media can be used to predict the stock market, there remain many questions regarding the quality of the predictions and the behavior of active users on these platforms. To this end, this article seeks to address a number of open research questions: Which social media platform is more predictive of stock performance? What posted content is actually predictive, and over what time horizon? How does stock market posting behavior vary among different users? Are all users trustworthy or do some users’ predictions consistently mislead about the true stock movement? To answer these questions, we analyzed data from Twitter and StockTwits covering almost 5 years of posted messages spanning 2015 to 2019. 
The results of this large-scale study provide a number of important insights among which we present the following: (i) StockTwits is a more predictive source of information than Twitter, leading us to focus our analysis on StockTwits; (ii) on StockTwits, users’ self-labeled sentiments are correlated with the stock market but are only slightly predictive in aggregate over the short-term; (iii) there are at least three clear types of temporal predictive behavior for users over a 144 days horizon: short, medium, and long term; and (iv) consistently incorrect users who are reliably wrong tend to exhibit what we conjecture to be “botlike” post content and their removal from the data tends to improve stock market predictions from self-labeled content.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"43 32","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138495126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investment and Risk Management with Online News and Heterogeneous Networks","authors":"Gary Ang, Ee-Peng Lim","doi":"10.1145/3532858","DOIUrl":"https://doi.org/10.1145/3532858","url":null,"abstract":"Stock price movements in financial markets are influenced by large volumes of news from diverse sources on the web, e.g., online news outlets, blogs, social media. Extracting useful information from online news for financial tasks, e.g., forecasting stock returns or risks, is, however, challenging due to the low signal-to-noise ratios of such online information. Assessing the relevance of each news article to the price movements of individual stocks is also difficult, even for human experts. In this article, we propose the Guided Global-Local Attention-based Multimodal Heterogeneous Network (GLAM) model, which comprises novel attention-based mechanisms for multimodal sequential and graph encoding, a guided learning strategy, and a multitask training objective. GLAM uses multimodal information, heterogeneous relationships between companies and leverages significant local responses of individual stock prices to online news to extract useful information from diverse global online news relevant to individual stocks for multiple forecasting tasks. 
Our extensive experiments with multiple datasets show that GLAM outperforms other state-of-the-art models on multiple forecasting tasks and investment and risk management application case-studies.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"43 30","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138495128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reverse Maximum Inner Product Search: Formulation, Algorithms, and Analysis","authors":"Daichi Amagata, Takahiro Hara","doi":"10.1145/3587215","DOIUrl":"https://doi.org/10.1145/3587215","url":null,"abstract":"The MIPS (maximum inner product search), which finds the item with the highest inner product with a given query user, is an essential problem in the recommendation field. Usually, e-commerce companies face situations where they want to promote and sell new or discounted items. In these situations, we have to consider a question: who are interested in the items and how to find them? This article answers this question by addressing a new problem called reverse maximum inner product search (reverse MIPS). Given a query vector and two sets of vectors (user vectors and item vectors), the problem of reverse MIPS finds a set of user vectors whose inner product with the query vector is the maximum among the query and item vectors. Although the importance of this problem is clear, its straightforward implementation incurs a computationally expensive cost. We therefore propose Simpfer, a simple, fast, and exact algorithm for reverse MIPS. In an offline phase, Simpfer builds a simple index that maintains a lower-bound of the maximum inner product. By exploiting this index, Simpfer judges whether the query vector can have the maximum inner product or not, for a given user vector, in a constant time. Our index enables filtering user vectors, which cannot have the maximum inner product with the query vector, in a batch. We theoretically demonstrate that Simpfer outperforms baselines employing state-of-the-art MIPS techniques. In addition, we answer two new research questions. Can approximation algorithms further improve reverse MIPS processing? Is there an exact algorithm that is faster than Simpfer? For the former, we show that approximation with quality guarantee provides a little speed-up. 
For the latter, we propose Simpfer++, a theoretically and practically faster algorithm than Simpfer. Our extensive experiments on real datasets show that Simpfer is at least two orders of magnitude faster than the baselines, and Simpfer++ further improves the online processing time.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":" ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46937459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Niffler: Real-time Device-level Anomalies Detection in Smart Home","authors":"Haohua Du, Yue Wang, Xiaoya Xu, Mingsheng Liu","doi":"10.1145/3586073","DOIUrl":"https://doi.org/10.1145/3586073","url":null,"abstract":"Device-level security has become a major concern in smart home systems. Detecting problems in smart home systems strives to increase accuracy in near real time without hampering the regular tasks of the smart home. The current state of the art in detecting anomalies in smart home devices is mainly focused on the app level, which provides a basic level of security by assuming that the devices are functioning correctly. However, this approach is insufficient for ensuring the overall security of the system, as it overlooks the possibility of anomalies occurring at the lower layers such as the devices. In this article, we propose a novel notion, correlated graph, and with the aid of that, we develop our system to detect misbehaving devices without modifying the existing system. Our correlated graphs explicitly represent the contextual correlations among smart devices with little knowledge about the system. We further propose a linkage path model and a sensitivity ranking method to assist in detecting the abnormalities. 
We implement a semi-automatic prototype of our approach, evaluate it in real-world settings, and demonstrate its efficiency, which achieves an accuracy of around 90% in near real time.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"17 1","pages":"1 - 27"},"PeriodicalIF":3.5,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41445954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Luca Pajola, Dongkai Chen, M. Conti, V. S. Subrahmanian
{"title":"A Novel Review Helpfulness Measure based on the User-Review-Item Paradigm","authors":"Luca Pajola, Dongkai Chen, M. Conti, V. S. Subrahmanian","doi":"10.1145/3585280","DOIUrl":"https://doi.org/10.1145/3585280","url":null,"abstract":"Review platforms are viral online services where users share and read opinions about products (e.g., a smartphone) or experiences (e.g., a meal at a restaurant). Other users may be influenced by such opinions when deciding what to buy. The usability of review platforms is currently limited by the massive number of opinions on many products. Therefore, showing only the most helpful reviews for each product is in the best interests of both users and the platform (e.g., Amazon). The current state of the art is far from accurate in predicting how helpful a review is. First, most existing works lack compelling comparisons as many studies are conducted on datasets that are not publicly available. As a consequence, new studies are not always built on top of prior baselines. Second, most existing research focuses only on features derived from the review text, ignoring other fundamental aspects of the review platforms (e.g., the other reviews of a product, the order in which they were submitted). In this paper, we first carefully review the most relevant works in the area published during the last 20 years. We then propose the User-Review-Item (URI) paradigm, a novel abstraction for modeling the problem that moves the focus of the feature engineering from the review to the platform level. We empirically validate the URI paradigm on a dataset of products from six Amazon categories with 270 trained models: on average, classifiers gain +4% in F1-score when considering the whole review platform context. 
In our experiments, we further emphasize some problems with the helpfulness prediction task: (1) the users’ writing style changes over time (i.e., concept drift), (2) past models do not generalize well across different review categories, and (3) past methods to generate the ground-truth produced unreliable helpfulness scores, affecting the model evaluation phase.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":" ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44921832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Conversational Recommendation Systems with Representation Fusion","authors":"Yingxu Wang, Xiaoru Chen, Jinyuan Fang, Zaiqiao Meng, Shangsong Liang","doi":"10.1145/3577034","DOIUrl":"https://doi.org/10.1145/3577034","url":null,"abstract":"Conversational Recommendation Systems (CRSs) aim to improve recommendation performance by utilizing information from a conversation session. A CRS first constructs questions and then asks users for their feedback in each conversation session to refine better recommendation lists to users. The key design of CRS is to construct proper questions and obtain users’ feedback in response to these questions so as to effectively capture user preferences. Many CRS works have been proposed; however, they suffer from defects when constructing questions for users to answer: (1) employing a dialogue policy agent for constructing questions is one of the most common choices in CRS, but it needs to be trained with a huge corpus, and (2) it is not appropriate to construct questions from a single policy (e.g., a CRS only selects attributes that the user has interacted with) for all users with different preferences. To address these defects, we propose a novel CRS model, namely a Representation Fusion–based Conversational Recommendation model, where the whole conversation session is divided into two subsessions (i.e., Local Question Search subsession and Global Question Search subsession) and two different question search methods are proposed to construct questions in the corresponding subsessions without employing policy agents. 
In particular, in the Local Question Search subsession we adopt a novel graph mining method to find questions, where the paths in the graph between users and attributes can eliminate irrelevant attributes; in the Global Question Search subsession we propose to initialize user preference on items with the user and all item historical rating records and construct questions based on user’s preference. Then, we update the embeddings independently over the two subsessions according to user’s feedback and fuse the final embeddings from the two subsessions for the recommendation. Experiments on three real-world recommendation datasets demonstrate that our proposed method outperforms five state-of-the-art baselines.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"43 31","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138495127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mei Yu, Kun Zhu, Mankun Zhao, Jian Yu, Tianyi Xu, Di Jin, Xuewei Li, Ruiguo Yu
{"title":"Learning Neighbor User Intention on User-Item Interaction Graphs for Better Sequential Recommendation","authors":"Mei Yu, Kun Zhu, Mankun Zhao, Jian Yu, Tianyi Xu, Di Jin, Xuewei Li, Ruiguo Yu","doi":"10.1145/3580520","DOIUrl":"https://doi.org/10.1145/3580520","url":null,"abstract":"The task of Sequential Recommendation aims to predict the user’s preference by analyzing the user’s historical behaviours. Existing methods model item transitions through leveraging sequential patterns. However, they mainly consider the target user’s own behaviours and dynamic characteristics, while often ignoring the high-order collaborative connections when modelling user preferences. Some recent works try to use graph-based methods to introduce high-order collaborative signals for Sequential Recommendation, but they have two main problems. One is that the sequential patterns cannot be effectively mined, and the other is that their way of introducing high-order collaborative signals is not very suitable for Sequential Recommendation. To address these problems, we propose to fully exploit sequence features and model high-order collaborative signals for Sequential Recommendation. We propose a Neighbor user Intention based Sequential Recommender, namely NISRec, which utilizes the intentions of high-order connected neighbor users as high-order collaborative signals, in order to improve recommendation performance for the target user. To be specific, NISRec contains two main modules: the neighbor user intention embedding module (NIE) and the fusion module. The NIE describes both the long-term and the short-term intentions of neighbor users and aggregates them separately. The fusion module uses these two types of aggregated intentions to model high-order collaborative signals in both the embedding process and the user preference modelling phase for recommendation of the target user. 
Experimental results show that our new approach outperforms the state-of-the-art methods on both sparse and dense datasets. Extensive studies further show the effectiveness of the diverse neighbor intentions introduced by NISRec.","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"45 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138516923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}