Proceedings of the Tenth ACM International Conference on Web Search and Data Mining最新文献_第9页

Modeling Navigation in Information Networks 信息网络中的导航建模

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3022754

D. Dimitrov

{"title":"Modeling Navigation in Information Networks","authors":"D. Dimitrov","doi":"10.1145/3018661.3022754","DOIUrl":"https://doi.org/10.1145/3018661.3022754","url":null,"abstract":"Navigation in an information space is a natural way to explore and discover its content. Information systems on the Web like digital encyclopedias (e.g., Wikipedia) are interested in providing good navigational support to their users. To that end, navigation models can be useful for estimating the general navigability of an information space and for understanding how users interact with it. Such models can also be applied to identify problems faced by the users during navigation and to improve user interfaces. Studying navigation on the Web is a challenging task that has a long tradition in our scientific community. Based on large studies, researchers have made significant steps towards understanding navigational user behavior on the Web identifying general usage patterns, regularities, and strategies users apply during navigation. The seminal information foraging theory has been developed suggesting that people follow links by constantly estimating their quality in terms of information value and cost associated with obtaining that value by interacting with the environment. Furthermore, models describing the network structure of the Web like the bow tie model, and the small world models have been introduced. These models contributed valuable insights towards characterizing the underlying network topology on which the users operate and the extent to which it allows efficient navigation. In the context of information networks, researchers have successfully modeled user navigation resorting to Markov chains and to decentralized search. With respect to the users' navigational behavior and their click activities to traverse a link, researchers have found a valuable source of information in the log files of Web servers. Click data has also been collected by letting humans play navigational games on Wikipedia. With this data, researchers tested different navigational hypotheses; for example, (i) if humans tend to navigate between semantically similar articles, (ii) if they experience a trade-off between following links leading towards semantically similar articles and following links leading towards possibly well-connected articles. For navigation with a particular target in mind, users are found to be greedy with respect to the next click if they are confident to be on the right path, whereas they tend to explore the information network at random if they feel insecure or lost and have no intuition about the next click. Although these research lines have advanced our understanding of navigational user behavior in information networks, for the goal of the proposed thesis-modeling navigation-related work does not address and cover the following questions: (i) What is the relationship between the user's awareness regarding the structure and the topology of the information network and the efficiency of navigation, i.e., modeled as decentralized search and (ii) How do users interact with the content to explore and discover it, i.e., are there some specific l","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132472706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Modeling Event Importance for Ranking Daily News Events 为每日新闻事件排序建模事件重要性

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018728

Vinay Setty, Abhijith Anand, Arunav Mishra, Avishek Anand

{"title":"Modeling Event Importance for Ranking Daily News Events","authors":"Vinay Setty, Abhijith Anand, Arunav Mishra, Avishek Anand","doi":"10.1145/3018661.3018728","DOIUrl":"https://doi.org/10.1145/3018661.3018728","url":null,"abstract":"We deal with the problem of ranking news events on a daily basis for large news corpora, an essential building block for news aggregation. News ranking has been addressed in the literature before but with individual news articles as the unit of ranking. However, estimating event importance accurately requires models to quantify current day event importance as well as its significance in the historical context. Consequently, in this paper we show that a cluster of news articles representing an event is a better unit of ranking as it provides an improved estimation of popularity, source diversity and authority cues. In addition, events facilitate quantifying their historical significance by linking them with long-running topics and recent chain of events. Our main contribution in this paper is to provide effective models for improved news event ranking. To this end, we propose novel event mining and feature generation approaches for improving estimates of event importance. Finally, we conduct extensive evaluation of our approaches on two large real-world news corpora each of which span for more than a year with a large volume of up to tens of thousands of daily news articles. Our evaluations are large-scale and based on a clean human curated ground-truth from Wikipedia Current Events Portal. Experimental comparison with a state-of-the-art news ranking technique based on language models demonstrates the effectiveness of our approach.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129774774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 25

Beyond the Words: Predicting User Personality from Heterogeneous Information 言语之外:从异质信息中预测用户个性

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018717

Honghao Wei, Fuzheng Zhang, Nicholas Jing Yuan, Chuan Cao, Hao Fu, Xing Xie, Y. Rui, Wei-Ying Ma

{"title":"Beyond the Words: Predicting User Personality from Heterogeneous Information","authors":"Honghao Wei, Fuzheng Zhang, Nicholas Jing Yuan, Chuan Cao, Hao Fu, Xing Xie, Y. Rui, Wei-Ying Ma","doi":"10.1145/3018661.3018717","DOIUrl":"https://doi.org/10.1145/3018661.3018717","url":null,"abstract":"An incisive understanding of user personality is not only essential to many scientific disciplines, but also has a profound business impact on practical applications such as digital marketing, personalized recommendation, mental diagnosis, and human resources management. Previous studies have demonstrated that language usage in social media is effective in personality prediction. However, except for single language features, a less researched direction is how to leverage the heterogeneous information on social media to have a better understanding of user personality. In this paper, we propose a Heterogeneous Information Ensemble framework, called HIE, to predict users' personality traits by integrating heterogeneous information including self-language usage, avatar, emoticon, and responsive patterns. In our framework, to improve the performance of personality prediction, we have designed different strategies extracting semantic representations to fully leverage heterogeneous information on social media. We evaluate our methods with extensive experiments based on a real-world data covering both personality survey results and social media usage from thousands of volunteers. The results reveal that our approaches significantly outperform several widely adopted state-of-the-art baseline methods. To figure out the utility of HIE in a real-world interactive setting, we also present DiPsy, a personalized chatbot to predict user personality through heterogeneous information in digital traces and conversation logs.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127854609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 81

D-Cube: Dense-Block Detection in Terabyte-Scale Tensors D-Cube: tb尺度张量中的密集块检测

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018676

Kijung Shin, Bryan Hooi, Jisu Kim, C. Faloutsos

{"title":"D-Cube: Dense-Block Detection in Terabyte-Scale Tensors","authors":"Kijung Shin, Bryan Hooi, Jisu Kim, C. Faloutsos","doi":"10.1145/3018661.3018676","DOIUrl":"https://doi.org/10.1145/3018661.3018676","url":null,"abstract":"How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense blocks in real-world tensors (e.g., social media, Wikipedia, TCP dumps, etc.) signal anomalous or fraudulent behavior such as retweet boosting, bot activities, and network attacks. Thus, various approaches, including tensor decomposition and search, have been used for rapid and accurate dense-block detection in tensors. However, all such methods have low accuracy, or assume that tensors are small enough to fit in main memory, which is not true in many real-world applications such as social media and web. To overcome these limitations, we propose D-Cube, a disk-based dense-block detection method, which also can be run in a distributed manner across multiple machines. Compared with state-of-the-art methods, D-Cube is (1) Memory Efficient: requires up to 1,600 times less memory and handles 1,000 times larger data (2.6TB), (2) Fast: up to 5 times faster due to its near-linear scalability with all aspects of data, (3) Provably Accurate: gives a guarantee on the densities of the blocks it finds, and (4) Effective: successfully spotted network attacks from TCP dumps and synchronized behavior in rating data with the highest accuracy.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121568795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 61

iPhone's Digital Marketplace: Characterizing the Big Spenders iPhone的数字市场:大消费者的特征

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-01-25 DOI: 10.1145/3018661.3018697

F. Kooti, Mihajlo Grbovic, L. Aiello, Eric Bax, Kristina Lerman

{"title":"iPhone's Digital Marketplace: Characterizing the Big Spenders","authors":"F. Kooti, Mihajlo Grbovic, L. Aiello, Eric Bax, Kristina Lerman","doi":"10.1145/3018661.3018697","DOIUrl":"https://doi.org/10.1145/3018661.3018697","url":null,"abstract":"With mobile shopping surging in popularity, people are spending ever more money on digital purchases through their mobile devices and phones. However, few large-scale studies of mobile shopping exist. In this paper we analyze a large data set consisting of more than 776M digital purchases made on Apple mobile devices that include songs, apps, and in-app purchases. We find that 61% of all the spending is on in-app purchases and that the top 1% of users are responsible for 59% of all the spending. These big spenders are more likely to be male and older, and less likely to be from the US. We study how they adopt and abandon individual app, and find that, after an initial phase of increased daily spending, users gradually lose interest: the delay between their purchases increases and the spending decreases with a sharp drop toward the end. Finally, we model the in-app purchasing behavior in multiple steps: 1) we model the time between purchases; 2) we train a classifier to predict whether the user will make a purchase from a new app or continue purchasing from the existing app; and 3) based on the outcome of the previous step, we attempt to predict the exact app, new or existing, from which the next purchase will come. The results yield new insights into spending habits in the mobile digital marketplace.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129900134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

Joint Deep Modeling of Users and Items Using Reviews for Recommendation 使用评论进行推荐的用户和项目联合深度建模

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-01-17 DOI: 10.1145/3018661.3018665

Lei Zheng, V. Noroozi, Philip S. Yu

引用次数: 837

Deep Memory Networks for Attitude Identification 态度识别的深度记忆网络

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-01-16 DOI: 10.1145/3018661.3018714

Cheng Li, Xiaoxiao Guo, Q. Mei

{"title":"Deep Memory Networks for Attitude Identification","authors":"Cheng Li, Xiaoxiao Guo, Q. Mei","doi":"10.1145/3018661.3018714","DOIUrl":"https://doi.org/10.1145/3018661.3018714","url":null,"abstract":"We consider the task of identifying attitudes towards a given set of entities from text. Conventionally, this task is decomposed into two separate subtasks: target detection that identifies whether each entity is mentioned in the text, either explicitly or implicitly, and polarity classification that classifies the exact sentiment towards an identified entity (the target) into positive, negative, or neutral. Instead, we show that attitude identification can be solved with an end-to-end machine learning architecture, in which the two subtasks are interleaved by a deep memory network. In this way, signals produced in target detection provide clues for polarity classification, and reversely, the predicted polarity provides feedback to the identification of targets. Moreover, the treatments for the set of targets also influence each other -- the learned representations may share the same semantics for some targets but vary for others. The proposed deep memory network, the AttNet, outperforms methods that do not consider the interactions between the subtasks or those among the targets, including conventional machine learning methods and the state-of-the-art deep learning models.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127496487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 90

Real-Time Bidding by Reinforcement Learning in Display Advertising 基于强化学习的展示广告实时竞价

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-01-10 DOI: 10.1145/3018661.3018702

Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo

{"title":"Real-Time Bidding by Reinforcement Learning in Display Advertising","authors":"Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, Defeng Guo","doi":"10.1145/3018661.3018702","DOIUrl":"https://doi.org/10.1145/3018661.3018702","url":null,"abstract":"The majority of online display ads are served through real-time bidding (RTB) --- each ad display impression is auctioned off in real-time when it is just being generated from a user visit. To place an ad automatically and optimally, it is critical for advertisers to devise a learning algorithm to cleverly bid an ad impression in real-time. Most previous works consider the bid decision as a static optimization problem of either treating the value of each impression independently or setting a bid price to each segment of ad volume. However, the bidding for a given ad campaign would repeatedly happen during its life span before the budget runs out. As such, each bid is strategically correlated by the constrained budget and the overall effectiveness of the campaign (e.g., the rewards from generated clicks), which is only observed after the campaign has completed. Thus, it is of great interest to devise an optimal bidding strategy sequentially so that the campaign budget can be dynamically allocated across all the available impressions on the basis of both the immediate and future rewards. In this paper, we formulate the bid decision process as a reinforcement learning problem, where the state space is represented by the auction information and the campaign's real-time parameters, while an action is the bid price to set. By modeling the state transition via auction competition, we build a Markov Decision Process framework for learning the optimal bidding policy to optimize the advertising performance in the dynamic real-time bidding environment. Furthermore, the scalability problem from the large real-world auction volume and campaign budget is well handled by state value approximation using neural networks. The empirical study on two large-scale real-world datasets and the live A/B testing on a commercial platform have demonstrated the superior performance and high efficiency compared to state-of-the-art methods.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125230745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 206

Managing Risk of Bidding in Display Advertising 展示广告竞价风险管理

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-01-10 DOI: 10.1145/3018661.3018701

Haifeng Zhang, Weinan Zhang, Yifei Rong, Kan Ren, Wenxin Li, Jun Wang

{"title":"Managing Risk of Bidding in Display Advertising","authors":"Haifeng Zhang, Weinan Zhang, Yifei Rong, Kan Ren, Wenxin Li, Jun Wang","doi":"10.1145/3018661.3018701","DOIUrl":"https://doi.org/10.1145/3018661.3018701","url":null,"abstract":"In this paper, we deal with the uncertainty of bidding for display advertising. Similar to the financial market trading, real-time bidding (RTB) based display advertising employs an auction mechanism to automate the impression level media buying; and running a campaign is no different than an investment of acquiring new customers in return for obtaining additional converted sales. Thus, how to optimally bid on an ad impression to drive the profit and return-on-investment becomes essential. However, the large randomness of the user behaviors and the cost uncertainty caused by the auction competition may result in a significant risk from the campaign performance estimation. In this paper, we explicitly model the uncertainty of user click-through rate estimation and auction competition to capture the risk. We borrow an idea from finance and derive the value at risk for each ad display opportunity. Our formulation results in two risk-aware bidding strategies that penalize risky ad impressions and focus more on the ones with higher expected return and lower risk. The empirical study on real-world data demonstrates the effectiveness of our proposed risk-aware bidding strategies: yielding profit gains of 15.4% in offline experiments and up to 17.5% in an online A/B test on a commercial RTB platform over the widely applied bidding strategies.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131098705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 12

Raising Graphs From Randomness to Reveal Information Networks 从随机中提取图来揭示信息网络

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-01-02 DOI: 10.1145/3018661.3018664

Róbert Pálovics, A. Benczúr

{"title":"Raising Graphs From Randomness to Reveal Information Networks","authors":"Róbert Pálovics, A. Benczúr","doi":"10.1145/3018661.3018664","DOIUrl":"https://doi.org/10.1145/3018661.3018664","url":null,"abstract":"We analyze the fine-grained connections between the average degree and the power-law degree distribution exponent in growing information networks. Our starting observation is a power-law degree distribution with a decreasing exponent and increasing average degree as a function of the network size. Our experiments are based on three Twitter at-mention networks and three more from the Koblenz Network Collection. We observe that popular network models cannot explain decreasing power-law degree distribution exponent and increasing average degree at the same time. We propose a model that is the combination of exponential growth, and a power-law developing network, in which new \"homophily\" edges are continuously added to nodes proportional to their current homophily degree. Parameters of the average degree growth and the power-law degree distribution exponent functions depend on the ratio of the network growth exponent parameters. Specifically, we connect the growth of the average degree to the decreasing exponent of the power-law degree distribution. Prior to our work, only one of the two cases were handled. Existing models and even their combinations can only reproduce some of our key new observations in growing information networks.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"os-5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127849701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1