{"title":"FIN10K: A Web-based Information System for Financial Report Analysis and Visualization","authors":"Yu-Wen Liu, Liang‐Chih Liu, Chuan-Ju Wang, Ming-Feng Tsai","doi":"10.1145/2983323.2983328","DOIUrl":"https://doi.org/10.1145/2983323.2983328","url":null,"abstract":"In this demonstration, we present FIN10K, a web-based information system that facilitates the analysis of textual information in financial reports. The proposed system has three main components: (1) a 10-K Corpus, including an inverted index of financial reports on Form 10-K, several numerical finance measures, and pre-trained word embeddings; (2) an information retrieval system; and (3) two data visualizations of the analyzed results. The system can be of great help in revealing valuable insights within large amounts of textual information. The system is now online available at http://clip.csie.org/10K/.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117324909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Non-Parametric Topic Model for Short Texts Incorporating Word Coherence Knowledge","authors":"Yuhao Zhang, W. Mao, D. Zeng","doi":"10.1145/2983323.2983898","DOIUrl":"https://doi.org/10.1145/2983323.2983898","url":null,"abstract":"Mining topics in short texts (e.g. tweets, instant messages) can help people grasp essential information and understand key contents, and is widely used in many applications related to social media and text analysis. The sparsity and noise of short texts often restrict the performance of traditional topic models like LDA. Recently proposed Biterm Topic Model (BTM) which models word co-occurrence patterns directly, is revealed effective for topic detection in short texts. However, BTM has two main drawbacks. It needs to manually specify topic number, which is difficult to accurately determine when facing new corpora. Besides, BTM assumes that two words in same biterm should belong to the same topic, which is often too strong as it does not differentiate two types of words (i.e. general words and topical words). To tackle these problems, in this paper, we propose a non-parametric topic model npCTM with the above distinction. Our model incorporates the Chinese restaurant process (CRP) into the BTM model to determine topic number automatically. Our model also distinguishes general words from topical words by jointly considering the distribution of these two word types for each word as well as word coherence information as prior knowledge. We carry out experimental studies on real-world Twitter dataset. The results demonstrate the effectiveness of our method to discover coherent topics compared with the baseline methods.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128792205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Who are My Familiar Strangers?: Revealing Hidden Friend Relations and Common Interests from Smart Card Data","authors":"Fusang Zhang, Beihong Jin, Tingjian Ge, Qiang Ji, Yanling Cui","doi":"10.1145/2983323.2983804","DOIUrl":"https://doi.org/10.1145/2983323.2983804","url":null,"abstract":"The newly emerging location-based social networks (LBSN) such as Tinder and Momo extend social interaction from friends to strangers, providing novel experiences of making new friends. Familiar strangers refer to the strangers who meet frequently in daily life and may share common interests; thus they may be good candidates for friend recommendation. In this paper, we study the problem of discovering familiar strangers, specifically, public transportation trip companions, and their common interests. We collect 5.7 million transaction records of smart cards from about 3.02 million people in the city of Beijing, China. We first analyze this dataset and reveal the temporal and spatial characteristics of passenger encounter behaviors. Then we propose a stability metric to measure hidden friend relations. This metric facilitates us to employ community detection techniques to capture the communities of trip companions. Further, we infer common interests of each community using a topic model, i.e., LDA4HFC (Latent Dirichlet Allocation for Hidden Friend Communities) model. Such topics for communities help to understand how hidden friend clusters are formed. We evaluate our method using large-scale and real-world datasets, consisting of two-week smart card records and 901,855 Points of Interest (POIs) in Beijing. The results show that our method outperforms three baseline methods with higher recommendation accuracy. Moreover, our case study demonstrates that the discovered topics interpret the communities very well.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126915598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Orthogonal Non-negative Matrix Factorization over Stiefel Manifold","authors":"W. Zhang, Mingkui Tan, Quan Z. Sheng, Lina Yao, Javen Qinfeng Shi","doi":"10.1145/2983323.2983761","DOIUrl":"https://doi.org/10.1145/2983323.2983761","url":null,"abstract":"Orthogonal Non-negative Matrix Factorization (ONMF) approximates a data matrix X by the product of two lower dimensional factor matrices: X ≈ UV^T, with one of them orthogonal. ONMF has been widely applied for clustering, but it often suffers from high computational cost due to the orthogonality constraint. In this paper, we propose a method, called Nonlinear Riemannian Conjugate Gradient ONMF (NRCG-ONMF), which updates U and V alternately and preserves the orthogonality of U while achieving fast convergence speed. Specifically, in order to update U, we develop a Nonlinear Riemannian Conjugate Gradient (NRCG) method on the Stiefel manifold using Barzilai-Borwein (BB) step size. For updating V, we use a closed-form solution under non-negativity constraint. Extensive experiments on both synthetic and real-world data sets show consistent superiority of our method over other approaches in terms of orthogonality preservation, convergence speed and clustering performance.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128990204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Duer: Intelligent Personal Assistant","authors":"Haifeng Wang","doi":"10.1145/2983323.2983372","DOIUrl":"https://doi.org/10.1145/2983323.2983372","url":null,"abstract":"Intelligent personal assistant is widely recognized as a more natural and efficient way of human-computer interaction, which has attracted extensive interests from both academia and industry. In this talk, I describe Duer, Baidu's intelligent personal assistant. In particular, I would like to focus on the following three features. Firstly, Duer comprehensively understands people's requirements via multiple channels, including not only explicit utterances, but also user models and rich contexts. Duer's user models are learnt from users' interaction history, and the rich contexts consist of temporal and geographical information, as well as the foregoing dialogues. Secondly, Duer meets diverse requirements with a range of instruments, such as chatting, information provision, reminder service, etc. These instruments are implemented based on mining the big data of web pages, applications, and user logs, which are then seamlessly integrated in the dialogue flow. Thirdly, Duer features multi-modal interaction, which allows people to interact with it by means of texts, speech, and images. We believe the above features will enable Duer to become a better and distinguished intelligent assistant for each of you.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130657075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OrientStream: A Framework for Dynamic Resource Allocation in Distributed Data Stream Management Systems","authors":"Chunkai Wang, Xiaofeng Meng, Qi Guo, Zujian Weng, Chen Yang","doi":"10.1145/2983323.2983681","DOIUrl":"https://doi.org/10.1145/2983323.2983681","url":null,"abstract":"Distributed data stream management systems (DDSMS) are usually composed of upper layer relational query systems (RQS) and lower layer stream processing systems (SPS). When users submit new queries to RQS, a query planner needs to be converted into a directed acyclic graph (DAG) consisting of tasks which are running on SPS. Based on different query requests and data stream properties, SPS need to configure different deployments strategies. However, how to dynamically predict deployment configurations of SPS to ensure the processing throughput and low resource usage is a great challenge. This article presents OrientStream, a framework for dynamic resource allocation in DDSMS using incremental machine learning techniques. By introducing the data-level, query plan-level, operator-level and cluster-level's four-level feature extraction mechanism, we firstly use the different query workloads as training sets to predict the resource usage of DDSMS and then select the optimal resource configuration from candidate settings based on the current query requests and stream properties. Finally, we validate our approach on the open source SPS--Storm. Experiments show that OrientStream can reduce CPU usage by 8%-15% and memory usage by 38%-48%, respectively.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116177356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PIN-TRUST: Fast Trust Propagation Exploiting Positive, Implicit, and Negative Information","authors":"Min-Hee Jang, C. Faloutsos, Sang-Wook Kim, U. Kang, Jiwoon Ha","doi":"10.1145/2983323.2983753","DOIUrl":"https://doi.org/10.1145/2983323.2983753","url":null,"abstract":"Given \"who-trusts/distrusts-whom\" information, how can we propagate the trust and distrust? With the appearance of fraudsters in social network sites, the importance of trust prediction has increased. Most such methods use only explicit and implicit trust information (e.g., if Smith likes several of Johnson's reviews, then Smith implicitly trusts Johnson), but they do not consider distrust. In this paper, we propose PIN-TRUST, a novel method to handle all three types of interaction information: explicit trust, implicit trust, and explicit distrust. The novelties of our method are the following: (a) it is carefully designed, to take into account positive, implicit, and negative information, (b) it is scalable (i.e., linear on the input size), (c) most importantly, it is effective and accurate. Our extensive experiments with a real dataset, Epinions.com data, of 100K nodes and 1M edges, confirm that PIN-TRUST is scalable and outperforms existing methods in terms of prediction accuracy, achieving up to 50.4 percentage relative improvement.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116324273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering Temporal Purchase Patterns with Different Responses to Promotions","authors":"Ling Luo, Bin Li, I. Koprinska, S. Berkovsky, Fang Chen","doi":"10.1145/2983323.2983665","DOIUrl":"https://doi.org/10.1145/2983323.2983665","url":null,"abstract":"The supermarkets often use sales promotions to attract customers and create brand loyalty. They would often like to know if their promotions are effective for various customers, so that better timing and more suitable rate can be planned in the future. Given a transaction data set collected by an Australian national supermarket chain, in this paper we conduct a case study aimed at discovering customers' long-term purchase patterns, which may be induced by preference changes, as well as short-term purchase patterns, which may be induced by promotions. Since purchase events of individual customers may be too sparse to model, we propose to discover a number of latent purchase patterns from the data. The latent purchase patterns are modeled via a mixture of non-homogeneous Poisson processes where each Poisson intensity function is composed by long-term and short-term components. Through the case study, 1) we validate that our model can accurately estimate the occurrences of purchase events; 2) we discover easy-to-interpret long-term gradual changes and short-term periodic changes in different customer groups; 3) we identify the customers who are receptive to promotions through the correlation between behavior patterns and the promotions, which is particularly worthwhile for target marketing.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121493448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Customer Engagement from Partial Observations","authors":"Jelena Stojanovic, Djordje Gligorijevic, Z. Obradovic","doi":"10.1145/2983323.2983854","DOIUrl":"https://doi.org/10.1145/2983323.2983854","url":null,"abstract":"It is of high interest for a company to identify customers expected to bring the largest profit in the upcoming period. Knowing as much as possible about each customer is crucial for such predictions. However, their demographic data, preferences, and other information that might be useful for building loyalty programs is often missing. Additionally, modeling relations among different customers as a network can be beneficial for predictions at an individual level, as similar customers tend to have similar purchasing patterns. We address this problem by proposing a robust framework for structured regression on deficient data in evolving networks with a supervised representation learning based on neural features embedding. The new method is compared to several unstructured and structured alternatives for predicting customer behavior (e.g. purchasing frequency and customer ticket) on user networks generated from customer databases of two companies from different industries. The obtained results show 4% to 130% improvement in accuracy over alternatives when all customer information is known. Additionally, the robustness of our method is demonstrated when up to 80% of demographic information was missing where it was up to several folds more accurate as compared to alternatives that are either ignoring cases with missing values or learn their feature representation in an unsupervised manner.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114852324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information Diffusion at Workplace","authors":"Jiawei Zhang, Philip S. Yu, Yuanhua Lv, Qianyi Zhan","doi":"10.1145/2983323.2983848","DOIUrl":"https://doi.org/10.1145/2983323.2983848","url":null,"abstract":"People nowadays need to spend a large amount of time on their work everyday and workplace has become an important social occasion for effective communication and information exchange among employees. Besides traditional offline contacts (e.g., face-to-face meetings and telephone calls), to facilitate the communication and cooperation among employees, a new type of online social networks has been launched inside the firewalls of many companies, which are named as the \"enterprise social networks\" (ESNs). In this paper, we want to study the information diffusion among employees at workplace via both online ESNs and offline contacts. This is formally defined as the IDE (Information Diffusion in Enterprise) problem. Several challenges need to be addressed in solving the IDE problem: (1) diffusion channel extraction from online ESN and offline contacts; (2) effective aggregation of the information delivered via different diffusion channels; and (3) communication channel weighting and selection. A novel information diffusion model, Muse (Multi-source Multi-channel Multi-topic diffUsion SElection), is introduced in this paper to resolve these challenges. Extensive experiments conducted on real-world ESN and organizational chart dataset demonstrate the outstanding performance of Muse in addressing the IDE problem.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124131569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}