Vikrant Shimpi, Aakash Patel, Siddartha Kshirsagar, M. Natu, Monika Bhave
{"title":"Resolving the message riddle: A multi-pronged approach to infer trouble tickets","authors":"Vikrant Shimpi, Aakash Patel, Siddartha Kshirsagar, M. Natu, Monika Bhave","doi":"10.1109/DSAA53316.2021.9564152","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564152","url":null,"abstract":"Today's IT systems heavily rely on IT support for smoother and faster operations. Any application or infrastructure issue is reported by tools or end-users in the form of a trouble ticket. The information about the actual issue is hidden inside these ticket descriptions and is provided in different ways. In this paper, we address the problem of extracting the issues from these ticket descriptions. We break the problem into different sub-problems, each presenting different challenges and hence requiring different solutions. We present our experience of applying theory to practice by presenting a real-world case study.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126746951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saeede Anbaee Farimani, M. V. Jahan, A. M. Fard, Gholamreza Haffari
{"title":"Leveraging Latent Economic Concepts and Sentiments in the News for Market Prediction","authors":"Saeede Anbaee Farimani, M. V. Jahan, A. M. Fard, Gholamreza Haffari","doi":"10.1109/DSAA53316.2021.9564122","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564122","url":null,"abstract":"Most of the existing news-based market prediction techniques disregard conceptual and emotional relations in the news stream. In this work, we consider the conceptual relationship between news documents using contextualized latent concept modeling as well as leveraging news sentiment and technical indicators. We present our approach as an open-source RESTful API. We build a corpus of financial news related to currency pairs in the Foreign Exchange and Cryptocurrencies markets. Next, we apply BERT-based embedding to generate word vectors, cluster the vectors to create latent economic concepts, and propose a document representation based on the distribution of words on these concepts as well as news sentiment. We use a recurrent convolutional neural network to jointly use BERT-based text representation and technical indicators embedding for market time series prediction. We further augment our model with technical indicators using another recurrent layer. The experimental results show the superiority of our method compared to the baselines. Our MarketNews dataset, news crawler, and MarketPredict APIs are available for public use.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125258613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"motif2vec: Semantic-aware Representation Learning for Wearables' Time Series Data","authors":"Suwen Lin, Xian Wu, N. Chawla","doi":"10.1109/DSAA53316.2021.9564120","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564120","url":null,"abstract":"The proliferation of wearable sensors allows for the continuous collection of temporal characterization of an individual's physical activity and physiological data. This is enabling an unprecedented opportunity to delve into a deeper analysis of the underlying patterns of such temporal data and to infer attributes associated with health, behaviors, and well-being. However, there remain several challenges to fully discover both structural and temporal patterns (motifs) in these data streams and to leverage the semantic relationship among these motifs. These include: i) the temporal data of variable length and high resolution leads to the motifs of various sizes; ii) periodic occurrences and hierarchical overlaps of these motifs further challenge the modeling of their complex structural and semantic relations. We propose a semantic-aware unsupervised representation learning model, motif2vec, to learn the latent representation of time series data collected from wearable sensors. The motif2vec consists of three major components: 1) transforming the time series into a set of variable-length motif sequences; 2) formalizing random walks to construct the neighborhood of motifs and thus to extract structural and semantic relationship among motifs; 3) learning time series latent features to capture the motif neighborhood structure with a skip-gram model. Experiments on two real-world datasets, derived from two different wearables and population groups, show motif2vec outperforms six state-of-the-art benchmarks on various tasks.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123652579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Overview on Supervised Semi-structured Data Classification","authors":"Lijun Zhang, Ning Li, Zhanhuai Li","doi":"10.1109/DSAA53316.2021.9564205","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564205","url":null,"abstract":"Many collaboratively built resources, such as Wikipedia, Weibo and Quora, exist in the form of semi-structured data. Semi-structured data has been widely used in areas such as data integration, data distribution, data storage, data management, information retrieval and knowledge management. For large volumes of semi-structured data on the Web, semi-structured data classification techniques can group them into different categories by their structure and/or content information. Supervised semi-structured data classification plays an important role in many applications. This paper provides an overview of the literature in the area of supervised semi-structured data classification. A general framework for semi-structured data classification is presented, which is mainly composed of two steps: feature extraction and model building. Several different representation models of semi-structured data are discussed, mainly including the rooted labeled tree model, the feature vector space model and the feature set model. A large selection of semi-structured data classification approaches is reviewed in detail from two aspects: based on structure only and based on both structure and content. Finally, several future research directions for semi-structured data classification are presented.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129634621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huijun Wu, Xiaoyao Qian, Hulya Pamukcu Crowell, Tushar Singh, Aleks Shulman, Prashil Bhimani, Abhishek Maloo, Chunxu Tang, Yao Li, Lu Zhang, Chris Ulherr
{"title":"Migrate On-Premises Real-Time Data Analytics Jobs Into the Cloud","authors":"Huijun Wu, Xiaoyao Qian, Hulya Pamukcu Crowell, Tushar Singh, Aleks Shulman, Prashil Bhimani, Abhishek Maloo, Chunxu Tang, Yao Li, Lu Zhang, Chris Ulherr","doi":"10.1109/DSAA53316.2021.9564177","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564177","url":null,"abstract":"Twitter's data platform team is serving a large number of real-time analytics jobs, powering a wide range of data science use cases, from aggregations over time to spam detection. These analytics jobs constitute a crucial step in Twitter's data science infrastructure. As a key part of Twitter's “partly cloudy” strategy, real-time data analytics jobs are being migrated from on-premises into the cloud. We would like to share our migration approach and findings in this paper. The jobs to be migrated vary but follow common patterns, including the “read-modify-write store” and “lambda architecture” patterns. Both patterns can be migrated to the Beam data model in general ways. Besides job patterns, the job IOs are handled by replicating or proxying between on-premises and the cloud. Tests are applied in two phases through monitoring metrics and control tests. A case study demonstrates the business impact of migration. Finally, we discuss lessons learned.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130185791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Source detection on networks using spatial temporal graph convolutional networks","authors":"Hao Sha, M. Hasan, G. Mohler","doi":"10.1109/DSAA53316.2021.9564188","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564188","url":null,"abstract":"Detecting the source of an outbreak cluster during a pandemic like COVID-19 can provide insights into the transmission process, associated risk factors, and help contain the spread. In this work we study the problem of source detection from multiple snapshots of spreading on an arbitrary network structure. We use a spatial temporal graph convolutional network based model (SD-STGCN) to produce a source probability distribution, by fusing information from temporal and topological spaces. We perform extensive experiments using popular compartmental simulation models over synthetic networks and empirical contact networks. We also demonstrate the applicability of our approach with real COVID-19 case data.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"2150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130030117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CAUSALYSIS: Causal Machine Learning for Real-Estate Investment Decisions","authors":"Rodrigo Rivera-Castro, Evgeny Burnaev","doi":"10.1109/DSAA53316.2021.9564210","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564210","url":null,"abstract":"For a company, proper financial planning is challenging. The knowledge is specific and competent experts are scarce. Poor financial management has a high cost. It results in penalty fees, missed opportunities, and reduced return on investment. CAUSALYSIS empowers small and medium businesses with financial scenario planning powered by Causal Machine Learning. We describe a use case for causal machine learning on the ROI of property rentals.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131562176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anirban Mondal, Raghav Mittal, Vrinda Khandelwal, Parul Chaudhary, P. Reddy
{"title":"PEAR: A Product Expiry-Aware and Revenue-Conscious Itemset Placement Scheme","authors":"Anirban Mondal, Raghav Mittal, Vrinda Khandelwal, Parul Chaudhary, P. Reddy","doi":"10.1109/DSAA53316.2021.9564189","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564189","url":null,"abstract":"Placement of items on the shelf space of retail stores significantly impacts the revenue of the retailer. Since customers typically tend to buy sets of items (i.e., itemsets) together, several research efforts have been undertaken towards facilitating itemset placement in retail stores for improving retailer revenue. However, they fail to consider that the time-period of expiry can vary across items i.e., some items expire sooner than others. This leads to loss of opportunity towards improving retailer revenue. Hence, we propose PEAR, which is a Product Expiry-Aware and Revenue-conscious itemset placement scheme for improving retailer revenue. Our key contributions are three-fold. First, we introduce the problem of addressing retail itemset placement when the items can be associated with different time-periods of expiry. Second, we propose the expiry-aware PEAR scheme for efficiently identifying and placing high-revenue itemsets for improving retailer revenue. Third, we conduct a performance study with two real datasets to demonstrate that PEAR is indeed effective in improving retailer revenue w.r.t. a reference scheme.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134295906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rijad Sarić, Junchao Chen, M. Krstic, Edhem Čustović, G. Panic, Jasmin Kevric, D. Jokić
{"title":"Classification of Space Particle Events using Supervised Machine Learning Algorithms","authors":"Rijad Sarić, Junchao Chen, M. Krstic, Edhem Čustović, G. Panic, Jasmin Kevric, D. Jokić","doi":"10.1109/DSAA53316.2021.9564114","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564114","url":null,"abstract":"Solar Particle Events (SPEs) generate cosmic radiation of different magnitudes over a time span of several hours or even days. This contributes to an increased probability of higher-magnitude Single-Event Upset (SEU) occurrence in space applications. It is critical to establish early detection of changes in the SEU rate, or Soft Error Rate (SER), to enable timely radiation hardening measures. This research paper focuses on the high-accuracy detection of SPEs using manually collected space data. Additionally, the prediction of SER increase or decrease was established with seven widely used supervised machine learning algorithms. Excellent performance of 97.82%, including a high F1-score, was achieved during the presence of SPE using $k$-Nearest Neighbor algorithms.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"28 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113999673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Viet Yen Nguyen, Andréa B. Duque, Jules Belveze, Lukas D. Baker, Astrid Harsaae, Pedro Pessoa, Inês Oliveira
{"title":"Atlastic Reputation AI: Four Years of Advancing and Applying a SOTA NLP Classifier","authors":"Viet Yen Nguyen, Andréa B. Duque, Jules Belveze, Lukas D. Baker, Astrid Harsaae, Pedro Pessoa, Inês Oliveira","doi":"10.1109/DSAA53316.2021.9564190","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564190","url":null,"abstract":"We report on the main challenges we had to overcome while bootstrapping, commercializing and scaling our automated sentiment analysis solution in the media intelligence industry. Over the span of four years, our solution evolved from a run-of-the-mill component to a bespoke in-house bred flagship feature that, to our knowledge, trumps those of our industry peers. From all the learnings to date, the most significant one has been shifting our data labeling efforts from a third party to a completely in-house controlled process — a move that brought us noticeable improvements in inferencing quality.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"193 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121080501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}