{"title":"Regression Analysis towards Estimating Tax Evasion in Goods and Services Tax","authors":"Jithin Mathews, P. Mehta, Suryamukhi Kuchibhotla, Dikshant Bisht, Sobhan Babu Chintapalli, S. V. K. V. Rao","doi":"10.1109/WI.2018.00011","DOIUrl":"https://doi.org/10.1109/WI.2018.00011","url":null,"abstract":"Tax evasion is as old as tax itself. In this paper, we devise a technique to predict the amount of tax-revenue lost by the state due to unscrupulous actions from a particular set of suspicious dealers. For the same, we build a regression model using the tax-return information of genuine business dealers and predict the amount of tax evaded by suspicious business dealers. Dealers are classified as genuine or suspicious by applying Benford's analysis on the different group of dealers formed after running k-medoids clustering algorithm over a set of dealers. In addition to getting an estimate on the loss of tax-revenue, results obtained from this work aid the tax enforcement officers on taking precautionary measures against tax evasion. The dataset used in the work is provided by the commercial tax department of Telangana state, India.","PeriodicalId":405966,"journal":{"name":"2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121874797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extraction and Visualization of Occupational Health and Safety Related Information from Open Web","authors":"Tirthankar Dasgupta, Abir Naskar, Rupsa Saha, Lipika Dey","doi":"10.1109/WI.2018.00-56","DOIUrl":"https://doi.org/10.1109/WI.2018.00-56","url":null,"abstract":"In this paper, we have proposed natural language processing and deep learning based techniques for the automatic extraction and curation of occupational health and safety related information from safety-related articles. Such articles typically contain details of the organizations that have been cited for violating the health and safety regulations, safety-related issues and incidents, the location of the incident, and finally details of the penalties incurred. We have done experiments with a collection of 5400 related articles. The end-product of our work is an occupational risk-register that contains details of safety incidents across geographies and time. This register can be further utilized for analytical and reporting purposes. Such information is extremely valuable to industries which see a high occurrence of occupational injuries.","PeriodicalId":405966,"journal":{"name":"2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"380 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127586571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating Game Theory and Data Mining for Dynamic Distribution of Police to Combat Crime","authors":"C. Segovia, K. Smith‐Miles","doi":"10.1109/WI.2018.00016","DOIUrl":"https://doi.org/10.1109/WI.2018.00016","url":null,"abstract":"This paper proposes a framework that provides a strategy for police to allocate resources to tackle crime, by integrating data mining models for dynamic crime prediction with a game theoretical approach to recognize the adversarial nature of the problem. The proposed framework is applied to a real case study from Santiago (Chile), and compared to other strategies involving game theory or data mining alone. The hybrid approach is demonstrated to lead to improved payoffs for the police and reduced payoffs for the criminals. A robustness analysis explores how accuracy of the data mining models affects the outcomes of the game, showing that the proposed approach can absorb significant forecasting errors while still producing superior outcomes for the police.","PeriodicalId":405966,"journal":{"name":"2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133982726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Classification of Drunk Texting in Tweets Using Semantic Enrichment","authors":"Marcos A. Grzeça, K. Becker, R. Galante","doi":"10.1109/WI.2018.00-90","DOIUrl":"https://doi.org/10.1109/WI.2018.00-90","url":null,"abstract":"Excessive alcohol consumption is a worldwide problem, and social networks such as Twitter can provide valuable data that help understanding factors related to alcoholism, particularly among youngsters. The identification of drunk tweets (i.e. posted under the influence of alcohol) is complex because tweets are short, sparse and written with diverse and internet specific vocabulary, possibly with errors due to alcohol influence. In this paper, we propose an enriching framework that integrates conceptual and semantic features that expand and generalize the vocabulary, providing context to tweet terms. It also handles misspellings and the selection of discriminative features resulting from contextual enrichment. We outperformed the baseline, achieving improvements of 13.79 percentage points in recall, with no significant harm to precision. We illustrate the value of drunk tweets classification by developing an exploratory analysis that reveals drunk tweeters demographics and tweet properties.","PeriodicalId":405966,"journal":{"name":"2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134467002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[Publisher's information]","authors":"","doi":"10.1109/wi.2018.00134","DOIUrl":"https://doi.org/10.1109/wi.2018.00134","url":null,"abstract":"","PeriodicalId":405966,"journal":{"name":"2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130365443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Genre Classification to Aspect Extraction: New Annotation Schemas for Book Reviews","authors":"Tamara Álvarez-López, P. Bellot, Milagros Fernández Gavilanes, E. Costa-Montenegro","doi":"10.1109/WI.2018.00-57","DOIUrl":"https://doi.org/10.1109/WI.2018.00-57","url":null,"abstract":"In this paper, new schemas for feature categorization in different kinds of reviews, in the domain of books, are presented, so aspect extraction techniques could be later applied. We deal here with two types of reviews: formal reviews about scholarly books, written by experts, and informal ones about fiction books, written by readers which are not necessarily highly qualified. Our final goal is to extract the most relevant aspects or features to which any opinion is expressed in these reviews, along with the sentiment associated, for later integrating it to book recommender systems, improving the quality of the recommendations. Throughout this paper, the need for different annotation schemas is proved, by developing a new review classification system, as well as making an analysis at lexical and semantic levels on both kinds of reviews, for finally concluding with the presentation of the new categorization schemas.","PeriodicalId":405966,"journal":{"name":"2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"272 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115664393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Distributed Scalable Approach for Rule Processing: Computing in the Fog for the SWoT","authors":"N. Seydoux, K. Drira, Nathalie Hernandez, T. Monteil","doi":"10.1109/WI.2018.0-100","DOIUrl":"https://doi.org/10.1109/WI.2018.0-100","url":null,"abstract":"The development of the Semantic Web of Things (SWoT) is challenged by the nature of IoT deployment architectures, where constrained devices collect data processed remotely by powerful Cloud servers. Such a deployment pattern introduces bottlenecks constituting a hurdle for scalability, and increases response time. This hinders the development of a number of critical and time-sensitive applications. Enabling the deployment of the Semantic Web stack closer to the constrained devices of the IoT may foster the development of time-sensitive interoperable applications, while reducing forwarding the user data to remote third party Cloud servers. The approach we develop in this paper is a contribution towards this direction, and aims to enable rule-based reasoning closer to sensors producing IoT data. For this purpose, we define a distributed scalable semantic processing algorithm by dynamically propagating deduction rules on Fog nodes. Our goal is to shorten the time needed to deliver high level information deduced from the collected data. This approach is evaluated on a smart building use case where both distribution and scalability have been considered.","PeriodicalId":405966,"journal":{"name":"2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122802570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Applications of Stochastic Models in Network Embedding: A Survey","authors":"Minglong Lei, Yong Shi, Lingfeng Niu","doi":"10.1109/WI.2018.00-23","DOIUrl":"https://doi.org/10.1109/WI.2018.00-23","url":null,"abstract":"Network embedding is a promising topic that maps the vertices to the latent space while keeping the structural proximity in the original space. The network embedding task is difficult since the network vertices have no specific time or space orders. Models used to extract information from images and texts with regular space or time structures cannot be directly applied in network embedding. The key feature of network embedding methods should be further exploited. Previous network embedding reviews mainly focus on the models and algorithms used in different methods. In this survey, we review the network embedding works in the stochastic perspective either in data side or model side. Roughly, the network embedding methods fall into three main categories: matrix based methods, random walk based methods and aggregated based methods. We focus on the applications of stochastic models in solving the challenges of network embedding in data processing and modeling following the line of the three categories.","PeriodicalId":405966,"journal":{"name":"2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125557202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Dirichlet Multinomial Mixture Model for Short Text Streaming Clustering","authors":"Ruting Duan, Chunping Li","doi":"10.1109/WI.2018.0-108","DOIUrl":"https://doi.org/10.1109/WI.2018.0-108","url":null,"abstract":"In this paper, we propose an adaptive Dirichlet Multinomial Mixture model for short text clustering along the time slices. A hyperparameters adjusting algorithm is utilized to capture the temporal dynamics automatically, and a collapsed Gibbs sampling algorithm for the extended Dirichlet Multinomial Mixture (DMM) model (e-GSDMM algorithm), is proposed to infer the changes of topic and word distributions along the time slices. Our extensive experiments over three different datasets show that the proposed model is efficient and performs better than the existing GSDMM approach for short text clustering on the streaming data.","PeriodicalId":405966,"journal":{"name":"2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126404929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Python Library for Memory Augmented Neural Networks","authors":"P. Debie, Weiwei Wang, S. Bromuri","doi":"10.1109/WI.2018.00-47","DOIUrl":"https://doi.org/10.1109/WI.2018.00-47","url":null,"abstract":"A Memory Augmented Neural Network (MANN) is an extension to an RNN which enables it to save a large amount of data to a memory object which is dimensionally separated from the Neural Network. This paper introduces a new Python library based on TensorFlow to define MANNs as Python objects. In addition to the standard implementation of the MANN, this contribution proposes a modification to the head calculation which decreases the noise while searching through the memory. The paper presents two experiments concerning the proposed implementation. First, the MANN is trained to be able to store and reproduce a piece of data (a task with linear data connectivity), and second, the MANN is trained to find a Minimum Vertex Cover of a Graph (MVCG). This task was chosen because the connectivity of the vertices in the graph poses a challenge to the MANN. The tests show that the MANN has no problem learning the first task, and that it is able to find an optimal solution for the MVCG problem in most cases.","PeriodicalId":405966,"journal":{"name":"2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121085148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}