{"title":"Towards Interpretable Deep Extreme Multi-Label Learning","authors":"Yihuang Kang, I-Ling Cheng, W. Mao, Bowen Kuo, Pei-Ju Lee","doi":"10.1109/IRI.2019.00024","DOIUrl":"https://doi.org/10.1109/IRI.2019.00024","url":null,"abstract":"Many Machine Learning algorithms, such as deep neural networks, have long been criticized for being \"black-boxes\"-a kind of models unable to provide how it arrive at a decision without further efforts to interpret. This problem has raised concerns on model applications' trust, safety, nondiscrimination, and other ethical issues. In this paper, we discuss the machine learning interpretability of a real-world application, eXtreme Multi-label Learning (XML), which involves learning models from annotated data with many pre-defined labels. We propose a two-step XML approach that combines deep non-negative autoencoder with other multi-label classifiers to tackle different data applications with a large number of labels. Our experimental result shows that the proposed approach is able to cope with many-label problems as well as to provide interpretable label hierarchies and dependencies that helps us understand how the model recognizes the existences of objects in an image.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121262622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IRI 2019 Panel I","authors":"","doi":"10.1109/iri.2019.00013","DOIUrl":"https://doi.org/10.1109/iri.2019.00013","url":null,"abstract":"The goal for this panel is to propose a schema for the advancement of intelligent systems through the use of symbolic and/or neural AI and data science. Specifically, discussants will explore how conventional numerical analysis and other techniques can leverage symbolic and/or neural AI to yield more capable intelligent systems. This approach could yield significant improvements in such domains as Meteorological and Oceanographic (METOC) signal processing, logistics, scheduling, pattern recognition, optimization, ergonomics, explanation, causal inference and prediction, system diagnostics, education and training, and a plethora of additional applications. Self-reference is inherent to autonomous thought; and, this appears to be indistinguishable from consciousness from a computability perspective. Thus, the question arises, can we program more efficient ways to support the programming (problem-solving) process? The panel will explore these and other advanced topics related to information reuse and integration and of fundamental importance to data science. Causal inference and prediction are of particular interest to the discussants and for all who are working with AI/ML. In fact, LeCun, of deep learning fame, has stated that prediction is the central problem defining all of AI. Getting this right could have a tremendous impact in a lot of important operational areas: Weather. In weather prediction (METOC), a patented software solution replaced the use of partial differential equations (PDEs) with geographically-dispersed sensor registries for atmospheric modeling. These sensors feed their data to local and centralized computers that learn to predict weather based on mapped previous experiences. AI is needed to map or generalize current data to recorded cases and make viable micro-climatic predictions, which surpass those of PDEs and their associated error marches when solved numerically using triangular elements (Gallerkin methods). Radar and Sonar. Signal processing is used in radar and sonar to actively identify the transmitter or alternatively make a passive identification of friend or foe (IFF). Here, waveforms can be fitted – not by Newton backward/forward differencing and/or Fourier Series, but rather through the synthesis of Type II fuzzy functions – invented by the late Lotfi Zadeh, the father of fuzzy logic and a regular plenary presenter up until the time of his passing. This expands the effectiveness of radar and sonar applications by reducing the number of rules (including mathematical theorems), that would otherwise be needed. Logistics. Most logistic problems require the representation and design of heuristics to solve otherwise intractable problems (e.g., the TSP). The Navy has many such problems involving time-critical shipments to multiple locations in minimal time and at minimal cost. Air Operations. Similarly, aircraft carriers need better algorithms to schedule their takeoff and landing operations in rolling seas, in inclement","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127532903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Billion-Scale Matrix Compression and Multiplication with Implications in Data Mining","authors":"M. Nelson, S. Radhakrishnan, C. Sekharan","doi":"10.1109/IRI.2019.00067","DOIUrl":"https://doi.org/10.1109/IRI.2019.00067","url":null,"abstract":"Billion-scale Boolean matrices in the era of big data occupy storage that is measured in 100's of petabytes to zetabytes. The fundamental operation on these matrices for data mining involves multiplication which suffers a significant slow-down as the required data cannot fit in most main memories. In this paper, we propose new algorithms to perform Matrix-Vector and Matrix-Matrix operations directly on compressed Boolean matrices using innovative techniques extended from our previous work on compression. Our extension involves the development of a row-by-row differential compression technique which reduces the overall space requirement and the number of matrix operations. We have provided extensive empirical results on billion-scale Boolean matrices that are Boolean adjacency matrices of web graphs. Our work has significant implications on key problems such as page-ranking and itemset mining that use matrix multiplication.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"IA-20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126561263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
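The abstract above does not describe the authors' row-by-row differential compression in enough detail to reproduce it. As a hypothetical stand-in only, the sketch below stores each Boolean row as the gaps between successive column indices of its 1-entries and computes the Boolean matrix-vector product y_i = OR_j (A_ij AND x_j) directly on that encoding, decoding one row at a time so the full matrix is never expanded in memory. The names `compress_row` and `boolean_matvec` are illustrative, not taken from the paper.

```python
from typing import List


def compress_row(bits: List[int]) -> List[int]:
    """Encode a 0/1 row as gaps between successive 1-positions (hypothetical scheme)."""
    gaps: List[int] = []
    prev = -1
    for j, b in enumerate(bits):
        if b:
            gaps.append(j - prev)  # distance from the previous 1-entry
            prev = j
    return gaps


def boolean_matvec(rows: List[List[int]], x: List[int]) -> List[int]:
    """Boolean product y[i] = OR_j (A[i][j] AND x[j]) over gap-encoded rows."""
    y: List[int] = []
    for gaps in rows:
        col, hit = -1, 0
        for g in gaps:
            col += g  # decode the next column index holding a 1
            if x[col]:
                hit = 1
                break  # one true term already makes the OR true
        y.append(hit)
    return y


A = [[1, 0, 0, 1],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
x = [0, 0, 0, 1]
compressed = [compress_row(r) for r in A]
print(compressed)                     # [[1, 3], [2], []]
print(boolean_matvec(compressed, x))  # [1, 0, 0]
```

The early `break` is the point of working on the Boolean semiring: a row can stop decoding at its first matching column, which a numeric sparse product cannot do.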
{"title":"Software Quality Prediction: An Investigation Based on Machine Learning","authors":"S. Reddivari, Jayalakshmi Raman","doi":"10.1109/IRI.2019.00030","DOIUrl":"https://doi.org/10.1109/IRI.2019.00030","url":null,"abstract":"Irrespective of the type of software system that is being developed, producing and delivering high-quality software within the specified time and budget is crucial for many software businesses. The software process model has a major impact on the quality of the overall system - the longer a defect remains in the system undetected, the harder it becomes to fix. However, predicting the quality of the software in the early phases would immensely assist developers in software maintenance and quality assurance activities, and to allocate effort and resources more efficiently. This paper presents an evaluation of eight machine learning techniques in the context of reliability and maintainability. Reliability is investigated as the number of defects in a system and the maintainability is analyzed as the number of changes made in the system. Software metrics are direct reflections of various characteristics of software and are used in our study as the major attributes for training the models for both defect and maintainability prediction. Among the eight different techniques we experimented with, Random Forest provided the best results with an AUC of over 0.8 during both defect and maintenance prediction.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"249 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126571896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
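The software-quality abstract reports its results as AUC. As a reminder of what that number measures, ROC AUC equals the probability that a randomly chosen positive example (for instance, a defective module) receives a higher score than a randomly chosen negative one, with ties counted as one half. The quadratic-time sketch below computes it straight from that definition; the labels and scores are toy values, not data from the study, and `roc_auc` is an illustrative name.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) definition; labels are 0/1."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.4]
print(roc_auc(labels, scores))  # 7.5 / 9, about 0.833
```

A value "over 0.8" therefore means that, in roughly four of five positive/negative pairs, the model scored the positive case higher.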
{"title":"Message from the IRI 2019 Program Co-Chairs","authors":"Huan Liu, Aidong Zhang, William Wulf","doi":"10.1109/iri.2019.00006","DOIUrl":"https://doi.org/10.1109/iri.2019.00006","url":null,"abstract":"Welcome to the proceedings of the 20th IEEE International Conference on Information Reuse and Integration for Data Science (IRI 2019) in Los Angeles, California, USA. Information reuse and integration (IRI) aims at maximizing the reuse of information by creating simple, rich, and reusable knowledge representations and consequently explores strategies for integrating this knowledge into legacy systems. IRI plays a pivotal role in the capture, representation, maintenance, integration, validation, and extrapolation of information; and applies both information and knowledge for enhancing decision making in various application domains. Over two decades of conferences, IRI has established itself as an internationally renowned forum for researchers and practitioners to exchange ideas, connect with colleagues, and advance the state of the art and practice of current and future research in information reuse and integration. More specifically, this year IRI 2019 conference focuses on data science.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130017712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Visualization-Driven Approach to Database Benchmarking Analysis","authors":"Dippy Aggarwal, Shreya Shekhar","doi":"10.1109/IRI.2019.00045","DOIUrl":"https://doi.org/10.1109/IRI.2019.00045","url":null,"abstract":"Employing TPC-defined benchmarks and their derivatives is an established approach adopted by organizations to evaluate and demonstrate performance of their database management systems with the goal of increasing sales and establishing competitiveness of their products. One common challenge in the benchmarking process is the data analysis that involves large, performance datasets for characterizing a database system over underlying system configuration. In this paper, we address two different scenarios that demand detailed data analysis and are commonly found in database benchmarking process - analyzing query execution behavior when multiple streams of queries are run concurrently (typically referred as throughput phase in TPC benchmarks), and visualizing query performance with respect to different resources - cores, memory, storage. We highlight the challenges that exist in the raw data analysis space for each of these use-cases and then demonstrate how the data visualizations we have developed using Python enable insights in an easy-to-use, intuitive manner. Given that the two scenarios we cover are common across multiple benchmarks such as TPC-H, TPC-DS, TPCxBB, and their derivatives, our proposed visualizations can be adapted and used as a resource by the database benchmarking community.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132976429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Stock Index Prediction Framework: Integrating Technical and Topological Mesoscale Indicators","authors":"Zi Qi, Zhan Bu, Xi Xiong, Hongliang Sun, Jie Cao, Chengcui Zhang","doi":"10.1109/IRI.2019.00018","DOIUrl":"https://doi.org/10.1109/IRI.2019.00018","url":null,"abstract":"With its growing importance in predicting future stock trends, nearly everyone watches the Chinese financial market. Traditional approaches typically employ a variety of statistical techniques or machine learning methods for stock index predicting, and often rely on analysis of technical indicators. In the existing literature, researchers rarely attempt to predict the stock index by using the topological features of temporal stock correlation networks. Keeping this in mind, we first calculate the correlation coefficient of any two stocks using the classic Visibility Graph Model (VGM). Then, by using the Planar Maximally Filtered Graph (PMFG) method, we generate temporal stock correlation networks from historical stock quantitative data. Next, we choose fourteen frequently adopted Technical Indicators (TIs) and five Topological Mesoscale Indicators (TMIs, extracted from the temporal stock correlation networks) as predictive variables of six machine learning classifiers. To improve forecast accuracy and to address potential overfitting problems, we modify the classic Sequential Backward Selection (SBS) algorithm to learn the most significant predictive variables for each classifier. We then conduct a series of comprehensive experiments on three Chinese stock indices to validate our prediction framework's performance. Experimental results show that using a combination of TIs and TMIs significantly improves forecast accuracy over conventional methods that use either TIs or TMIs exclusively.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115786883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
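The stock-index abstract builds on the visibility graph model. In the natural visibility graph (Lacasa et al., 2008), observations at times a < b are linked if every intermediate point c lies strictly below the straight line joining (a, y_a) and (b, y_b), i.e. y_c < y_b + (y_a - y_b)(b - c)/(b - a). The sketch below builds that graph for a single price series. How the paper turns the visibility graphs of two stocks into a correlation coefficient is not stated in the abstract and is not shown; the function name and the toy prices are illustrative.

```python
from typing import List, Set, Tuple


def natural_visibility_graph(series: List[float]) -> Set[Tuple[int, int]]:
    """Edges (a, b), a < b, of the natural visibility graph of a time series."""
    n = len(series)
    edges: Set[Tuple[int, int]] = set()
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # every point strictly between a and b must lie below the sight line
            visible = all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.add((a, b))
    return edges


prices = [1.0, 5.0, 2.0, 3.0]
edges = natural_visibility_graph(prices)
degrees = [sum(i in e for e in edges) for i in range(len(prices))]
print(sorted(edges))  # [(0, 1), (1, 2), (1, 3), (2, 3)]
print(degrees)        # [1, 3, 2, 2]
```

Here the peak at index 1 blocks index 0 from seeing indices 2 and 3, which is why local maxima tend to become hubs in this construction.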
{"title":"On the Feasibility of Attribute-Based Access Control Policy Mining","authors":"Shuvra Chakraborty, R. Sandhu, R. Krishnan","doi":"10.1109/IRI.2019.00047","DOIUrl":"https://doi.org/10.1109/IRI.2019.00047","url":null,"abstract":"As the technology of attribute-based access control (ABAC) matures and begins to supplant earlier models such as role-based or discretionary access control, it becomes necessary to convert from already deployed access control systems to ABAC. Several variations of this general problem can be defined, some of which have been studied by researchers. In particular the ABAC policy mining problem assumes that attribute values for various entities such as users and objects in the system are given, in addition to the authorization state, from which the ABAC policy needs to be discovered. In this paper, we formalize the ABAC RuleSet Existence problem in this context and develop an algorithm and complexity analysis for its solution. We further introduce the notion of ABAC RuleSet Infeasibility Correction along with an algorithm for its solution.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124547884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
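The ABAC abstract poses a RuleSet Existence problem. Under a simplifying assumption that may differ from the paper's formalization, namely that every rule is a conjunction of equality tests on user and object attribute values, a rule set reproducing a given authorization state exists exactly when no two user-object pairs with identical attribute values differ in authorization, because no such rule can tell those pairs apart. The sketch checks that condition and, when it holds, returns one rule per authorized attribute combination. The function `mine_rules` and the tuple encoding of attribute values are hypothetical.

```python
from typing import Dict, Hashable, Optional, Set, Tuple

AttrValues = Tuple[Hashable, ...]
Rule = Tuple[AttrValues, AttrValues]  # (user attribute values, object attribute values)


def mine_rules(
    user_attrs: Dict[str, AttrValues],
    obj_attrs: Dict[str, AttrValues],
    authorized: Set[Tuple[str, str]],
) -> Optional[Set[Rule]]:
    """Return an exact equality-based rule set, or None if none can exist."""
    decision: Dict[Rule, bool] = {}
    for user, u_vals in user_attrs.items():
        for obj, o_vals in obj_attrs.items():
            key = (u_vals, o_vals)
            allowed = (user, obj) in authorized
            # setdefault keeps the first decision seen for this attribute combination
            if decision.setdefault(key, allowed) != allowed:
                return None  # indistinguishable pairs with conflicting decisions
    return {key for key, allowed in decision.items() if allowed}


users = {"alice": ("staff",), "bob": ("staff",), "carol": ("guest",)}
objs = {"doc1": ("internal",)}
rules = mine_rules(users, objs, {("alice", "doc1"), ("bob", "doc1")})
print(rules)                                         # {(('staff',), ('internal',))}
print(mine_rules(users, objs, {("alice", "doc1")}))  # None
```

The second call fails because alice and bob carry the same attribute values yet only alice is authorized; correcting such conflicts is the kind of situation the abstract's Infeasibility Correction addresses. The rule set returned here is exact but not compact; generalizing rules is a separate step.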
{"title":"Deep Learning with Maxout Activations for Visual Recognition and Verification","authors":"G. Oscos, Paul Morris, T. Khoshgoftaar","doi":"10.1109/IRI.2019.00033","DOIUrl":"https://doi.org/10.1109/IRI.2019.00033","url":null,"abstract":"Visual recognition is one of the most active research topics in computer vision due to its potential applications in self-driving cars, healthcare, social media, manufacturing, etc. For image classification tasks, deep convolutional neural networks have achieved state-of-the-art results, and many activation functions have been proposed to enhance the classification performance of these networks. We explore the performance of multiple maxout activation variants on image classification, facial recognition and verification tasks using convolutional neural networks. Our experiments compare rectified linear unit, leaky rectified linear unit, scaled exponential linear unit, and hyperbolic tangent to four maxout variants. Throughout the experiments, we find that maxout networks train relatively slower than networks comprised of traditional activation functions. We found that on average, across all datasets, rectified linear units perform better than any maxout activation when the number of convolutional filters is increased six times.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132099001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
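For readers comparing the activations named in the maxout abstract: a maxout unit (Goodfellow et al., 2013) outputs the maximum over k learned affine pieces, h_i(x) = max_j (w_ij · x + b_ij), instead of applying a fixed nonlinearity. The dependency-free sketch below implements a dense maxout layer. The convolutional networks in the paper typically take the maximum over groups of feature maps instead, and none of the four variants it evaluates is reproduced here; `maxout` and the example weights are illustrative.

```python
from typing import List, Sequence


def maxout(
    x: Sequence[float],
    weights: List[List[List[float]]],
    biases: List[List[float]],
) -> List[float]:
    """Dense maxout layer: output i is max over pieces j of (w_ij . x + b_ij).

    weights[i][j] is the weight vector of linear piece j for output unit i,
    and biases[i][j] is that piece's bias.
    """
    outputs = []
    for unit_w, unit_b in zip(weights, biases):
        pieces = [
            sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
            for w, b in zip(unit_w, unit_b)
        ]
        outputs.append(max(pieces))
    return outputs


# One output unit with k = 2 pieces, z = x and z = 0: the unit computes max(x, 0), i.e. ReLU.
relu_w = [[[1.0], [0.0]]]
relu_b = [[0.0, 0.0]]
print(maxout([-2.0], relu_w, relu_b))  # [0.0]
print(maxout([3.0], relu_w, relu_b))   # [3.0]
```

The example shows why maxout generalizes ReLU, and also why it trains more slowly: each output needs k sets of weights rather than one, which is consistent with the abstract's observation about training speed.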
{"title":"Technological Advancements in Post-Traumatic Stress Disorder Detection: A Survey","authors":"Bathsheba Farrow, S. Jayarathna","doi":"10.1109/IRI.2019.00044","DOIUrl":"https://doi.org/10.1109/IRI.2019.00044","url":null,"abstract":"It is estimated that 70 percent of adults in the United States have experienced some type of traumatic event at least once in their lives and of that, one in five will develop Post-Traumatic Stress Disorder (PTSD) as a result. Although previously thought of as a condition that affects only military combat veterans, it is a psychological condition that can affect people of all ages. PTSD can lead to depression, suicidal thoughts, and other health issues. Therefore, early diagnosis is key to not only saving lives, but also to returning them to normal. However, PTSD symptoms are often ignored or misdiagnosed. Medical professionals and researchers have sought ways to improve the reliability of traditional PTSD symptom detection and classification methods as well as increase the speed at which diagnosis can be made. Various technologies, including heart rate monitors, electroencephalography (EEG), audio recorders, and eye tracking peripherals are now being used to capture and analyze neurological and physiological data to identify markers for the condition. In this survey, we review and present issues with PTSD diagnosis and methods of symptom detection found in current literature. We evaluate the techniques employed, discuss some of the advantages and disadvantages of the technologies utilized, and recommend ways in which data collection and analysis could be improved for increased reliability of PTSD diagnosis in the future.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116662147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}