2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2... Latest articles
C. Giossi, D. Maier, K. Tufte, Elliot Gall, M. Barnes
{"title":"Towards Agile Integration: Specification-based Data Alignment","authors":"C. Giossi, D. Maier, K. Tufte, Elliot Gall, M. Barnes","doi":"10.1109/IRI49571.2020.00055","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00055","url":null,"abstract":"Utilizing data sets from multiple domains is a common procedure in scientific research. For example, research on the performance of buildings may require data from multiple sources that lack a single standard for data reporting. The Building Management System might report data at regular 5-minute intervals, whereas an air-quality sensor might capture values only when there has been significant change from the previous value. Many systems exist to help integrate multiple data sources into a single system or interface. However, such systems do not necessarily make it easy to modify an integration plan, for example, to accommodate data exploration, new and changing data sets or shifts in the questions of interest. We propose an agile data-integration system to enable quick and adaptive analysis across many data sets, concentrating initially on the data alignment step: combining data values from multiple time-series based data sets whose time schedules. To this end, we adopt a Domain Specific Language approach where we construct a domain model for alignment, provide a specification language for describing alignments in the model and implement an interpreter for specifications in that language. Our implementation exploits a rank-based join in SQL that produces faster alignment times than the commonly suggested method of aligning data sets in a database. We present experiments to demonstrate the advantage of our method and exploit data properties for optimization.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"62 1","pages":"333-340"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87576325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection Methods of Slow Read DoS Using Full Packet Capture Data","authors":"Clifford Kemp, Chad L. Calvert, T. Khoshgoftaar","doi":"10.1109/IRI49571.2020.00010","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00010","url":null,"abstract":"Denial of Service (DoS) attacks on web servers have become extremely popular with cybercriminals and organized crime groups. A successful DoS attack on network resources reduces availability of service to a web site and backend resources, and could easily result in a loss of millions of dollars in revenue depending on company size. There are many DoS attack methods, each of which is critical to providing an understanding of the nature of the DoS attack class. There has been a rise in recent years of application-layer DoS attack methods that target web servers and are challenging to detect. An attack may be disguised to look like legitimate traffic, except it targets specific application packets or functions. Slow Read DoS attack is one type of slow HTTP attack targeting the application layer. Slow Read attacks are often used to exploit weaknesses in the HTTP protocol, as it is the most widely used protocol on the Internet. In this paper, we use Full Packet Capture (FPC) datasets for detecting Slow Read DoS attacks with machine learning methods. All data collected originates in a live network environment. Our approach produces FPC features taken from network packets at the IP and TCP layers. Experimental results show that the machine learners were quite successful in identifying the Slow Read attacks with high detection and low false alarm rates using FPC data. Our experiment evaluates FPC datasets to determine the accuracy and efficiency of several detection models for Slow Read attacks. The experiment demonstrates that FPC features are discriminative enough to detect such attacks.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"31 1","pages":"9-16"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78789284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Class Cardiovascular Diseases Diagnosis from Electrocardiogram Signals using 1-D Convolution Neural Network","authors":"Mehdi Fasihi, M. Nadimi-Shahraki, A. Jannesari","doi":"10.1109/IRI49571.2020.00060","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00060","url":null,"abstract":"The electrocardiogram (ECG) is an important signal in health informatics for the detection of cardiac abnormalities. Several studies have used machine learning techniques to analyze ECG signals. However, they need additional computation owing to the challenges of ECG signals. We introduce a new architecture of 1-D convolution neural network (CNN) to diagnose arrhythmia diseases automatically. The proposed architecture consists of four convolution layers, three pooling layers, and three fully connected layers evaluated on the arrhythmia dataset. All previous studies were conducted to distinguish healthy people from people with arrhythmia. In this paper, we go further, to multiclass classification with two classes of cardiac disease and one class of healthy people. The results are compared with a common 1-D CNN and seven different classifiers. The experimental results demonstrate that the proposed architecture is superior to existing classifiers and also competitive with the state of the art in terms of accuracy.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"33 1","pages":"372-378"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73696595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ankit Srivastava, Samira Pouyanfar, Joshua Allen, Ken Johnston, Qida Ma
{"title":"Distributed Differentially Private Mutual Information Ranking and Its Applications","authors":"Ankit Srivastava, Samira Pouyanfar, Joshua Allen, Ken Johnston, Qida Ma","doi":"10.1109/IRI49571.2020.00021","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00021","url":null,"abstract":"Computation of Mutual Information (MI) helps understand the amount of information shared between a pair of random variables. Automated feature selection techniques based on MI ranking are regularly used to extract information from sensitive datasets exceeding petabytes in size, over millions of features and classes. Series of one-vs-all MI computations can be cascaded to produce n-fold MI results, rapidly pinpointing informative relationships. This ability to quickly pinpoint the most informative relationships from datasets of billions of users creates privacy concerns. In this paper, we present Distributed Differentially Private Mutual Information (DDP-MI), a privacy-safe fast batch MI, across various scenarios such as feature selection, segmentation, ranking, and query expansion. This distributed implementation is protected with global model differential privacy to provide strong assurances against a wide range of privacy attacks. We also show that our DDP-MI can substantially improve the efficiency of MI calculations compared to standard implementations on a large-scale public dataset.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"4 1","pages":"90-96"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82039517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IRI 2020 TOC","authors":"Rashmi Jha, David Kapp, Thuong Khanh Tran","doi":"10.1109/iri49571.2020.00004","DOIUrl":"https://doi.org/10.1109/iri49571.2020.00004","url":null,"abstract":"","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73830219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KGdiff: Tracking the Evolution of Knowledge Graphs","authors":"Abbas Keshavarzi, K. Kochut","doi":"10.1109/IRI49571.2020.00047","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00047","url":null,"abstract":"A Knowledge Graph (KG) is a machine-readable, labeled graph-like representation of human knowledge. As the main goal of a KG is to represent data by enriching it with computer-processable semantics, knowledge graph creation usually involves acquiring data from external resources and datasets. In many domains, especially in biomedicine, the data sources continuously evolve, and KG engineers and domain experts must not only track the changes in KG entities and their interconnections but also introduce changes to the KG schema and the graph population software. We present a framework to track the KG evolution both in terms of the schema and individuals. KGdiff is a software tool that incrementally collects the relevant meta-data information from a KG and compares it to a prior version of the KG. The KG is represented in OWL/RDF/RDFS and the meta-data is collected using domain-independent queries. We evaluate our method on different RDF/OWL data sets (ontologies).","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"31 9 1","pages":"279-286"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81635781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Driven Relational Constraint Programming","authors":"Michael Valdron, K. Pu","doi":"10.1109/IRI49571.2020.00030","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00030","url":null,"abstract":"We propose a data-driven constraint programming environment that merges the power of two separate domains: databases and SAT solvers. While a database system offers flexible data models and query languages, SAT solvers offer the ability to satisfy logical constraints and optimization objectives. In this paper, we describe a goal-oriented declarative algebra that seamlessly integrates both worlds. Borrowing from proven practices in functional programming, we express constants, variables and constraints in a unified relational query language. The language is implemented on top of industrial-strength database engines and SAT solvers. In order to support iterative constraint programming with debugging, we propose several debugging operators to assist with interactive constraint solving.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"8 1","pages":"156-163"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91072677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bo Ma, Jinsong Wu, William Liu, L. Chiaraviglio, Xing Ming
{"title":"Combating Hard or Soft Disasters with Privacy-Preserving Federated Mobile Buses-and-Drones based Networks","authors":"Bo Ma, Jinsong Wu, William Liu, L. Chiaraviglio, Xing Ming","doi":"10.1109/IRI49571.2020.00013","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00013","url":null,"abstract":"Mobile edge computing enabled infrastructure is expected to become popular in the incoming fifth generation (5G) and future sixth generation (6G) wireless networks. Especially after a ‘hard’ disaster such as an earthquake or a ‘soft’ disaster such as the COVID-19 pandemic, the existing telecommunication infrastructure, including wired and wireless networks, is often seriously compromised or carries infectious-disease risks that preclude close contact, and thus cannot guarantee regular coverage and reliable communications services. These temporarily missing communications capabilities are crucial to rescuers, health-carers, and affected or infected citizens, as the responders need to coordinate and communicate effectively to minimize the loss of lives and property, where the 5G/6G mobile edge network helps. On the other hand, federated machine learning (FML) methods have been newly developed to address the privacy leakage problems of traditional machine learning, which is normally held by one centralized organization and associated with the high risks of a single point of hacking. After detailing the current state of the art in privacy preservation, federated learning, and mobile edge communications networks for ‘hard’ and ‘soft’ disasters, we consider the main challenges that need to be faced. We envision a privacy-preserving federated learning enabled buses-and-drones based mobile edge infrastructure (ppFL-AidLife) for disaster or pandemic emergency communications. The ppFL-AidLife system aims at a rapidly deployable resilient network capable of supporting flexible, privacy-preserving and low-latency communications to serve large-scale disaster situations by utilizing the existing public transport networks, associated with drones to maximally extend their radio coverage to hard-to-reach disaster zones or pandemic zones where close contact must be avoided.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"158 1","pages":"31-36"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86730072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IRI 2020 Breaker Page","authors":"","doi":"10.1109/iri49571.2020.00003","DOIUrl":"https://doi.org/10.1109/iri49571.2020.00003","url":null,"abstract":"","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"43 4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83489308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Embeddings for Medical Providers and Fraud Detection","authors":"Justin M. Johnson, T. Khoshgoftaar","doi":"10.1109/IRI49571.2020.00039","DOIUrl":"https://doi.org/10.1109/IRI49571.2020.00039","url":null,"abstract":"A medical provider’s specialty is a significant predictor for detecting fraudulent providers with machine learning algorithms. When the specialty variable is encoded using a one-hot representation, however, models are subjected to sparse and uninformative feature vectors. We explore three techniques for representing medical provider types with dense, semantic embeddings that capture specialty similarities. The first two methods (GloVe and Med-Word2Vec) use pre-trained word embeddings to convert provider specialty descriptions to short phrase embeddings. Next, we propose a method for constructing semantic provider type embeddings from the procedure-level activity within each specialty group. For each embedding technique, we use Principal Component Analysis to compare the performance of embedding sizes between 32 and 128. Each embedding technique is evaluated on a highly imbalanced Medicare fraud prediction task using Logistic Regression (LR), Random Forest (RF), Gradient Boosted Tree (GBT), and Multilayer Perceptron (MLP) learners. Experiments are repeated 30 times and confidence intervals show that all three semantic embeddings significantly outperform one-hot representations when using RF and GBT learners. Our contributions include a novel method for embedding medical specialties from procedure codes and a comparison of three semantic embedding techniques for Medicare fraud detection.","PeriodicalId":93159,"journal":{"name":"2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science : IRI 2020 : proceedings : virtual conference, 11-13 August 2020. IEEE International Conference on Information Reuse and Integration (21st : 2...","volume":"42 1","pages":"224-230"},"PeriodicalIF":0.0,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84713156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}