2017 28th International Workshop on Database and Expert Systems Applications (DEXA): Latest Publications

Semantic Analysis Supporting De-Radicalisation

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) Pub Date: 2017-08-01 DOI: 10.1109/DEXA.2017.20

N. Derbas, F. Segond, Muntsa Padró, Emmanuelle Dusserre, Teodora Dobre, S. Monaci, Gustavo Mastrobuoni

Citations: 4

Protein-Protein Interaction Prediction: Recent Advances

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) Pub Date: 2017-08-01 DOI: 10.1109/DEXA.2017.30

M. Shatnawi

Citations: 2

Principled Data Preprocessing: Application to Biological Aquatic Indicators of Water Pollution

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) Pub Date: 2017-08-01 DOI: 10.1109/DEXA.2017.27

Eva C. Serrano Balderas, Laure Berti-Équille, Maria Aurora Armienta Hernandez, C. Grac

Abstract: In many biological studies, statistical and data mining methods are used extensively to analyze data and discover actionable knowledge. However, poor data quality can produce incorrect analysis results and wrong interpretations, which in turn may lead to misleading conclusions and inadequate decisions. To ensure the validity of the results and to avoid bias and data misuse, it is necessary to control not only the whole analytical pipeline but, most importantly, the quality of the data, through appropriate data preprocessing choices. Since different preprocessing techniques and alternative strategies may lead to dramatically different outputs, it is crucial to rely on a principled and rigorous method for selecting the optimal set of preprocessing steps, a choice that depends both on the distributional characteristics of the input data and on the inherent characteristics of the targeted statistical or data mining methods. In this paper, we propose a method that, given a dataset, selects the optimal set of preprocessing tasks to apply so that the overall preprocessing output maximizes the quality of the analytical results for various clustering, regression, and classification techniques. We present promising results that validate our approach on biomonitoring data preparation.

Citations: 7
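The selection idea in this abstract can be illustrated with a small exhaustive search. This is a minimal sketch, not the authors' method: it assumes three hypothetical preprocessing steps (mean imputation, dropping incomplete rows, dropping y-outliers beyond two standard deviations), a single downstream analysis (simple linear regression scored by R²), and brute-force enumeration of every ordered subset of steps. The toy data are invented for illustration.

```python
from itertools import permutations
from statistics import mean, pstdev


# Hypothetical preprocessing steps: each maps (xs, ys) -> (xs, ys).
def impute_mean(xs, ys):
    """Fill missing x values (None) with the mean of the observed x values."""
    m = mean(x for x in xs if x is not None)
    return [m if x is None else x for x in xs], list(ys)


def drop_missing(xs, ys):
    """Drop every row whose x value is missing."""
    keep = [i for i, x in enumerate(xs) if x is not None]
    return [xs[i] for i in keep], [ys[i] for i in keep]


def drop_outliers(xs, ys, z=2.0):
    """Drop rows whose y lies more than z population std devs from the mean of y."""
    mu, sd = mean(ys), pstdev(ys)
    if sd == 0:
        return list(xs), list(ys)
    keep = [i for i, y in enumerate(ys) if abs(y - mu) <= z * sd]
    return [xs[i] for i in keep], [ys[i] for i in keep]


def r_squared(xs, ys):
    """Quality of the downstream analysis: R^2 of a least-squares line y = a + b*x."""
    if len(xs) < 3 or any(x is None for x in xs):
        return float("-inf")  # the analysis cannot run on this output
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("-inf")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


STEPS = {"impute_mean": impute_mean, "drop_missing": drop_missing, "drop_outliers": drop_outliers}


def select_pipeline(xs, ys, steps=STEPS, score=r_squared):
    """Try every ordered subset of steps; return (best_score, best_pipeline)."""
    best = (score(xs, ys), ())  # the empty pipeline means no preprocessing
    for k in range(1, len(steps) + 1):
        for order in permutations(steps, k):
            cx, cy = xs, ys
            for name in order:
                cx, cy = steps[name](cx, cy)
            s = score(cx, cy)
            if s > best[0]:  # strict '>' keeps the shortest/earliest pipeline on ties
                best = (s, order)
    return best


# Invented toy data: one missing x and one extreme y (30.0).
xs = [1, 2, None, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.0, 8.1, 9.9, 12.2, 30.0, 16.1]
best_score, best_pipeline = select_pipeline(xs, ys)
print(best_pipeline, round(best_score, 4))
```

Exhaustive enumeration grows factorially with the number of candidate steps, so a practical selector would need a smarter search and several analysis-specific quality scores. On this toy data the search should favour dropping the incomplete row and the outlier over imputing the missing value.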

An Error Correction Algorithm for NGS Data

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) Pub Date: 2017-08-01 DOI: 10.1109/DEXA.2017.33

M. Kchouk, J. Gibrat, M. Elloumi

Citations: 1

A Corpus of Narratives Related to Luxembourg for the Period 1945-1975

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) Pub Date: 2017-08-01 DOI: 10.1109/DEXA.2017.39

O. Parisot, T. Tamisier

Citations: 1

A Machine Learning Approach towards Detecting Extreme Adopters in Digital Communities

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) Pub Date: 2017-08-01 DOI: 10.1109/DEXA.2017.17

A. Shrestha, Lisa Kaati, Katie Cohen

Citations: 8

A Tool for Statistical Analysis on Network Big Data

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) Pub Date: 2017-08-01 DOI: 10.1109/DEXA.2017.23

C. Ordonez, T. Johnson, D. Srivastava, Simon Urbanek

Abstract: Thanks to advances in parallel file systems for big data (e.g., HDFS) and larger-capacity hardware (multicore CPUs, large RAM), it is now feasible to manage and query network data in a parallel DBMS supporting SQL, but performing statistical analysis remains a challenge. On the statistics side, the R language is popular, but it has important limitations: R is limited by main memory, R works in a different address space from query processing, R cannot analyze large disk-resident data sets efficiently, and R has no data management capabilities. Moreover, some R libraries allow R to work in parallel, but still without data management capabilities. Given these challenges and limitations, we present a system that combines SQL queries and R functions seamlessly. We argue that a parallel DBMS and the R runtime are two different systems that benefit from low-level integration. Our parallel DBMS is built on top of HDFS, programmed in Java and C++, and has a flexible scale-out architecture, whereas R is programmed purely in C. The user or developer can make calls in both directions: (1) R calling SQL, to evaluate analytic queries or retrieve data from materialized views (transferring result tables in RAM in a streaming fashion and analyzing them in R), and, vice versa, (2) SQL calling R, allowing SQL to convert relational tables to matrices or vectors and perform complex computations on them. We summarize network monitoring tasks at AT&T and present specific programming examples showing language calls in both directions (i.e., R calls SQL and SQL calls R).

Citations: 2
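The two call directions described in this abstract can be mimicked in a few lines with Python's standard `sqlite3` module, which stands in here for both the parallel DBMS and R. This is a substitution for illustration, not the authors' system. `stream_rows` plays the role of the analytics side calling SQL and pulling the result in batches via `fetchmany`; the `pop_variance` aggregate, registered with `create_aggregate`, plays the role of SQL calling analytic code. The `flows` table, its columns, and all function names are hypothetical.

```python
import sqlite3
from statistics import mean, pvariance


def stream_rows(conn, query, batch_size=1000):
    """'Analytics calls SQL' analogue: run a query and yield rows in fixed-size batches."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch


def column_means(rows):
    """Collect relational rows into an in-memory numeric matrix and average each column."""
    matrix = [[float(v) for v in row] for row in rows]
    return [mean(col) for col in zip(*matrix)]


class PopVariance:
    """'SQL calls analytics' analogue: an aggregate that SQLite feeds one value per row."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:  # ignore SQL NULLs, as built-in aggregates do
            self.values.append(float(value))

    def finalize(self):
        return pvariance(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
# Hypothetical schema for network traffic records.
conn.execute("CREATE TABLE flows (src_bytes INTEGER, dst_bytes INTEGER)")
conn.executemany("INSERT INTO flows VALUES (?, ?)", [(100, 200), (300, 400), (500, 900)])
conn.create_aggregate("pop_variance", 1, PopVariance)

# Direction 1: pull query results in batches of 2 and analyze them outside SQL.
means = column_means(stream_rows(conn, "SELECT src_bytes, dst_bytes FROM flows", batch_size=2))
# Direction 2: call the Python aggregate from inside a SQL query.
(variance,) = conn.execute("SELECT pop_variance(src_bytes) FROM flows").fetchone()
print(means, variance)
```

Unlike the system in the abstract, this sketch runs both sides in one process and gathers the streamed rows into a matrix before computing, so it does not address data sets larger than main memory.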

Opinion Expression Detection via Deep Bidirectional C-GRUs

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) Pub Date: 2017-08-01 DOI: 10.1109/DEXA.2017.40

Xiaoxia Xie

Citations: 4

Classifying Web Exploits with Topic Modeling

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) Pub Date: 2017-08-01 DOI: 10.1109/DEXA.2017.35

Jukka Ruohonen

Citations: 13

Introducing Design Patterns to Knowledge Processing Systems in the Context of Big Data and Cloud Platforms

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) Pub Date: 2017-08-01 DOI: 10.1109/DEXA.2017.26

Stefan Nadschläger

Citations: 0