{"title":"Polyflow: a Polystore-compliant Mechanism to Provide Interoperability to Heterogeneous Provenance Graphs","authors":"Yan Mendes, Daniel de Oliveira, Victor Ströele","doi":"10.5753/JIDM.2020.2017","DOIUrl":"https://doi.org/10.5753/JIDM.2020.2017","url":null,"abstract":"Many scientific experiments are modeled as workflows. Workflows usually output massive amounts of data. To guarantee the reproducibility of workflows, they are usually orchestrated by Workflow Management Systems (WfMS), which capture provenance data. Provenance represents the lineage of a data fragment throughout its transformations by activities in a workflow. Provenance traces are usually represented as graphs. These graphs allow scientists to analyze and evaluate results produced by a workflow. However, each WfMS has a proprietary format for provenance and captures it at different granularity levels. Therefore, in more complex scenarios in which the scientist needs to interpret provenance graphs generated by multiple WfMSs and workflows, a challenge arises. To first understand the research landscape, we conduct a Systematic Literature Mapping, assessing existing solutions under several different lenses. With a clearer understanding of the state of the art, we propose a tool called Polyflow, which is based on the concept of Polystore systems, integrating several databases of heterogeneous origin by adopting a global ProvONE schema. Polyflow allows scientists to query multiple provenance graphs in an integrated way. Polyflow was evaluated by experts using provenance data collected from real experiments that generate phylogenetic trees through workflows. The experiment results suggest that Polyflow is a viable solution for interoperating heterogeneous provenance data generated by different WfMSs, from both a usability and performance standpoint.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114905368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated classification of cardiology diagnoses based on textual medical reports","authors":"João A. O. Pedrosa, D. Oliveira, Wagner Meira, A. L. Ribeiro","doi":"10.5753/kdmile.2020.11975","DOIUrl":"https://doi.org/10.5753/kdmile.2020.11975","url":null,"abstract":"Automatic diagnosis of diseases has been a long-term challenge for Computer Science and related disciplines. Textual clinical reports can be used as a great source of data for such diagnoses. However, building classification models from them is not a trivial task. The problem tackled in this work is the identification of the medical diagnoses that are indicated in these reports. In the past, several methods have been proposed for addressing this problem, but a method developed for reports in the cardiology area that are written in Portuguese is still needed. In this paper we describe a method that is able to handle the peculiarities of clinical reports, including the medical terminology, and that is implemented to correctly estimate the disease based on raw clinical reports and a list of the possible diagnoses. Experimental results show that our method has a high degree of accuracy, even for infrequent classes and complex databases.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122087409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query co-planning for shared execution in Key-Value Stores","authors":"J. Ttito, Renato Marroquín, Sérgio Lifschitz","doi":"10.5753/sbbd.2020.13643","DOIUrl":"https://doi.org/10.5753/sbbd.2020.13643","url":null,"abstract":"Key-value stores propose a very simple yet powerful data model. Data is modeled using key-value pairs where values can be arbitrary objects and can be written/read using the key associated with them. In addition to their simple interface, such data stores also provide read operations such as full and range scans. However, due to the simplicity of their interface, trying to optimize data accesses becomes challenging. This work aims to enable the shared execution of concurrent range and point queries on key-value stores, thus reducing the overall data movement when executing a complete workload. To accomplish this, we analyze different possible data structures and propose our variation of a segment tree, Updatable Interval Tree. This data structure helps us co-plan and co-execute multiple range queries together, as we show in our evaluation.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127576652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted Linking Decomposition: Mining Denser and More Compact Hierarchies for Bipartite Graphs","authors":"Edré Moreira, G. Campos, Wagner Meira Jr","doi":"10.5753/JIDM.2020.2031","DOIUrl":"https://doi.org/10.5753/JIDM.2020.2031","url":null,"abstract":"Dense subgraph detection is a well-known problem in graph theory. The hierarchical organization of graphs as dense subgraphs, however, goes beyond simple clustering, as it allows the analysis of the network at different scales. Although there are several hierarchical decomposition methods for unipartite graphs, only a few approaches for the bipartite case have been proposed. In this work, we explore the problem of hierarchical decomposition for bipartite graphs. We propose an algorithm called Weighted Linking that identifies denser and more compact hierarchies than the state-of-the-art approach. We also propose a new score to help choose the better of two hierarchical decompositions of the same graph. The proposed algorithm was evaluated experimentally using six real-world datasets and identified smaller and denser hierarchies on most of them.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132896306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"World Cups Impact Analysis in the Soccer Players Transaction and Soccer Globalization using Complex Network Techniques","authors":"A. P. S. Alves, Lucas G. S. Félix, C. M. Barbosa, Vitor Elisiário Carmo, V. D. F. Vieira, C. R. Xavier","doi":"10.5753/jidm.2019.2035","DOIUrl":"https://doi.org/10.5753/jidm.2019.2035","url":null,"abstract":"In this paper, we propose an analysis of the relationship between World Cup results and the number of transfers of soccer players of the participating national teams. For this study, networks are collected, modeled and generated for periods of time before each World Cup since 1966. The effects of these events were evaluated by investigating the transfer networks of the best and worst teams at each edition of the World Cup. We also investigated sociological theories that associate globalization with transfer networks in soccer, supporting the raised hypotheses with quantitative data and updating these proposals by showing the rise of new markets, such as those from Asia. To carry out the analysis, complex networks and data mining techniques were combined, and this evaluation showed that countries that perform many transactions do not necessarily perform well in the World Cups. However, some of the countries involved in numerous transfers can perform well, finishing in good positions in the World Cups.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123721714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the Relation Between Companies with Topological Analysis of a Network of Stock Exchange in Brazil","authors":"V. D. F. Vieira, Lucas G. S. Félix, C. M. Barbosa, C. R. Xavier","doi":"10.5753/jidm.2019.2033","DOIUrl":"https://doi.org/10.5753/jidm.2019.2033","url":null,"abstract":"B3 (Brasil, Bolsa, Balcão) is the official stock exchange in Brazil and plays a key role in the world financial market. The stock exchange allows people and companies to relate through shareholding and the purchase and sale of shares. The study of the relationship between people and companies can reveal valuable information about the operation of the stock exchange and, consequently, the financial market as a whole. In this work, the relations in B3 are modeled as a network, in which the vertices represent companies and people and the edges represent shareholdings. From the built network, several analyses are performed with the objective of understanding and characterizing the patterns found in relationships. Investigation of the topology of the network is performed under different perspectives, such as the centrality of the vertices, the organization of vertices in communities, the robustness and the diffusion of influence. The results show a strong community structure in the B3 network and, even though the network is fragile to the removal of vertices, the criterion used to choose the vertices to be targeted can be decisive in the characterization of the robustness.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132286282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Links for Patent Documents: an Automatic Approach using Computational Intelligence","authors":"C. M. Souza, M. E. Santos, M. Meireles","doi":"10.5753/jidm.2019.2032","DOIUrl":"https://doi.org/10.5753/jidm.2019.2032","url":null,"abstract":"Patents are organized into classification systems, which assist offices and users in the process of seeking and retrieving such documents. A wide variety of users use the patent systems and the information contained in these documents. However, patents are complex legal documents with a significant number of technical and descriptive details, which makes it difficult to identify and analyze the information contained in these documents. An automatic link system associated with some of the terms found in the patents would provide quick access to the concepts contained in specific knowledge bases. This work presents results of a project in which the objective is the automatic generation of links in patent documents. The experiments were conducted with four subgroups of the United States Patent and Trademark Office (USPTO), which uses the Cooperative Patent Classification (CPC) system. As the patent documents did not have keywords, the meaningful terms were selected using the χ2 algorithm, for which the contents of the entire patent document were used. Some keywords with more than one meaning were disambiguated using a specific algorithm, generating a file with useful information used in the experiments. The links were generated based on Wikipedia articles and the USPTO patent database. The use of the patent database as a possible destination for the link is intended to cover cases in which Wikipedia has no articles on certain terms and also to provide an alternative source that may assist readers in understanding those documents. The creation of automated links is expected to make it easier to access concepts related to the terms presented by the documents and to understand the information disclosed by the inventors.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122539707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Deep Learning for the Analysis of Emotional Reactions to Terrorist Events on Twitter","authors":"Karin Becker, Jonathas G. D. Harb, Régis Ebeling","doi":"10.5753/jidm.2019.2039","DOIUrl":"https://doi.org/10.5753/jidm.2019.2039","url":null,"abstract":"Terrorist events have a substantial emotional impact on the population, and understanding these effects is very important to design effective assistance programs. However, investigating community-wide traumas is a complex and costly task, where most challenges are related to the data collection process. Social media has been used as a relevant source of data to investigate people’s sentiments and ideas. In this article, we study the emotional reactions of Twitter users regarding two terrorist events that occurred in the United Kingdom. The contributions are twofold: a) we experiment with two deep learning architectures to develop an emotion classifier, and b) we develop an analysis on tweets related to terrorist events to understand whether there is an emotional shift due to a terrorist attack and whether the emotional reactions are dependent on the event, or on the demographics of the users. Both models, based on convolutional and recurrent neural architectures, presented very similar performances. The analyses revealed an emotion shift due to the events and a difference in the reactions to each specific event, where gender is the most significant factor.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117139866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time Series Forecasting to Support Irrigation Management","authors":"D. Braga, T. C. D. Silva, A. D. Rocha, Gustavo Coutinho, R. P. Magalhães, Paulo T. Guerra, J. Macêdo, Simone D. J. Barbosa","doi":"10.5753/jidm.2019.2037","DOIUrl":"https://doi.org/10.5753/jidm.2019.2037","url":null,"abstract":"Irrigated agriculture is the most water-consuming sector in Brazil, representing one of the main challenges for the sustainable use of water. This study has investigated and evaluated popular machine learning techniques like Gradient Boosting and Random Forest, deep learning models and univariate time series models to predict the value of reference evapotranspiration, a metric of water loss from the crop to the environment. The reference evapotranspiration, ET0, plays an essential role in irrigation management since it can be used to reduce the amount of water that will not be absorbed by the crop. We performed the experiments with two real datasets generated by weather stations. The results show that the deep learning models are data-hungry: even increasing the training set was not enough for them to outperform multivariate models like Random Forest, Gradient Boosting and M5’, which also execute faster than the deep learning models during the training phase. However, a univariate time series model, such as the evaluated deep learning models (stacked LSTM and BLSTM), is a viable and lower-cost solution for predicting ET0, since only one variable needs to be monitored.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131756337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive Visualization of Trivariate Georeferenced Data","authors":"Tarsus Magnus Pinheiro, Claudio Esperança","doi":"10.5753/jidm.2018.2043","DOIUrl":"https://doi.org/10.5753/jidm.2018.2043","url":null,"abstract":"This paper describes an online interactive thematic map that simultaneously visualizes up to three scalar variables and supports data filtering, panning and zooming across levels of detail. The visual encoding of the map mixes the use of colors and textures as well as simple operations like border detection and intersection identification. The user experience is enhanced by means of queries posed through manipulation tools that produce instant visual feedback. This is possible thanks to the high rendering rates the system achieves by using GPU programming to assemble and manipulate previously rasterized tiles with location information recorded in the color space of pixels. This procedure allows the implementation of interactive animated actions and spatial data decomposition.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133607684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}