J. Inf. Data Manag.: latest articles, page 8

A Platform for Collaborative Historical Research based on Volunteered Geographical Information

J. Inf. Data Manag. Pub Date : 2018-12-30 DOI: 10.5753/jidm.2018.2046

K. Ferreira, L. Ferla, G. R. Queiroz, N. Vijaykumar, Carlos A. Noronha, R. Mariano, Denis Taveira, Gabriel Sansigolo, Orlando Guarnieri, Thomas Rogers, J. Lesser, M. Page, Fernando Atique, D. Musa, Janaina Y. Santos, Diego S. Morais, Cristiane R. Miyasaka, C. Almeida, L. Nascimento, Jaine A. Diniz, M. Santos

Abstract: Digital humanities research promotes the intersection between digital technologies and humanities, emphasizing free knowledge sharing and collaborative work. Based on digital humanities features, this paper describes the architecture of a computational platform for collaborative historical research designed and developed in an ongoing project called Pauliceia 2.0. This project aims to produce historical data of São Paulo city from 1870 to 1940 and to develop a computational platform that allows researchers to explore, integrate and share urban historical data sets. The main goal of the Pauliceia 2.0 platform is to use volunteered geographical information (VGI) and crowdsourcing concepts to produce past geographical data and to allow historians to share historical data sets resulting from their research. In this work, we present the Pauliceia 2.0 platform architecture and its underlying VGI protocol.

Citations: 4

A Statistical Method for Detecting Move, Stop, and Noise: A Case Study with Bus Trajectories

J. Inf. Data Manag. Pub Date : 2018-12-30 DOI: 10.5753/jidm.2018.2041

T. P. Nogueira, C. Celes, H. Martin, A. Loureiro, Rossana M. C. Andrade

Abstract: The proliferation of devices with positioning capability has allowed new possibilities for studies and applications in the context of urban mobility. However, the process of analyzing raw trajectories poses several challenges. In this work, we investigate one of the main tasks in this process of trajectory analysis: detecting stops from GPS trajectories. Stops can reveal interesting behavior aspects of a moving object such as its daily routine, bottlenecks in traffic jams, or visiting times of touristic places. Although there are some efforts in this direction, most current methods ignore the presence of noise segments, which typically occur many times in trajectories. In this sense, we present a method that exploits gaps in time and space to identify episodes of movement, stop, and periods where some classification is inconclusive, which we define as noise. In addition, our method does not rely on contextual information as opposed to some current solutions, which makes our proposal also suitable for trajectories recorded in free space. We compare our method to the state of the art, highlighting its advantages in terms of manipulating noise, supporting spatial filtering and being independent of external resources. Moreover, we conduct an experimental evaluation using a large-scale bus dataset to show the effectiveness of our method in a real application scenario.

Citations: 4

Beyond Hit-or-Miss: A Comparative Study of Synopses for Similarity Searching

J. Inf. Data Manag. Pub Date : 2018-06-20 DOI: 10.5753/jidm.2018.1635

M. Bedo, Daniel de Oliveira, A. Traina, C. Traina

Abstract: A DBMS optimizer module takes its decisions by modeling the query costs upon the distribution of the data space. Cost modeling of similarity queries, however, requires the representation of distance rather than data distributions. Therefore, the finding of a suitable representation (or synopsis) for the distance distribution has a major impact on the optimization of similarity searches. In this study, we evaluate the quality of estimates drawn from five synopses of distinct paradigms regarding two common query criteria. Moreover, we embed the synopses into a new parametric cost model, called Stockpile, for the cost estimation of similarity queries on metric trees. The model uses the synopses estimation for calculating the probability of traversing a metric tree node, which defines the expected number of both disk accesses (I/O costs) and distance calculations (CPU costs). We performed an extensive set of experiments on real-world data sources regarding the estimates of each synopsis (and its parametric variations) by using paired ranking tests. In global terms, three synopses have outperformed their competitors regarding selectivity estimation, whereas two of them have also surpassed the others in the prediction of both I/O and CPU costs with respect to Stockpile model predictions. Additionally, results also indicate the choice of the most suitable synopsis may depend on characteristics of the distance distribution.

Citations: 1
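
The abstract above says the cost model turns a synopsis of the distance distribution into a per-node traversal probability, from which expected I/O and CPU costs follow. The sketch below illustrates that general idea for a range query. It is not the Stockpile model, whose details the abstract does not give; it uses a common simplification from metric-tree cost modeling in which a node is visited when the query-to-center distance is at most the node's covering radius plus the query radius, and it assumes the overall distance distribution is a fair stand-in for query-to-center distances. The empirical CDF used as the synopsis, the `Node` fields, and every function name are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical summary of one metric tree node."""
    covering_radius: float  # radius of the ball covering the node's entries
    entries: int            # number of entries compared when the node is read


def make_cdf(sampled_distances: Sequence[float]) -> Callable[[float], float]:
    """Build an empirical CDF F(x) = P(distance <= x) from sampled distances."""
    ordered = sorted(sampled_distances)
    if not ordered:
        raise ValueError("need at least one sampled distance")

    def cdf(x: float) -> float:
        return bisect_right(ordered, x) / len(ordered)

    return cdf


def estimate_range_query_cost(
    nodes: Sequence[Node],
    cdf: Callable[[float], float],
    query_radius: float,
) -> Tuple[float, float]:
    """Return (expected node reads, expected distance calculations)."""
    expected_io = 0.0
    expected_cpu = 0.0
    for node in nodes:
        # Assumed traversal probability: F(covering radius + query radius).
        p_visit = cdf(node.covering_radius + query_radius)
        expected_io += p_visit
        expected_cpu += p_visit * node.entries
    return expected_io, expected_cpu


# Toy inputs, invented for illustration only.
cdf = make_cdf([1.0, 2.0, 3.0, 4.0])
nodes = [Node(covering_radius=1.0, entries=10), Node(covering_radius=2.0, entries=20)]
io_cost, cpu_cost = estimate_range_query_cost(nodes, cdf, query_radius=1.0)
```

Swapping `make_cdf` for another synopsis (for example a histogram or a fitted parametric curve) changes only how `cdf` is built, which is the kind of comparison the abstract describes.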

Mind Your Dependencies for Semantic Query Optimization

J. Inf. Data Manag. Pub Date : 2018-06-20 DOI: 10.5753/jidm.2018.1633

Eduardo H. M. Pena, Erik Falk, J. Meira, E. Almeida

Abstract: Semantic query optimization uses dependencies between attributes to formulate query transformations and revise the number of processed rows, with direct impact on performance. Commercial databases present facilities to define dependencies as not enforced constraints. The goal is to help the query optimizer in cases where the database is denormalized or simply lost dependencies in the design. However, feeding these facilities is a manual task which is tedious and error-prone. An attractive alternative is the automatic discovery of dependencies, but the cost of finding dependencies increases with the number of rows and attributes in the dataset. In this paper, we stick to the automatic discovery approach, but to reduce the cost we focus on dependencies matching the current queries in the pipe (i.e., the workload). Initially, we rely on a large set of functional dependencies computed in batch with state-of-the-art algorithms in the literature. Over time our focused dependency selector (FDSel) chooses exemplars to feed the query optimizer. Therewith we eliminate further manual interactions. The automatically selected exemplars exhibit statistical properties that resemble those of the initial dependency set. This demonstrates the effectiveness of our proposed approach. In the best-case scenario, by applying the FDSel for join elimination on a real-world database, we reduce query response time by more than one order of magnitude.

Citations: 4

Cutting-edge Relational Graph Data Management with Edge-k: From One to Multiple Edges in the Same Row

J. Inf. Data Manag. Pub Date : 2018-06-20 DOI: 10.5753/jidm.2018.1634

L. C. Scabora, Paulo H. Oliveira, Gabriel Spadon, D. S. Kaster, José F. Rodrigues, A. Traina, C. Traina

Abstract: Relational Database Management Systems (RDBMSs) are widely employed in several applications, including those that deal with data modeled as graphs. Existing solutions store every edge in a distinct row in the edge table; however, in most cases, such modeling does not provide adequate performance. In this work, we propose Edge-k, a technique to group the vertex neighborhood into a reduced number of rows in a table through additional columns that store up to k edges per row. The technique provides a better table organization and reduces both table size and query processing time. We evaluate Edge-k table management for insert, update, delete and bulkload operations, and compare the query processing performance both with the conventional edge table (adopted by the existing frameworks) and with the Neo4j graph database. Experiments using Single-Source Shortest Path (SSSP) queries reveal that our proposed approach always outperforms the conventional edge table and was faster than Neo4j for the first iterations, being slightly slower than Neo4j only for iterations after the whole graph had been loaded from disk to memory. It was able to reach a speedup of 66% over a representative real dataset, with an average reduction of up to 58% in our tests. The average speedup over synthetic datasets was up to 54%. Edge-k was also the best one when performing graph degree distribution queries. Moreover, the Edge-k table obtained a processing time reduction of 70% for bulkload operations, despite having an overhead of 50% for individual insert, update and delete operations. Finally, Edge-k advances the state of the art for graph data management within relational database systems.

Citations: 2
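
The Edge-k abstract above describes packing a vertex's neighborhood into rows that each hold up to k edges, instead of one edge per row. A minimal in-memory sketch of that row layout follows. It only illustrates the grouping idea and is not the paper's implementation: the function names `pack_edges` and `neighbors`, the tuple-based row shape, and the use of `None` to pad unused columns are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]
# One row: (source vertex, k target columns; unused columns hold None).
Row = Tuple[Hashable, Tuple[Optional[Hashable], ...]]


def pack_edges(edges: Iterable[Edge], k: int) -> List[Row]:
    """Group each source vertex's targets into rows of at most k targets."""
    if k < 1:
        raise ValueError("k must be at least 1")

    # Collect the neighborhood of every source vertex, keeping input order.
    adjacency: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)

    rows: List[Row] = []
    for source, targets in adjacency.items():
        for start in range(0, len(targets), k):
            chunk = targets[start:start + k]
            padding = [None] * (k - len(chunk))  # assumed NULL-like filler
            rows.append((source, tuple(chunk + padding)))
    return rows


def neighbors(rows: Iterable[Row], vertex: Hashable) -> List[Hashable]:
    """Read a vertex's neighborhood back from the packed rows."""
    return [
        target
        for source, targets in rows
        if source == vertex
        for target in targets
        if target is not None
    ]


# Toy graph, invented for illustration only.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 1)]
conventional_rows = pack_edges(edges, k=1)  # one edge per row
edge_k_rows = pack_edges(edges, k=2)        # up to two edges per row
```

On this toy graph the k=1 layout needs five rows and the k=2 layout needs four. The padded columns in partially filled rows are the price of the wider layout, so the choice of k trades row count against wasted cells.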

LABAREDA: A Predictive and Elastic Load Balancing Service for Cloud-Replicated Databases

J. Inf. Data Manag. Pub Date : 2018-06-20 DOI: 10.5753/jidm.2018.1639

Carlos S. S. Marinho, L. O. Moreira, E. Coutinho, José S. Costa Filho, F. R. C. Sousa, Javam C. Machado

Citations: 7

STACY: Strength of Ties Automatic-Classifier over the Years

J. Inf. Data Manag. Pub Date : 2018-06-20 DOI: 10.5753/jidm.2018.1636

Michele A. Brandão, Pedro O. S. Vaz de Melo, Mirella M. Moro

Citations: 5

Tie Strength Metrics to Rank Pairs of Developers from GitHub

J. Inf. Data Manag. Pub Date : 2018-06-20 DOI: 10.5753/jidm.2018.1637

Natércia A. Batista, Guilherme A. de Sousa, Michele A. Brandão, Ana Paula Couto da Silva, Mirella M. Moro

Abstract: The Web provides huge volumes of data, which makes efficient data collection and processing difficult. An example of such volumes is in software repositories, a type of Web storage platform for software and projects, their developers and companies. In this work, we first present a systematic literature review over topics related to such repositories. Then, we extract their data and enrich it by building a development network. Based on such a network, we investigate tie strength metrics on their capability of defining new information through a correlation analysis. We also use the metrics to rank pairs of developers by considering three different aggregate methods. Our experimental analysis shows different results for each ranking method when considering all pairs of developers, which reveals the difficulty of choosing the best way to rank pairs of developers. However, when considering the top 10 best ranked pairs, two methods present similar results. Also, the combination of tie strength metrics with ranking aggregation methods makes it possible to identify important developers in the network and their collaboration strength.

Citations: 5

Towards an Empirical Evaluation of Scientific Data Indexing and Querying

J. Inf. Data Manag. Pub Date : 2018-06-20 DOI: 10.5753/jidm.2018.1638

Thaylon Guedes, V. Silva, J. Camata, M. Bedo, M. Mattoso, Daniel de Oliveira

Abstract: Computational simulations usually produce large amounts of data on a regular time-step basis. Heterogeneous simulation outputs are stored in different file formats and on distinct storage devices. Therefore, the main challenges for accessing simulation data are related to time-to-query, which is the effort spent for setting all data into a common framework, the issuing of a high-level query statement, and obtaining the result set. Loading simulation data into Database Management Systems (DBMSs) is either impractical, as it demands a prohibitive time for data preparation, or unfeasible, as data files are still needed in their original form (scientific applications still need to read and write contents to those files). In this article, we discuss the complementary approaches of adaptive querying and raw data file indexing for accessing simulation results stored in multiple sources (e.g., raw data files) without data loading. In particular, we review (i) NoDB PostgresRAW routines for adaptive query processing, and (ii) FastBit methods for raw data file indexing and querying. We examine the behavior of both strategies regarding a real case study of computational fluid dynamics simulation in the domain of sediment deposition. In this experimental evaluation, we measured the elapsed time for index construction and query processing regarding six distinct query categories over 62 time steps, which sums up to 372 different queries on 44,160 files (12.2 GB) produced by the computational simulation. Results show that FastBit is faster than PostgresRAW for query execution in all but low-selectivity query scenarios. In a complementary manner, results also show PostgresRAW outperforms FastBit whenever users are interested in reducing time-to-query rather than the query execution time itself.

Citations: 1

ProQua: a system for evaluating logic-based scoring functions on uncertain relational data

J. Inf. Data Manag. Pub Date : 2012-09-27 DOI: 10.1145/2452376.2452474

S. Lehrack, S. Saretz

Citations: 4