{"title":"Using Musical and Statistical Analysis of the Predominant Melody of the Voice to Create datasets from a Database of Popular Brazilian Hit Songs","authors":"André A. Bertoni, Rodrigo P. Lemos","doi":"10.5753/jidm.2022.2336","DOIUrl":"https://doi.org/10.5753/jidm.2022.2336","url":null,"abstract":"This work deals with the creation and optimization of a large set of features extracted from a database of 882 popular Brazilian hit songs and non-hit songs, from 2014 to May 2019. From this database of songs, we created four datasets of musical features. The first comprises 3215 statistical features, while the second, third and fourth are completely new, as they were formed from the predominant melody of the voice and previously there were no similar databases available for study. The second set of data represents the graph of the time-frequency spectrogram of the singer’s voice during the first 90 seconds of each song. The third dataset results from a statistical analysis carried out on the predominant melody of the voice. The fourth is the most peculiar of all, as it results from the musical semantic analysis of the predominant melody of the voice, which allowed the construction of a table with the most frequent melodic sequences of each song. Our datasets use only Brazilian songs and focus their data on a limited and contemporary period. The idea behind these datasets is to encourage the study of Machine Learning techniques that require musical information. The extracted features can help develop new studies in Music and Computer Science in the future.","PeriodicalId":293511,"journal":{"name":"Journal of Information and Data Management","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121071535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Essay-BR: a Brazilian Corpus to Automatic Essay Scoring Task","authors":"Jeziel C. Marinho, Rafael T. Anchiêta, Raimundo S. Moura","doi":"10.5753/jidm.2022.2340","DOIUrl":"https://doi.org/10.5753/jidm.2022.2340","url":null,"abstract":"Automatic Essay Scoring (AES) is the computer technology that evaluates and scores written essays, aiming to provide computational models to grade essays automatically or with minimal human involvement. While there are several AES studies in a variety of languages, few of them are focused on the Portuguese language. The main reason is the lack of a corpus with manually graded essays. In order to bridge this gap, in this paper we extended a corpus of essays written by Brazilian high school students in an online platform. All of the essays are argumentative and were scored across five competences by experts. Moreover, we conducted an experiment with the extended corpus to show some challenges posed by the Portuguese language. The corpus is publicly available at https://github.com/lplnufpi/essay-br.","PeriodicalId":293511,"journal":{"name":"Journal of Information and Data Management","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115455963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Impact of Privacy Regulations on DB Systems","authors":"Javam C. Machado, Paulo R. P. Amora","doi":"10.5753/jidm.2021.1958","DOIUrl":"https://doi.org/10.5753/jidm.2021.1958","url":null,"abstract":"Personal data usage and collection are activities that used to grow unrestricted. However, several laws in the physical world ensure rights to people regarding their privacy and information usage. In the last years, legislators passed many laws, regulations, and acts to replicate these rights to the digital world. By doing so, new constraints, rights, and duties appear on every component of the data usage and collection workflow. In this paper, we discuss legislations’ implications, identifying impacts that these regulations introduce to current DBMS, and survey recent works that aim to solve the problems raised by these impacts, highlighting research opportunities and identifying how solutions can be achieved for the problems.","PeriodicalId":293511,"journal":{"name":"Journal of Information and Data Management","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134398801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Private Reverse Top-k Algorithms Applied on Public Data of COVID-19 in the State of Ceará","authors":"Mariana M. Silva, Iago C. Chaves, Javam C. Machado","doi":"10.5753/jidm.2021.1941","DOIUrl":"https://doi.org/10.5753/jidm.2021.1941","url":null,"abstract":"In this article we propose a differentially private reverse top-k query. Our strategy allows obtaining the less frequent data according to a search criterion, with a high guarantee of privacy of the individuals who contributed with personal data in the original database. We apply our strategy on public data for COVID-19 in the State of Ceará using two different queries. Our experimental results show that the result of the proposed top-k query returns a high degree of similarity to the result of a conventional top-k query, when the chosen budget is suitable, providing useful results for researchers, while ensuring a low probability of re-identification of individuals arising from the properties of differential privacy.","PeriodicalId":293511,"journal":{"name":"Journal of Information and Data Management","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131612842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-preserving of patients with Differential Privacy: an experimental evaluation in COVID-19 dataset","authors":"Manuel E. B. Filho, Eduardo R. Duarte Neto, Javam C. Machado","doi":"10.5753/jidm.2021.1947","DOIUrl":"https://doi.org/10.5753/jidm.2021.1947","url":null,"abstract":"The pandemic of the new coronavirus (COVID-19) has brought new challenges to health systems in almost every corner of the world, many of them overburdened. Data analysis has supported the fight against the coronavirus. Through this analysis, government authorities, together with health care providers, adopted effective strategies. Yet, those strategies cannot disregard privacy concerns. Privacy is a right of each citizen. Privacy techniques guarantee the analysis of health data without exposing individuals’ private information. However, a balance between data privacy and utility is essential for a good analysis of the data. This work will demonstrate that it is possible to guarantee the privacy of infected patients and maintain the utility of the data, allowing a sound analysis on them, from the visualization of the application of differentially private mechanisms on queries in the data of patients tested in the State of Ceará - Brazil.","PeriodicalId":293511,"journal":{"name":"Journal of Information and Data Management","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133871382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Processing of Remote Sensing Time Series Applied to Land-Use and Land-Cover Classification","authors":"Roberto U. Paiva, Sávio S. T. Oliveira, Luiz M. L. Pascoal, Leandro L. Parente, Wellington S. Martins","doi":"10.5753/jidm.2021.1785","DOIUrl":"https://doi.org/10.5753/jidm.2021.1785","url":null,"abstract":"The increase in satellite launches into Earth's orbit in recent years has generated a huge amount of remote sensing data. These data, in the form of time series, have been used in automated classification approaches, generating land-use and land-cover (LULC) products for different landscapes around the world. Dynamic Time Warping (DTW) is a well-known computational method used to measure the similarity between time series. It has been used in many algorithms for remote sensing time series analysis. These DTW-based algorithms are capable of generating similarity measures between time series and patterns. These measures can be used as meta-features to increase the accuracy results of classification models. However, DTW-based algorithms require a lot of computational resources and have a high execution time, which makes them difficult to use in large volumes of data. This article presents a parallel and fully scalable solution to optimize the construction of meta-features through remote sensing time series (RSTS). In addition, results of the application of the generated meta-features in the training and evaluation of classification models using Random Forest are presented. The results show that the proposed approaches have led to improvements in execution time and accuracy when compared to traditional strategies.","PeriodicalId":293511,"journal":{"name":"Journal of Information and Data Management","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133614938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Geostatistical Modeling and Visualization Techniques of Uncertainties for Categorical Spatial Data","authors":"Carlos A. Felgueiras, Jussara O. Ortiz, Eduardo C. G. Camargo, Laércio M. Namikawa, Thales S. Körting","doi":"10.5753/jidm.2021.1786","DOIUrl":"https://doi.org/10.5753/jidm.2021.1786","url":null,"abstract":"This article presents and analyzes the indicator geostatistical modeling and some visualization techniques of uncertainty models for categorical spatial attributes. A set of sample points of some categorical attribute is used as input information. The indicator approach requires a transformation of sample points on fields of indicator samples according to the classes of interest. Experimental and theoretical semivariograms of the indicator fields are defined representing the spatial variation of the indicator information. The indicator fields, along with their semivariograms, are used to determine the uncertainty model, the conditioned probability distribution function, of the attribute at any location inside the geographic region delimited by the samples. The probability functions are considered for producing prediction and probability maps based on the maximum class probability criterion. These maps can be visualized using different techniques. This work considers both individual visualization of the predicted and probability maps and a combination of them. The predicted maps can also be visualized with or without constraints related to the uncertainty probabilities. The combined visualizations are based on three-dimensional (3D) planar projection and on the Red-Green-Blue to Intensity-Hue-Saturation (RGB-IHS) fusion transformation techniques. The methodology of this article is illustrated by a case study with real data, a sample set of soil textures observed in an experimental farm located in the region of São Carlos city in São Paulo State, Brazil. The resulting maps of this case study are presented and the advantages and the drawbacks of the visualization options are analyzed and discussed.","PeriodicalId":293511,"journal":{"name":"Journal of Information and Data Management","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121639644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"J-EDA: A workbench for tuning similarity and diversity search parameters in content-based image retrieval","authors":"João V. O. Novaes, Lúcio F. D. Santos, Luiz Olmes Carvalho, Daniel de Oliveira, Marcos V. N. Bedo, Agma J. M. Traina, Caetano Traina Jr.","doi":"10.5753/jidm.2021.1990","DOIUrl":"https://doi.org/10.5753/jidm.2021.1990","url":null,"abstract":"Similarity searches can be modeled by means of distances following the Metric Spaces Theory and constitute a fast and explainable query mechanism behind content-based image retrieval (CBIR) tasks. However, classical distance-based queries, e.g., Range and k-Nearest Neighbors, may be unsuitable for exploring large datasets because the retrieved elements are often similar among themselves. Although similarity searching is enriched with the imposition of rules to foster result diversification, the fine-tuning of the diversity query is still an open issue, which is usually carried out through a non-optimal and computationally expensive inspection. This paper introduces J-EDA, a practical workbench implemented in Java that supports the tuning of similarity and diversity search parameters by enabling the automatic and parallel exploration of multiple search settings regarding a user-posed content-based image retrieval task. J-EDA implements a wide variety of classical and diversity-driven search queries, as well as many CBIR settings such as feature extractors for images, distance functions, and relevance feedback techniques. Accordingly, users can define multiple query settings and inspect their performances for spotting the most suitable parameterization for a content-based image retrieval problem at hand. The workbench reports the experimental performances with several internal and external evaluation metrics such as P × R and Mean Average Precision (mAP), which are calculated towards either incremental or batch procedures performed with or without human interaction.","PeriodicalId":293511,"journal":{"name":"Journal of Information and Data Management","volume":"210 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134529795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysing Spatio-Temporal Voting Patterns in Brazilian Elections Through a Simple Data Science Pipeline","authors":"L. H. M. Jacintho, T. P. da Silva, A. R. S. Parmezan, G. E. A. P. A. Batista","doi":"10.5753/jidm.2021.1932","DOIUrl":"https://doi.org/10.5753/jidm.2021.1932","url":null,"abstract":"Since 1989, the first year of the democratic presidential election after a long period of a dictatorship regime, Brazil has conducted eight presidential elections. Short and long-term shifts of power and two impeachment processes marked such a period. This instability is a research case in electoral studies, mainly regarding the understanding of citizens' voting behavior. Comprehending patterns in the population behavior can give us insight into phenomena and processes that affect democratic political decisions. In light of this, our paper analyses Brazilian electoral data at the municipal level from 1998 to 2018 using a simple data science pipeline, which consists of five steps: (i) data selection; (ii) data preprocessing; (iii) identification of spatial patterns, in which we seek to understand the role of space in the election results employing spatial auto-correlation techniques; (iv) identification of temporal patterns, where we investigate similar trends of votes over the years applying a hierarchical clustering method; and (v) evaluation of results. We study the presidential elections focusing on the right and left-wing parties most relevant for the period: the Brazilian Social Democracy Party (PSDB) and the Workers' Party (PT). We also analyse the congressman election data regarding parties ideologically to the right and left in the political spectrum. Through the obtained results, we found the existence of spatial dependence in every electoral year investigated. Moreover, despite the changes in the political-economic context over the years, neighboring cities seem to present similar voting behavior trends.","PeriodicalId":293511,"journal":{"name":"Journal of Information and Data Management","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124645582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Supervised Learning Algorithm to Eliminate Irrelevant Association Rules in Large Knowledge Bases","authors":"Bruno B. Cifarelli, Rafael G. L. Miani","doi":"10.5753/jidm.2020.2025","DOIUrl":"https://doi.org/10.5753/jidm.2020.2025","url":null,"abstract":"The construction and population of large knowledge bases have been widely explored in the past few years. Many techniques were developed in order to accomplish this purpose. Association rule mining algorithms can also be used to help populate these knowledge bases. Nevertheless, analyzing the amount of association rules generated can be a challenging and time-consuming task. The technique described in this article aims to eliminate irrelevant association rules in order to facilitate the rules evaluation process. To achieve that, this article presents a weakly supervised learning technique to prune irrelevant association rules. The proposed method uses irrelevant rules already discovered in past iterations and prunes off those with the same pattern. Experiments showed that the new technique can reduce the number of rules by about 60%, decreasing the effort required to evaluate them.","PeriodicalId":293511,"journal":{"name":"Journal of Information and Data Management","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123509212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}