Journal of Information and Data Management: Latest Articles

Capturing Provenance from Deep Learning Applications Using Keras-Prov and Colab: a Practical Approach
Journal of Information and Data Management Pub Date: 2022-12-19 DOI: 10.5753/jidm.2022.2544
Débora Pina, L. Kunstmann, Felipe Bevilaqua, Isabela Siqueira, Alan Lyra, Daniel de Oliveira, M. Mattoso
Abstract: Due to the exploratory nature of deep neural networks (DNNs), deep learning (DL) specialists often need to modify the input dataset, change a filter when preprocessing input data, or fine-tune the models’ hyperparameters while analyzing the evolution of the training. However, the specialist may lose track of which hyperparameter configurations have been used and tuned if these data are not properly registered. Thus, these configurations must be tracked and made available for the user’s analysis. One way of doing this is to use provenance data derivation traces to help with hyperparameter fine-tuning by providing a global data picture with clear dependencies. Current provenance solutions present provenance data disconnected from the W3C PROV recommendation, which makes the data difficult to reproduce and to compare with other provenance data. To help with these challenges, we present Keras-Prov, an extension to the Keras deep learning library that collects provenance data compliant with PROV. To show the flexibility of Keras-Prov, we extend a previous Keras-Prov demonstration paper with larger experiments using GPUs with the help of Google Colab. Despite the challenges of running a DBMS in virtual environments, DL analysis with provenance has added trust and persistence in databases and PROV serializations. Experiments show Keras-Prov data analysis, during training execution, supporting hyperparameter fine-tuning decisions and favoring the comparison and reproducibility of such DL experiments. Keras-Prov is open source and can be downloaded from https://github.com/dbpina/keras-prov.
Citations: 0
Consistent Design of Relational Databases using EERCASE
Journal of Information and Data Management Pub Date: 2022-12-19 DOI: 10.5753/jidm.2022.2537
Robson N. Fidalgo, Edson A. Silva
Abstract: This article introduces EERCASE, a Computer Aided Software Engineering tool that is based on the best practices of the Model Driven Development paradigm to provide a consistent environment for relational database design. EERCASE follows the graphical notation of the Enhanced Entity–Relationship model according to Elmasri and Navathe, implements the EERMM metamodel to avoid syntactically invalid constructs, shows and describes static semantic errors, and generates data definition code that takes into account advanced structural validations. The theoretical and technical framework used for the implementation of EERCASE is discussed, with emphasis on the restrictive and informative validations it performs. In addition, considering its feedback on modeling errors and code generation, EERCASE is also presented as a computational environment that favors active learning.
Citations: 0
Searching for Researchers: an Ontology-based NoSQL Database System Approach and Practical Implementation
Journal of Information and Data Management Pub Date: 2022-12-19 DOI: 10.5753/jidm.2022.2601
Mariana D. A. Salgueiro, Verônica dos Santos, André L. C. Rêgo, Daniel S. Guimarães, Jefferson B. Santos, Edward H. Haeusler, Marcos V. Villas, Sérgio Lifschitz
Abstract: This work presents the design and implementation of two web-based search systems, Busc@NIMA and Quem@PUC. Both systems allow the identification of research and development projects, as well as existing competencies in laboratories and departments involving professors and researchers at PUC-Rio University. Our applications are based on a list of search-related terms that are matched against a dataset composed of PUC-Rio’s Lattes CVs, offered courses, information from administrative systems, and specific keywords that are input by the professors/researchers themselves. To integrate all the needed data, we consider multiple database and search technologies, such as XML, RDF, TripleStores, and Relational Databases. Search results include the names, academic papers, teaching activities, contact links, keywords, and laboratories of the professors involved with the subject represented by the set of input keywords. We describe the main features that show how our systems work.
Citations: 0
Scientific Collaboration Network Views: A Brazilian Computer Science Graduate Programs Case
Journal of Information and Data Management Pub Date: 2022-10-03 DOI: 10.5753/jidm.2022.2695
Aurelio Ribeiro Costa, Vanessa Tavares Nunes, Célia Ghedini Ralha
Abstract: Scientific collaboration networks can present different views of researchers’ interactions. This work presents SCI-synergy, an online navigable artifact aiming to promote mechanisms and views of scientific collaboration networks. The artifact focuses on the researchers’ interaction in the co-authorship of publications, considering intra- and inter-program relationships. SCI-synergy is developed upon the design science research paradigm using scientific publication data available in the large Digital Bibliography & Library Project (DBLP) repository. Official data from the Sucupira repository on the members of six Brazilian graduate programs are used: Federal University of Minas Gerais (UFMG), State University of São Paulo (USP), Federal University of Rio Grande do Norte (UFRN), Federal University of Amazonas (UFAM), University of Brasília (UnB), and University of Vale do Rio dos Sinos (UNISINOS). Data from these graduate programs illustrate the artifact usage regarding the scientific collaboration network of each program, how each researcher cooperates, and what relationship patterns exist in intra- and inter-program views. We advocate that, even though it is necessary to consider data from each program’s history and current contextualization regarding politics, economics, and administration, the collaboration network views provided by SCI-synergy might help to understand collaboration network patterns.
Citations: 0
SentiLexBR: An Automatic Methodology of Building Sentiment Lexicons for the Portuguese Language
Journal of Information and Data Management Pub Date: 2022-09-21 DOI: 10.5753/jidm.2022.2504
Tiago de Melo
Abstract: User reviews are readily available on the Web and widely used for sentiment analysis tasks. Sentiment lexicons play an important role in sentiment analysis, where each sentiment word is given a sentiment label (positive or negative) or score (1 or -1). However, a sentiment lexicon may express a different sentiment polarity depending on the domain. In addition, only a few studies on Portuguese sentiment analysis are reported, due to the lack of resources, including domain-specific sentiment lexical corpora. In this paper, we present an effective methodology, called SentiLexBR, that uses probabilities from Bayes’ theorem for building a set of sentiment lexicons. An unsupervised algorithm is proposed to automatically identify sentiment lexicons with their polarities for the Portuguese language. Experimental results on user review datasets in 12 different domains indicate the effectiveness of our methodology in domain-specific sentiment lexicon generation for Portuguese. In addition, the sentiment lexicon produced by SentiLexBR also significantly outperforms several alternative approaches for building domain-specific sentiment lexicons.
Citations: 0
Evaluation of Automatic Speech Recognition Approaches
Journal of Information and Data Management Pub Date: 2022-09-21 DOI: 10.5753/jidm.2022.2514
Regis Pires Magalhães, Daniel Jean Rodrigues Vasconcelos, Guilherme Sales Fernandes, Lívia Almada Cruz, Matheus Xavier Sampaio, José Antônio Fernandes de Macêdo, Ticiana Linhares Coelho da Silva
Abstract: Automatic Speech Recognition (ASR) is essential for many applications, such as automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Due to the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work aims to evaluate the performance of commercial solutions for ASR that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, Google Cloud Speech-to-Text, Wav2Vec, and AWS Transcribe. We performed the experiments with two real and public datasets, Mozilla Common Voice and Voxforge. The results demonstrate that the evaluated solutions differ only slightly. However, Facebook Wit.ai outperforms the other analyzed approaches on the collected quality metrics, such as WER, BLEU, and METEOR. We also experiment with fine-tuning the Jasper neural network for ASR on four different datasets that do not intersect with the ones on which we collect the quality metrics. We study the performance of the Jasper model on the two public datasets, comparing its results with the other pre-trained models.
Citations: 3
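The ASR abstract above reports WER (word error rate) as one of its quality metrics. As a minimal sketch of the standard definition, not the authors' evaluation code, WER is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; real evaluations typically normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    with no case folding or punctuation handling.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference; lower values indicate better transcriptions.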
FASED: A Framework for Data Ecosystems Health Evaluation
Journal of Information and Data Management Pub Date: 2022-09-21 DOI: 10.5753/jidm.2022.2511
Glória de Fátima B. Lima, Marcelo Iury S. Oliveira, Bernadette Farias Lóscio
Abstract: The growing availability of data in digital media has contributed to the creation of a large number of data ecosystems. However, having a successful Data Ecosystem is still a challenge. In order to prevent the failure of a Data Ecosystem and ensure its survival, evaluating its health becomes fundamental. In general, the health of a Data Ecosystem can be defined as its ability to grow and survive over time. Indicators such as productivity, robustness, niche creation, and sustainability can be employed to evaluate the health of a Data Ecosystem. In this paper, we propose a framework for Data Ecosystem health evaluation composed of a set of indicators and metrics, which assess the Data Ecosystem’s current state and its ability to stay healthy over time. The results obtained when using the proposed framework offer evidence to assist in decision making on how data has been published and consumed in a Data Ecosystem, as well as to evaluate which ecosystems are more prosperous or need more investment.
Citations: 1
FeatSet+: Visual Features Extracted from Public Image Datasets
Journal of Information and Data Management Pub Date: 2022-08-15 DOI: 10.5753/jidm.2022.2328
Mirela T. Cazzolato, Lucas C. Scabora, Guilherme F. Zabot, Marco A. Gutierrez, Caetano Traina Jr., Agma J. M. Traina
Abstract: Real-world applications generate large amounts of images every day. With the generalized use of social media, users frequently share images acquired by smartphones. Also, hospitals, clinics, exhibits, factories, and other facilities generate images with potential use for many applications. Processing the generated images usually requires feature extraction, which can be time-consuming and laborious. In this paper, we present FeatSet+, a compilation of color, texture, and shape visual features extracted from 17 open image datasets reported in the literature. FeatSet+ provides a collection of 11 distinct visual features, extracted by well-known Feature Extraction Methods (FEMs) such as LBP, Haralick, and Color Layout. We organized the available features in a standard collection, including the metadata and labels, when available. Eleven of the datasets also contain classes, which aid the evaluation of supervised methods such as classifiers and clustering tasks. FeatSet+ is available for download in a public repository as SQL scripts and CSV files. Additionally, FeatSet+ provides a description of the domain of each dataset, including the reference to the original work and link. We show the potential applicability of FeatSet+ in four computational tasks: multi-attribute analysis and retrieval, visual analysis using Multidimensional Scaling (MDS) and Principal Components Analysis (PCA), global feature classification, and dimensionality reduction. FeatSet+ can be employed to evaluate supervised and unsupervised learning tasks, also widely supporting Content-Based Image Retrieval (CBIR) applications and complex data indexing using Metric Access Methods (MAMs).
Citations: 0
Collecting, extracting and storing web research survey questionnaires data
Journal of Information and Data Management Pub Date: 2022-08-15 DOI: 10.5753/jidm.2022.2318
Carina F. Dorneles, Gilney N. Mathias
Abstract: Companies or institutions can use survey questionnaires to evaluate items or products, analyze their employees’ or customers’ satisfaction, or collect any data they consider helpful. Furthermore, questionnaires can be used to collect data for research studies. Some problems in creating such questionnaires involve deciding which questions to ask, how to ask them, and how to organize them. Many research communities, especially in the healthcare field, maintain publicly accessible repositories that include different questionnaires, which help professionals and researchers analyze the results of questions, add new questions, or even point out nonsensical questions. In this paper, we describe: (i) a web crawler, which scans the Web searching for sites that possibly contain questionnaires; (ii) an extractor, which extracts the questionnaires from the list of pages collected by the crawler and saves them into a relational database; and (iii) the public dataset we have created to persist the questionnaires. The database created can then serve to analyze these data and/or as a centralized base of examples to prepare new questionnaires or reuse existing questions. The experiments we have conducted demonstrate that our crawler has achieved 94.47%, and the extractor has achieved a precision between 90% and 92%.
Citations: 0
Musical Success in the United States and Brazil: Novel Datasets and Temporal Analyses
Journal of Information and Data Management Pub Date: 2022-08-15 DOI: 10.5753/jidm.2022.2350
Gabriel P. Oliveira, Gabriel R. G. Barbosa, Bruna C. Melo, Juliana E. Botelho, Mariana O. Silva, Danilo B. Seufitelli, Mirella M. Moro
Abstract: Music is not only an essential cultural industry worldwide but also one of the most dynamic. The increasing volume of complex music-related data defines new challenges and opportunities for extracting knowledge, benefiting not only different music segments but also the Music Information Retrieval research field. In this article, we assess musical success in the United States and Brazil, two of the biggest music markets in the world. We first introduce MUHSIC and MUHSIC-BR, two novel datasets with enhanced success information that combine chart-related data with acoustic metadata to describe the temporal evolution of musical careers. Then, we use such enriched and curated data to cluster artists according to their success level by considering their high-impact periods (hot streaks). Our results reveal three groups with distinct success behavior over time. Furthermore, Brazil and the US present specific music success patterns regarding artists and genres, reflecting the importance of analyzing regional markets individually.
Citations: 3