Caio Viktor S. Avila, Wellington Franco, A. D. P. Venceslau, T. V. Rolim, V. Vidal, V. Pequeno
{"title":"MediBot: An Ontology-Based Chatbot to Retrieve Drug Information and Compare its Prices","authors":"Caio Viktor S. Avila, Wellington Franco, A. D. P. Venceslau, T. V. Rolim, V. Vidal, V. Pequeno","doi":"10.5753/jidm.2021.2148","DOIUrl":"https://doi.org/10.5753/jidm.2021.2148","url":null,"abstract":"In this article, we present MediBot, a chatbot for querying drug information. The system acts as a single access point for natural and simplified retrieval of information about drugs, their prices, and their risks. The chatbot has two modes of operation: Quick Response and Interactive. The first answers questions asked in natural language, while the second has three interactive tasks, namely Browser, Query, and Price Comparison. We present here the system architecture, the construction process of the Linked Data Mashup, and MediBot’s modes of operation, focusing on the new Price Comparison task. This task presents the best prices for medicines and their best potential substitutes, extracted in real time from the Web with the help of information obtained from a linked data mashup.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132402234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Mello, Carlos Henrique Cândido, Milton Bittencourt S. Neto
{"title":"brModelo: An Initiative for Aiding Database Design","authors":"R. Mello, Carlos Henrique Cândido, Milton Bittencourt S. Neto","doi":"10.5753/jidm.2021.1983","DOIUrl":"https://doi.org/10.5753/jidm.2021.1983","url":null,"abstract":"The brModelo tool is an initiative of the UFSC Database Group. Its first version was developed in 2005, and its main purpose is to help the teaching of relational database design. Compared to similar tools, its main differentials are the support for all steps of the classical database design methodology, user interaction during the logical design step, as well as the support for all extended Entity-Relationship concepts. With more than fifteen years of existence, brModelo has been very well accepted by the Brazilian database community, which motivated the development and release of several versions of the tool. This article presents the history of brModelo, including its available versions and their functionalities. Additionally, we detail its functionalities and compare it with popular related tools.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128398750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Régis Ebeling, Carlos Abel Córdova Sáenz, J. Nobre, Karin Becker
{"title":"The effect of political polarization on social distance stances in the Brazilian COVID-19 scenario","authors":"Régis Ebeling, Carlos Abel Córdova Sáenz, J. Nobre, Karin Becker","doi":"10.5753/jidm.2021.1889","DOIUrl":"https://doi.org/10.5753/jidm.2021.1889","url":null,"abstract":"The COVID-19 pandemic has changed the routine and concerns of people around the world since 2020. The alarming contagion rate and the lack of treatment or vaccine evoked different reactions to controlling and mitigating the virus's spread. In this paper, we developed a case study on the Brazilian COVID scenario, investigating the influence of political polarization on the pro/against stances on social isolation, represented on Twitter by two groups referred to as the Cloroquiners and Quarenteners. We analyzed these groups according to multiple dimensions: a) concerns expressed by each group and main arguments representing each stance; b) techniques to automatically infer users' political orientation; c) network analysis and community detection to characterize their behavior as a social network group; and d) analysis of linguistic characteristics to identify psychological aspects. We propose combining two topic modeling techniques, LDA and BERTopics, to understand each stance's concerns at different granularity levels. Our main findings confirm that Cloroquiners are right-wing partisans, whereas Quarenteners are more related to the left-wing. Cloroquiners and Quarenteners' political polarization influences the arguments of economy and life and a stronger support/opposition to the president. As a group, the network of Cloroquiners is more closed and connected, and Quarenteners have a more diverse political engagement with a community of users polarized only with left-wing politicians and their supporters. In terms of psychological aspects, polarized groups come together on cognitive issues and negative emotions.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125954321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Verçosa, R. Lira, R. Monteiro, Kleber D. M. Silva, Jailson O. L. Magalhães, A. M. A. Maciel, B. Bezerra, C. B. Filho
{"title":"Analysis of Distinct Feature Groups in the Credit Scoring Problem","authors":"L. Verçosa, R. Lira, R. Monteiro, Kleber D. M. Silva, Jailson O. L. Magalhães, A. M. A. Maciel, B. Bezerra, C. B. Filho","doi":"10.5753/jidm.2021.1930","DOIUrl":"https://doi.org/10.5753/jidm.2021.1930","url":null,"abstract":"Registration and financial data have been traditionally used for the credit scoring problem. However, slight improvements in the reliability of the scores positively impact financial companies. Therefore, exploring new features is a strategic task. This work analyzes the importance of new feature groups not commonly employed for the credit scoring task and others already used. We categorized features from open credit scoring datasets, such as German and Australian, and compared their groups with the ones of a company dataset used in this work. Our dataset contains unusual feature groups, such as historical, geolocation, web behavior, and demographic data. In our analyses, we first conducted bivariate tests with each feature-pair to assess their individual importance. Secondly, we ran an XGBoost machine learning model with each feature group to evaluate each group's importance. We also applied feature selection with binary Particle Swarm Optimization to assess the groups' importance when combined. Next, we employed correlation tests to find inner and inter-correlation among the feature groups. Finally, we used the company dataset and employed AdaBoost, Multilayer Perceptron, and XGBoost algorithms to find the best model for the task. Some of our main findings were that the unusual features added a slight improvement to registration features. We also detected reasonable inner correlation among some feature groups and found that all groups were relevant for the task, with the Historical Group as the most promising. Lastly, XGBoost obtained the best performance over AdaBoost and Multilayer Perceptron for the task.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"15 Suppl 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114861133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carlos Abel Córdova Sáenz, Marcelo Dias, Karin Becker
{"title":"Assessing the combination of DistilBERT news representations and difusion topological features to classify fake news","authors":"Carlos Abel Córdova Sáenz, Marcelo Dias, Karin Becker","doi":"10.5753/jidm.2021.1895","DOIUrl":"https://doi.org/10.5753/jidm.2021.1895","url":null,"abstract":"Fake news (FN) have affected people’s lives in unimaginable ways. The automatic classification of FN is a vital tool to prevent their dissemination and support fact-checking. Related work has shown that FN spread faster, deeper, and more broadly than truthful news on social media. Deep learning has produced state-of-the-art solutions in this field, mainly based on textual attributes. In this paper, we propose to combine compact representations of the textual news properties generated using DistilBERT, with topological metrics extracted from their propagation network in social media. Using a dataset related to politics and distinct learning algorithms, we extensively assessed the components of the proposed solution. Regarding the textual attributes, we reached results comparable to state-of-the-art solutions using only the news title and contents, which is useful for FN early detection. We assessed the influential topological metrics, and the effect of their combination with the news textual features. We also explored the use of ensembles. Our results were very promising, revealing the potential of the features proposed and the adoption of ensembles.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116725123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Denio Duarte, I. Puerari, Guilherme Dal Bianco, Julyane Felipette Lima
{"title":"Exploratory Analysis of Electronic Health Records using Topic Modeling","authors":"Denio Duarte, I. Puerari, Guilherme Dal Bianco, Julyane Felipette Lima","doi":"10.5753/jidm.2020.2024","DOIUrl":"https://doi.org/10.5753/jidm.2020.2024","url":null,"abstract":"The rapid growth of electronic health record (EHR) systems brings an increase in available information about patients in hospitals. This massive amount of text information presents an opportunity to extract unknown information about medical history, medication, diseases, allergies, among others. Extracting the main topics that represent the subjects covered by a text collection can give valuable insights. To this end, approaches for topic modeling have been used to tackle such problems as information discovery and topic extraction with thematic information. In this context, this work presents an exploratory analysis of a collection of electronic health records from an intensive care unit (ICU). The collection is split into two sub-collections: discharged patients and patients who progressed to death. We apply an LDA-based approach to discover the latent topics from the collections. The analyses show that some topics are more recurrent in the deceased patients (the death collection), like renal diseases, and others are more recurrent in the discharge collection, for example, diabetes. The results of the analyses can be useful for improving intensive care services since the topics can be a guide to understanding the patterns in discharge and death situations.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128352560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frontmatter","authors":"Angelo Brayner, M. Holanda","doi":"10.1515/ijb-2021-frontmatter2","DOIUrl":"https://doi.org/10.1515/ijb-2021-frontmatter2","url":null,"abstract":"","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126672241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Silva, Hermano Lustosa, Daniel Nascimento Ramos da Silva, F. Porto, P. Valduriez
{"title":"SAVIME: An Array DBMS for Simulation Analysis and ML Models Prediction","authors":"A. Silva, Hermano Lustosa, Daniel Nascimento Ramos da Silva, F. Porto, P. Valduriez","doi":"10.5753/JIDM.2020.2021","DOIUrl":"https://doi.org/10.5753/JIDM.2020.2021","url":null,"abstract":"Limitations in current DBMSs prevent their wide adoption in scientific applications. In order to make them benefit from DBMS support, enabling declarative data analysis and visualization over scientific data, we present an in-memory array DBMS called SAVIME. In this work we describe the system SAVIME, along with its data model. Our preliminary evaluation shows how SAVIME, by using a simple storage definition language (SDL), can outperform the state-of-the-art array database system, SciDB, during the process of data ingestion. We also show that it is possible to use SAVIME as a storage alternative for a numerical solver without affecting its scalability, making it useful for modern ML-based applications.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130046608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Temporal Exception Rules from Multivariate Time Series Using a new Support Measure","authors":"Thábata Amaral, Elaine P. M. de Sousa","doi":"10.5753/jidm.2020.2020","DOIUrl":"https://doi.org/10.5753/jidm.2020.2020","url":null,"abstract":"Association rules are a common task to discover useful and comprehensive relationships among frequent and infrequent data. Frequent patterns describe a usual behavior, while infrequent ones represent uncommon knowledge. Our interest lies in finding exception rules, a class of infrequent patterns that may have critical effects as a consequence. Existing approaches for exception rules mining usually handle “itemsets databases”, where transactions are organized with no temporal information. However, temporality may be inherent to some real contexts and should be considered to improve the semantic quality of results. Moreover, these approaches implement a non-discriminatory support measure to estimate the relevance of an item, thus interpreting a large volume of data that may be merely occasional as patterns. Aiming to overcome these drawbacks, we propose TRiER (TempoRal Exception Ruler), an efficient method for mining temporal exception rules that not only discovers exceptional behaviors and their causative agents, but also identifies how long consequences take to appear. We also present a new support measure to manipulate time series. This measure considers the context in which a pattern occurs, thus incorporating more semantics to the results obtained. We performed an extensive experimental analysis in real multivariate time series to verify the practical applicability of TRiER. Our results show TRiER has lower computational cost and is more scalable than existing approaches while finding a succinct and relevant set of patterns.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129522827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francisco D. B. S. Praciano, Italo C. Abreu, Javam C. Machado
{"title":"An Experimental Analysis of the Use of Different Storage Technologies on a Relational DBMS","authors":"Francisco D. B. S. Praciano, Italo C. Abreu, Javam C. Machado","doi":"10.5753/JIDM.2020.1868","DOIUrl":"https://doi.org/10.5753/JIDM.2020.1868","url":null,"abstract":"Traditional Database Management Systems (DBMSs) are built with the premise that magnetic disks such as hard disk drives (HDDs) store the data. Recently, several alternatives to HDDs have emerged, such as the solid-state drives (SSDs) based on non-volatile memory (NVM) technology such as 3D XPoint and the new generations of dynamic random access memories (DRAMs). Different characteristics of these storage technologies may impact the performance of DBMSs. In this work, we analyze the performance of a DBMS using three storage technologies as data locations: HDD, SSD NVM, and DRAM, as well as a hybrid way combining all three. To do this, we use two workloads, analytical and transactional, and we observe throughput as well as latency. Afterwards, we discuss the reasons for the results obtained for each type of storage. We also show that the query processing can benefit from the different characteristics of each storage technology to perform faster queries and, finally, we analyze the benefits of using a hybrid storage system.","PeriodicalId":301338,"journal":{"name":"J. Inf. Data Manag.","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126019072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}