{"title":"An approach to on-demand extension of multidimensional cubes in multi-model settings: Application to IoT-based agro-ecology","authors":"Sandro Bimonte , Fagnine Alassane Coulibaly , Stefano Rizzi","doi":"10.1016/j.datak.2023.102267","DOIUrl":"10.1016/j.datak.2023.102267","url":null,"abstract":"<div><p><span>Managing unstructured and heterogeneous data<span>, integrating them, and enabling their analysis are among the key challenges in data ecosystems, together with the need to accommodate a progressive growth in these systems by seamlessly supporting extensibility. This is particularly relevant for OLAP analyses on multidimensional cubes stored in data warehouses (DWs), which naturally span large portions of heterogeneous data, possibly relying on different data models (relational, document-based, graph-based). While the management of model heterogeneity in DWs, using for instance multi-model databases, has already been investigated, not much has been done to support extensibility. In a previous paper we have investigated a schema-on-read scenario aimed at granting the extensibility of multidimensional cubes by proposing an architecture to support it and discussing the main open issues associated. This paper takes a step further by presenting </span></span><em>xCube</em><span>, an approach to provide on-demand extensibility of multidimensional cubes in a supply-driven fashion. xCube lets users choose a multidimensional element to be extended, using additional data, possibly uploaded from a data lake. Then, the multidimensional schema is extended by considering the functional dependencies implied by these additional data, and the extended multidimensional schema is made available to users for OLAP analyses. 
After explaining our approach with reference to a motivating case study in agro-ecology, we propose a proof-of-concept implementation using AgensGraph and Mondrian.</span></p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"150 ","pages":"Article 102267"},"PeriodicalIF":2.5,"publicationDate":"2023-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139031847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increase development productivity by domain-specific conceptual modeling","authors":"Martin Paczona , Heinrich C. Mayr , Guenter Prochart","doi":"10.1016/j.datak.2023.102263","DOIUrl":"10.1016/j.datak.2023.102263","url":null,"abstract":"<div><p>This paper addresses the question of whether and how the development and use of a domain-specific modeling method (DSMM) can increase productivity in the development of technical systems in an industrial setting. This is because an essential prerequisite for DSMMs to become established in operational practice is that productivity increases can be achieved with them and qualitative benefits such as quality assurance, innovation potential, and the like can be exploited. After all, managers’ decisions are ultimately based on whether or not the use of a new method pays off. We illustrate our findings using the example of a DSMM development for the design and realization of electric vehicle testbeds, which we carried out as part of a cooperation project. This work sets the base for possible generalization into other automotive, mechatronic, and technical areas.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"150 ","pages":"Article 102263"},"PeriodicalIF":2.5,"publicationDate":"2023-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X23001234/pdfft?md5=04e4fde34990bf78c3bd54b41b8496e0&pid=1-s2.0-S0169023X23001234-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138685877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving speech emotion recognition by fusing self-supervised learning and spectral features via mixture of experts","authors":"Jonghwan Hyeon, Yung-Hwan Oh, Young-Jun Lee, Ho-Jin Choi","doi":"10.1016/j.datak.2023.102262","DOIUrl":"10.1016/j.datak.2023.102262","url":null,"abstract":"<div><p>Speech Emotion Recognition (SER) is an important area of research in speech processing that aims to identify and classify emotional states conveyed through speech signals. Recent studies have shown considerable performance in SER by exploiting deep contextualized speech representations from self-supervised learning (SSL) models. However, SSL models pre-trained on clean speech data may not perform well on emotional speech data due to the domain shift problem. To address this problem, this paper proposes a novel approach that simultaneously exploits an SSL model and a domain-agnostic spectral feature (SF) through the Mixture of Experts (MoE) technique. The proposed approach achieves the state-of-the-art performance on weighted accuracy compared to other methods in the IEMOCAP dataset. Moreover, this paper demonstrates the existence of the domain shift problem of SSL models in the SER task.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"150 ","pages":"Article 102262"},"PeriodicalIF":2.5,"publicationDate":"2023-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X23001222/pdfft?md5=48b44d06659bb1ef2a62c484d7369d5b&pid=1-s2.0-S0169023X23001222-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138631035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognition algorithm for cross-texting in text chat conversations","authors":"Da-Young Lee, Hwan-Gue Cho","doi":"10.1016/j.datak.2023.102261","DOIUrl":"10.1016/j.datak.2023.102261","url":null,"abstract":"<div><p>With the development of the Internet and IT, short-text-based communication has become more popular than voice-based communication. Chat-based communication enables rapid, short, and massive exchange of messages with many people, but it also creates new social problems. ‘Cross-texting’ is one of them. It refers to accidentally sending a text to an unintended person while holding concurrent, separate conversations with multiple people. Cross-texting can be a serious problem in languages where respectful expressions are required. As text-based communication grows more popular, it is crucial to prevent cross-texting by detecting it in advance in languages with honorific expressions, such as Korean. In this paper, we propose two methods for detecting a cross-text using a deep learning model<span>. The first model is the formal feature vector, which models dialog by explicitly defining the politeness and completeness features. The second is the graph2vec-based ChatGram-net model, which models the dialog based on the syllable occurrence relationship. To evaluate detection performance, we suggest a method for generating cross-text datasets from an actual messenger corpus. 
In experiments, we show that both proposed models detect cross-text effectively and exceed the performance of the baseline models.</span></p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"150 ","pages":"Article 102261"},"PeriodicalIF":2.5,"publicationDate":"2023-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138576764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards deep understanding of graph convolutional networks for relation extraction","authors":"Tao Wu , Xiaolin You , Xingping Xian , Xiao Pu , Shaojie Qiao , Chao Wang","doi":"10.1016/j.datak.2023.102265","DOIUrl":"10.1016/j.datak.2023.102265","url":null,"abstract":"<div><p><span><span>Relation extraction aims at identifying semantic relations between pairs of named entities from unstructured texts and is considered an essential prerequisite for many downstream tasks in </span>natural language processing (NLP). Owing to their ability to express complex relationships and </span>interdependency<span><span><span>, graph neural networks<span> (GNNs) have been gradually used to solve the relation extraction problem and have achieved state-of-the-art results. However, the designs of GNN-based relation extraction methods are mostly based on empirical intuition, heuristics, and experimental trial-and-error. A clear understanding of why and how GNNs perform well in relation extraction tasks is lacking. In this study, we investigate three well-known GNN-based relation extraction models, CGCN, AGGCN, and SGCN, and aim to understand the underlying mechanisms of the extractions. In particular, we provide a </span></span>visual analytic to reveal the dynamics of the models and provide insight into the function of intermediate </span>convolutional layers. We determine that entities, particularly subjects and objects in them, are more important features than other words for relation extraction tasks. With various masking strategies, the significance of entity type to relation extraction is recognized. Then, from the perspective of the model architecture, we find that graph structure modeling and aggregation mechanisms in GCN do not significantly affect the performance improvement of GCN-based relation extraction models. The above findings are of great significance in promoting the development of GNNs. 
Based on these findings, an engineering oriented MLP-based GNN relation extraction model is proposed to achieve a comparable performance and greater efficiency.</span></p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"149 ","pages":"Article 102265"},"PeriodicalIF":2.5,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138546596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating psychological analysis tables for children's drawings using deep learning","authors":"Moonyoung Lee , Youngho Kim , Young-Kuk Kim","doi":"10.1016/j.datak.2023.102266","DOIUrl":"10.1016/j.datak.2023.102266","url":null,"abstract":"<div><p>The usefulness of drawing-based psychological testing has been demonstrated in a variety of studies. By using the familiar medium of drawing, drawing-based psychological testing can be applied to a wide range of age groups and is particularly effective with children who have difficulty expressing themselves verbally. Drawing tests are usually implemented face-to-face, requiring specialized counseling staff, and can be time-consuming and expensive to apply to large numbers of children. These problems seem to be solved by applying highly developed artificial intelligence<span> techniques. If artificial intelligence (AI) can analyze children's drawings and perform psychological analysis, it will be possible to use it as a service and take tests online or through smartphones. There have been various attempts to automate the drawing of psychological tests by utilizing deep learning technology to process images. Previous studies using classification have been limited in their ability to extract structural information. In this paper, we analyze the House-Tree-Person Test (HTP), one of the drawing psychological tests widely used in clinical practice, by utilizing object detection technology that can extract more diverse information from images. In addition, we extend the existing research that has been limited to the extraction of relatively simple psychological features and generate a psychological analysis table based on the extracted features that can be used to assist experts in the process of psychological testing. 
Our research findings indicate that the object detection performance achieves a mean Average Precision (mAP) of approximately 92.6∼94.1 %, and the average accuracy of the psychological analysis table is 94.4 %.</span></p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"149 ","pages":"Article 102266"},"PeriodicalIF":2.5,"publicationDate":"2023-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138546528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blockchain-based ontology driven reference framework for security risk management","authors":"Mubashar Iqbal , Aleksandr Kormiltsyn , Vimal Dwivedi , Raimundas Matulevičius","doi":"10.1016/j.datak.2023.102257","DOIUrl":"10.1016/j.datak.2023.102257","url":null,"abstract":"<div><p>Security risk management<span><span> (SRM) is crucial for protecting valuable assets from malicious harm. While blockchain technology has been proposed to mitigate security threats in traditional applications, it is not a perfect solution, and its security threats must be managed. This paper addresses the research problem of having no unified and formal knowledge models to support the SRM of traditional applications using blockchain and the SRM of blockchain-based applications. Accordingly, we present a blockchain-based reference model (BbRM) and an ontology driven reference framework (OntReF) for the SRM of traditional and blockchain-based applications. The BbRM consolidates security threats of traditional and blockchain-based applications, structures them following the SRM domain model, and offers guidance for creating the OntReF using the domain model. OntReF is grounded in the unified foundational ontology (UFO), provides semantic interoperability, and supports the dynamic knowledge representation and </span>instantiation of information security knowledge for the SRM. 
Our evaluation approaches demonstrate that OntReF is practical to use.</span></p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"149 ","pages":"Article 102257"},"PeriodicalIF":2.5,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138534223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrated detection and localization of concept drifts in process mining with batch and stream trace clustering support","authors":"Rafael Gaspar de Sousa , Antonio Carlos Meira Neto , Marcelo Fantinato , Sarajane Marques Peres , Hajo Alexander Reijers","doi":"10.1016/j.datak.2023.102253","DOIUrl":"10.1016/j.datak.2023.102253","url":null,"abstract":"<div><p><span>Process mining can help organizations by extracting knowledge from event logs. However, process mining techniques often assume business processes are stationary, while actual business processes are constantly subject to change because of the complexity of organizations and their external environment. Thus, addressing process changes over time – known as </span><em>concept drifts</em><span><span><span><span> – allows for a better understanding of process behavior and can provide a competitive edge for organizations, especially in an online data stream scenario. Current approaches to handling process concept drift focus primarily on detecting and locating concept drifts, often through an integrated, albeit offline, approach. However, part of these integrated approaches rely on complex </span>data structures<span> related to tree-based process models, usually discovered through algorithms whose results are influenced by specific heuristic rules. Moreover, most of the proposed approaches have not been tested on public true concept drift-labeled event logs commonly used as benchmark, making comparative analysis difficult. In this article, we propose an online approach to detect and localize concept drifts in an integrated way using batch and stream trace clustering support. In our approach, cluster models provide input information for both concept drift detection and </span></span>localization methods. Each cluster abstracts a behavior profile underlying the process and reveals </span>descriptive information about the discovered concept drifts. 
Experiments with benchmark synthetic event logs with different control-flow changes, as well as with real-world event logs, showed that our approach, when relying on the same clustering model, is competitive with baseline concept drift detection methods. In addition, the experiments showed that our approach is able to correctly locate the detected concept drifts and allows the analysis of such concept drifts through different process behavior profiles.</span></p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"149 ","pages":"Article 102253"},"PeriodicalIF":2.5,"publicationDate":"2023-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138534225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepScraper: A complete and efficient tweet scraping method using authenticated multiprocessing","authors":"Jaebeom You , Kisung Lee , Hyuk-Yoon Kwon","doi":"10.1016/j.datak.2023.102260","DOIUrl":"10.1016/j.datak.2023.102260","url":null,"abstract":"<div><p>In this paper, we propose a scraping method for collecting tweets, which we call <em>DeepScraper</em><span>. DeepScraper quickly and completely scrapes all tweets written by a given group of users or containing given search keywords<span>. To improve the crawling speed of DeepScraper, we devise a multiprocessing architecture that provides authentication<span> to the multiple processes by simulating user access behavior on Twitter. This allows us to maximize crawling parallelism even on a single machine. Through extensive experiments, we show that DeepScraper can crawl all the tweets of 99 users, which amount to 5,798,052 tweets, whereas the Twitter standard API can crawl only 243,650 of them because of its limit on the number of tweets that can be scraped. In other words, DeepScraper collected 23.7 times more tweets for the 99 users than the standard API. We also show the efficiency of DeepScraper. First, we show the effect of authenticated multiprocessing: it increases the crawling speed by 2.03</span></span></span><span><math><mo>∼</mo></math></span>10.57 times as the number of running processes increases from 2 to 32, compared to DeepScraper with a single process. Then, we compare the crawling speed of DeepScraper with that of existing approaches. The result shows that DeepScraper is comparable even to the Twitter standard API and Twitter4J, while scraping many more tweets than they can. 
Furthermore, DeepScraper is roughly 3.69 times faster than Twitter Scrapy, while both can scrape all the tweets for the target users or keywords.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"149 ","pages":"Article 102260"},"PeriodicalIF":2.5,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138534226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}