{"title":"The Limitations and Ethical Considerations of ChatGPT","authors":"Shangying Hua, Shuangci Jin, Shengyi Jiang","doi":"10.1162/dint_a_00243","DOIUrl":"https://doi.org/10.1162/dint_a_00243","url":null,"abstract":"\u0000 With the advancements of artificial intelligence technology, ChatGPT, a new practice of artificial intelligence, holds immense potential across multiple fields. Its user-friendly human-machine interface, rapid response capabilities, and delivery of high-quality answers have attracted considerable attention and widespread usage. Regarded by many as a groundbreaking advancement in AI, ChatGPT represents a new milestone in the field. However, as with any technological evolution, the emergence of ChatGPT brings not only benefits, but also inevitable security risks and ethical issues. This paper provides specific information about ChatGPT, including its technology, limitations, ethical issues, governance paths and future directions. Specifically, we firstly offered a thorough exploration of the technical implementation details of GPT series models. Next, we provided an intricate analysis elucidating the reasons for limitations and scrutinized the consequential impacts, such as malicious misuse, privacy violation, and so on. Finally, we explore diverse governance paths to mitigate the impacts of ChatGPT and present future directions. This review aims to equip users with crucial knowledge, facilitating well-informed decision-making, effectively handling of potential challenges in employing ChatGPT, and staying abreast with the rapidly evolving landscape of this technology.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"45 12","pages":""},"PeriodicalIF":3.9,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138949820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BIKAS: Bio-Inspired Knowledge Acquisition and Simulacrum—A Knowledge Database to Support Multifunctional Design Concept Generation","authors":"Pavan Tejaswi Velivela, Yaoyao Fiona Zhao","doi":"10.1162/dint_a_00240","DOIUrl":"https://doi.org/10.1162/dint_a_00240","url":null,"abstract":"A detailed acquisition, analysis, and representation of biological systems exhibiting different functions is required to develop unique bio-inspired multifunctional conceptual designs and methods. This paper presents BIKAS: Bio-inspired Knowledge Acquisition and Simulacrum, a knowledge database of biological systems exhibiting various functionalities, developed based on case-based bio-inspired examples from literature. The knowledge database represents the biological features, their characteristics, and the function exhibited by the biological feature as a combination of its integrated structure and structural strategy. Furthermore, this knowledge database is utilized by the Expandable Domain Integrated Design (xDID) model that works on classifying, mapping, and representing biological features into their respective geometric designations called Domains. The combination of features from the Domains results in the generation of multifunctional conceptual designs. In addition, Meta-level design factors are proposed to aid designers in filtering the biological features and their respective functions having a similar structural strategy, thus aiding designers in rapidly selecting and emulating biological functions.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"257 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139173222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Attentive Siamese LSTM for Low-Resource Text Plagiarism Detection","authors":"Wei Bao, Jian Dong, Yang Xu, Yuanyuan Yang, Xiaoke Qi","doi":"10.1162/dint_a_00242","DOIUrl":"https://doi.org/10.1162/dint_a_00242","url":null,"abstract":"Low-resource text plagiarism detection faces a significant challenge due to the limited availability of labeled data for training. This task requires the development of sophisticated algorithms capable of identifying similarities and differences in texts, particularly in the realm of semantic rewriting and translation-based plagiarism detection. In this paper, we present an enhanced attentive Siamese Long Short-Term Memory (LSTM) network designed for Tibetan-Chinese plagiarism detection. Our approach begins with the introduction of translation-based data augmentation, aimed at expanding the bilingual training dataset. Subsequently, we propose a pre-detection method leveraging abstract document vectors to enhance detection efficiency. Finally, we introduce an improved attentive Siamese LSTM network tailored for Tibetan-Chinese plagiarism detection. We conduct comprehensive experiments to showcase the effectiveness of our proposed plagiarism detection framework.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"49 ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139174828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rule Mining Trends from 1987 to 2022: A Bibliometric Analysis and Visualization","authors":"Shiqi Zhou, Sheng Bi, Guilin Qi","doi":"10.1162/dint_a_00239","DOIUrl":"https://doi.org/10.1162/dint_a_00239","url":null,"abstract":"\u0000 Rule mining has emerged as a crucial technique in data mining and knowledge discovery, enabling the extraction of valuable insights and patterns from vast datasets. This has garnered significant attention from both academic and industrial communities. However, there is a lack of bibliometric and visualization research on rule mining, leading to an unclear delineation of research topics and trends in the field. To fill this gap, this paper provides a comprehensive and up-to-date bibliometric analysis of rule mining, covering 4524 publications published between 1987 and 2022. Using various metrics and visualization techniques, we examine the patterns, trends, and evolution of rule mining. The results show a sustained growth in rule mining research, with a significant increase in publication output in recent years, and its rapid expansion into new areas such as explainable artificial intelligence and privacy protection. While the majority of publications come from Asia, the National Natural Science Foundation of China emerges as the top funding agency in the field. We also identify highly productive authors and significant members of co-authorship networks, as well as the most influential publications and citation bursts. The need for international collaboration and the integration of diverse research perspectives is highlighted. Despite the progress in rule mining, several challenges still require further research, including scalability and efficiency, explainability, network security and privacy protection, and personalized and user-centered design. Overall, this paper provides a valuable roadmap for researchers, policymakers, and practitioners interested in rule-mining research.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":" 12","pages":""},"PeriodicalIF":3.9,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138963544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification and quantification of timestamp data quality issues and its impact on data quality outcome","authors":"Rex Ambe","doi":"10.1162/dint_a_00238","DOIUrl":"https://doi.org/10.1162/dint_a_00238","url":null,"abstract":"\u0000 Timestamps play a key role in process mining because it determines the chronology of which events occurred and subsequently how they are ordered in process modelling. The timestamp in process mining gives an insight on process performance, conformance, and modelling. This therefore means problems with the timestamp will result in misrepresentations of the mined process. A few articles have been published on the quantification of data quality problems but just one of the articles at the time of this paper is based on the quantification of timestamp quality problems. This article evaluates the quality of timestamps in event log across two axes using eleven quality dimensions and four levels of potential data quality problems. The eleven data quality dimensions were obtained by doing a thorough literature review of more than fifty process mining articles which focus on quality dimensions. This evaluation resulted in twelve data quality quantification metrics and the metrics were applied to the MIMIC-III dataset as an illustration. The outcome of the timestamp quality quantification using the proposed typology enabled the user to appreciate the quality of the event log and thus makes it possible to evaluate the risk of carrying out specific data cleaning measures to improve the process mining outcome.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"123 2","pages":""},"PeriodicalIF":3.9,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138995057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Parallel Genetic Algorithm and Particle Swarm Optimization for Parameter Calibration in Hydrological Simulation","authors":"Xinyu Zhang, Yang Li, Genshen Chu","doi":"10.1162/dint_a_00221","DOIUrl":"https://doi.org/10.1162/dint_a_00221","url":null,"abstract":"Parameter calibration is an important part of hydrological simulation and affects the final simulation results. In this paper, we introduce heuristic optimization algorithms, genetic algorithm (GA) to cope with the complexity of the parameter calibration problem, and use particle swarm optimization algorithm (PSO) as a comparison. For large scale hydrological simulations, we use a multilevel parallel parameter calibration framework to make full use of processor resources, accelerate the process of solving high-dimensional parameter calibration. Further, we test and apply the experiments on domestic supercomputers. The results of parameter calibration with GA and PSO can basically reach the ideal value of 0.65 and above, with PSO achieving a speedup of 58.52 on TianHe-2 supercomputer. The experimental results indicate that by using a parallel implementation on multicore CPUs, high-dimensional parameter calibration in large scale hydrological simulation is possible. Moreover, our comparison of the two algorithms shows that the GA obtains better calibration results, and the PSO has a more pronounced acceleration effect.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":" 23","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135192134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building expertise on FAIR through evolving Bring Your Own Data (BYOD) workshops: describing the data, software, and management- focused approaches and their evolution","authors":"César H. Bernabé, Lieze Thielemans, Rajaram Kaliyaperumal, Claudio Carta, Shuxin Zhang, Celia W.G. van Gelder, Nirupama Benis, Luiz Olavo Bonino da Silva Santos, Ronald Cornet, Bruna dos Santos Vieira, Nawel Lalout, Ines Henriques, Alberto Cámara Ballesteros, Kees Burger, Martijn G. Kersloot, Friederike Ehrhart, Esther van Enckevort, Chris T. Evelo, Alasdair J. G. Gray, Marc Hanauer, Kristina Hettne, Joep de Ligt, Arnaldo Pereira, Núria Queralt-Rosinach, Erik Schultes, Domenica Taruscio, Andra Waagmeester, Mark D. Wilkinson, Egon L. Willighagen, Mascha Jansen, Barend Mons, Marco Roos, Annika Jacobsen","doi":"10.1162/dint_a_00236","DOIUrl":"https://doi.org/10.1162/dint_a_00236","url":null,"abstract":"Abstract Since 2014, “Bring Your Own Data” workshops (BYODs) have been organised to inform people about the process and benefits of making resources Findable, Accessible, Interoperable, and Reusable (FAIR, and the FAIRification process). The BYOD workshops’ content and format differ depending on their goal, context, and the background and needs of participants. Data-focused BYODs educate domain experts on how to make their data FAIR to find new answers to research questions. Management-focused BYODs promote the benefits of making data FAIR and instruct project managers and policy-makers on the characteristics of FAIRification projects. Software-focused BYODs gather software developers and experts on FAIR to implement or improve software resources that are used to support FAIRification. Overall, these BYODs intend to foster collaboration between different types of stakeholders involved in data management, curation, and reuse (e.g. domain experts, trainers, developers, data owners, data analysts, FAIR experts). The BYODs also serve as an opportunity to learn what kind of support for FAIRification is needed from different communities and to develop teaching materials based on practical examples and experience. In this paper, we detail the three different structures of the BYODs and describe examples of early BYODs related to plant breeding data, and rare disease registries and biobanks, which have shaped the structure of the workshops. We discuss the latest insights into making BYODs more productive by leveraging our almost ten years of training experience in these workshops, including successes and encountered challenges. Finally, we examine how the participants’ feedback has motivated the research on FAIR, including the development of workflows and software.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"50 21","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135432718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ChatGPT is a Remarkable Tool—For Experts","authors":"Amos Azaria, Rina Azoulay, Shulamit Reches","doi":"10.1162/dint_a_00235","DOIUrl":"https://doi.org/10.1162/dint_a_00235","url":null,"abstract":"Abstract This paper investigates the capabilities of ChatGPT as an automated assistant in diverse domains, including scientific writing, mathematics, education, programming, and healthcare. We explore the potential of ChatGPT to enhance productivity, streamline problem-solving processes, and improve writing style. Furthermore, we highlight the potential risks associated with excessive reliance on ChatGPT in these fields. These limitations encompass factors like incorrect and fictitious responses, inaccuracies in code, limited logical reasoning abilities, overconfidence, and critical ethical concerns of copyright and privacy violation. We outline areas and objectives where ChatGPT proves beneficial, applications where it should be used judiciously, and scenarios where its reliability may be limited. In light of observed limitations, and given that the tool's fundamental errors may pose a special challenge for non-experts, ChatGPT should be used with a strategic methodology. By drawing from comprehensive experimental studies, we offer methods and flowcharts for effectively using ChatGPT. Our recommendations emphasize iterative interaction with ChatGPT and independent verification of its outputs. Considering the importance of utilizing ChatGPT judiciously and with expertise, we recommend its usage for experts who are well-versed in the respective domains.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"50 23","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135432717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Theoretically Grounded Question Answering Data Set for Evaluating Machine Common Sense","authors":"Henrique Santos, Ke Shen, Alice M. Mulvehill, Mayank Kejriwal, Deborah L. McGuinness","doi":"10.1162/dint_a_00234","DOIUrl":"https://doi.org/10.1162/dint_a_00234","url":null,"abstract":"ABSTRACT Achieving machine common sense has been a longstanding problem within Artificial Intelligence. Thus far, benchmark data sets that are grounded in a theory of common sense and can be used to conduct rigorous, semantic evaluations of common sense reasoning (CSR) systems have been lacking. One expectation of the AI community is that neuro-symbolic reasoners can help bridge this gap towards more dependable systems with common sense. We propose a novel benchmark, called Theoretically Grounded common sense Reasoning (TG-CSR), modeled as a set of question answering instances, with each instance grounded in a semantic category of common sense, such as space, time, and emotions. The benchmark is few-shot i.e., only a few training and validation examples are provided in the public release to avoid the possibility of overfitting. Results from recent evaluations suggest that TG-CSR is challenging even for state-of-the-art statistical models. Due to its semantic rigor, this benchmark can be used to evaluate the common sense reasoning capabilities of neuro-symbolic systems.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"14 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135431467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Extraction of Chinese Open Relations Using Pre-trained Language Model and Knowledge Enhancement","authors":"Chaojie Wen, Xudong Jia, Tao Chen","doi":"10.1162/dint_a_00227","DOIUrl":"https://doi.org/10.1162/dint_a_00227","url":null,"abstract":"Abstract Open Relation Extraction (ORE) is a task of extracting semantic relations from a text document. Current ORE systems have significantly improved their efficiency in obtaining Chinese relations, when compared with conventional systems which heavily depend on feature engineering or syntactic parsing. However, the ORE systems do not use robust neural networks such as pre-trained language models to take advantage of large-scale unstructured data effectively. In respons to this issue, a new system entitled Chinese Open Relation Extraction with Knowledge Enhancement (CORE-KE) is presented in this paper. The CORE-KE system employs a pre-trained language model (with the support of a Bidirectional Long Short-Term Memory (BiLSTM) layer and a Masked Conditional Random Field (Masked CRF) layer) on unstructured data in order to improve Chinese open relation extraction. Entity descriptions in Wikidata and additional knowledge (in terms of triple facts) extracted from Chinese ORE datasets are used to fine-tune the pre-trained language model. In addition, syntactic features are further adopted in the training stage of the CORE-KE system for knowledge enhancement. Experimental results of the CORE-KE system on two large-scale datasets of open Chinese entities and relations demonstrate that the CORE-KE system is superior to other ORE systems. The F1-scores of the CORE-KE system on the two datasets have given a relative improvement of 20.1% and 1.3%, when compared with benchmark ORE systems, respectively. The source code is available at https://github.com/cjwen15/CORE-KE.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"50 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135432728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}