Towards an understanding and explanation for mixed-initiative artificial scientific text detection

IF 1.8 · CAS Tier 4, Computer Science · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Luoxuan Weng, Shi Liu, Hang Zhu, Jiashun Sun, Wong Kam-Kwai, Dongming Han, Minfeng Zhu, Wei Chen
DOI: 10.1177/14738716241240156
Journal: Information Visualization · Journal Article · Published: 2024-04-08
Citations: 0

Abstract

Large language models (LLMs) have gained popularity in various fields for their exceptional capability of generating human-like text. Their potential misuse has raised social concerns about plagiarism in academic contexts. However, effective artificial scientific text detection is a non-trivial task due to several challenges, including (1) the lack of a clear understanding of the differences between machine-generated and human-written scientific text, (2) the poor generalization performance of existing methods caused by out-of-distribution issues, and (3) the limited support for human-machine collaboration with sufficient interpretability during the detection process. In this paper, we first identify the critical distinctions between machine-generated and human-written scientific text through a quantitative experiment. Then, we propose a mixed-initiative workflow that combines human experts’ prior knowledge with machine intelligence, along with a visual analytics system to facilitate efficient and trustworthy scientific text detection. Finally, we demonstrate the effectiveness of our approach through two case studies and a controlled user study. We also provide design implications for interactive artificial text detection tools in high-stakes decision-making scenarios.
Source journal: Information Visualization
CiteScore: 5.40
Self-citation rate: 0.00%
Articles per year: 16
Review time: >12 weeks
Journal description: Information Visualization is essential reading for researchers and practitioners of information visualization, and is of interest to computer scientists and data analysts working in related specialisms. It is an international, peer-reviewed journal publishing articles on fundamental research and applications of information visualization. The journal acts as a dedicated forum for the theories, methodologies, techniques, and evaluations of information visualization and its applications, and serves as a core vehicle for developing a generic research agenda for the field by identifying and developing its unique and significant aspects. Emphasis is placed on interdisciplinary material and on the close connection between theory and practice. The journal is a member of the Committee on Publication Ethics (COPE).