DeTinyLLM: Efficient detection of machine-generated text via compact paraphrase transformation

Shilei Tan, Yongcheng Zhou, Haoxiang Liu, Xuesong Wang, Si Chen, Wei Gong

Information Fusion, Volume 127, Article 103713. Published 2025-09-13. DOI: 10.1016/j.inffus.2025.103713
Citations: 0
Abstract
The growing fusion of human-written and machine-generated text poses significant challenges in distinguishing their origins, as advanced large language models (LLMs) increasingly mimic human linguistic patterns. Existing detection methods, such as SimLLM, rely on querying proprietary LLMs for proofreading to measure similarity, which incurs high computational costs and instability due to dependency on fluctuating model updates. To address these limitations, we propose DeTinyLLM, a novel framework that leverages fusion-driven compact paraphrase models for efficient and stable detection. First, we train a lightweight transformation model (e.g., fine-tuned T5-large) to rewrite machine-generated text into human-like text, effectively “de-AI-ifying” it through iterative fusion of syntactic and semantic features. For detection, the input text and its rewritten version are fused and classified via a hybrid neural network, capitalizing on divergence patterns between human and machine text. Experiments across diverse datasets demonstrate that DeTinyLLM achieves state-of-the-art accuracy (surpassing SimLLM by 4.3% in ROC-AUC) while reducing inference latency by 77.2%. By eliminating reliance on proprietary LLMs and integrating multi-level fusion of linguistic signals, this work advances scalable, cost-effective solutions for real-world deployment in AI-generated text detection systems.
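The detection pipeline described in the abstract (paraphrase the input with a compact model, then classify from the divergence between input and rewrite) can be sketched roughly as follows. This is a minimal illustrative sketch only: the paraphrase step is a stub standing in for the paper's fine-tuned T5-large model, and the trigram-Jaccard divergence score with a fixed threshold is an assumed proxy for the paper's hybrid neural classifier, not the actual method.

```python
# Hedged sketch of the abstract's pipeline:
#   1) rewrite the input with a compact paraphrase model ("de-AI-ifying"),
#   2) measure divergence between the input and its rewrite,
#   3) threshold the score to flag machine-generated text.
# paraphrase_fn, the trigram divergence, and the threshold are all
# illustrative assumptions, not components of the published system.

def char_trigrams(text: str) -> set:
    """Character 3-grams of a lowercased string."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def divergence(original: str, paraphrase: str) -> float:
    """1 - Jaccard similarity over character trigrams (0.0 = identical)."""
    a, b = char_trigrams(original), char_trigrams(paraphrase)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def detect(text: str, paraphrase_fn, threshold: float = 0.5) -> bool:
    """Return True if `text` is flagged as machine-generated.

    `paraphrase_fn` stands in for the compact rewriting model; a real
    system would compare richer fused features, not a single scalar.
    """
    rewritten = paraphrase_fn(text)
    return divergence(text, rewritten) >= threshold
```

With an identity paraphraser the divergence is zero and nothing is flagged; the interesting behavior comes entirely from how strongly the rewriting model transforms its input.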
Journal Overview
Information Fusion serves as a central platform for advances in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for research and development in this field, covering architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.