From Data to Deployment: A Comprehensive Analysis of Risks in Large Language Model Research and Development

IF 2.6 4区计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

IET Information Security Pub Date : 2025-06-23 DOI:10.1049/ise2/7358963

Tianshu Zhang, Ruidan Su, Anli Zhong, Minwei Fang, Yu-dong Zhang

{"title":"From Data to Deployment: A Comprehensive Analysis of Risks in Large Language Model Research and Development","authors":"Tianshu Zhang, Ruidan Su, Anli Zhong, Minwei Fang, Yu-dong Zhang","doi":"10.1049/ise2/7358963","DOIUrl":null,"url":null,"abstract":"<p>Large language models (LLMs) have evolved significantly, achieving unprecedented linguistic capabilities that underpin a wide range of AI applications. However, they also pose risks and challenges such as ethical concerns, bias and computational sustainability. How to balance the high performance in revolutionising information processing with the risks they pose is critical to their future development. LLM is a type of NLP model and many of the LLM risks are also risks that NLP has experienced in the past. We, therefore, summarise these risks, focusing more on the underlying understanding of these risks/technical tools, rather than simply describing their occurrence in LLM. In this paper, we first discuss and compare the current state of research on the four main risks in the process of developing LLMs: data, system, pretraining and inference, and then, try to summarise the rationale, complexity, prospects and challenges of the key issues and challenges in each phase. Finally, this review concludes with a discussion of the fundamental issues that should be of most concern and risk and that should be addressed in the early stages of modelling research, including the correlated issues of privacy preservation and countering attacks and model robustness. Based on the LLM research and development (R&D) process perspective, this review summarises the actual risks and provides guidance for research directions, with the aim of helping researchers to identify these risk points and technology directions worth investigating, as well as helping to establish a safe and efficient R&D process.</p>","PeriodicalId":50380,"journal":{"name":"IET Information Security","volume":"2025 1","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ise2/7358963","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Information Security","FirstCategoryId":"94","ListUrlMain":"https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/ise2/7358963","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Large language models (LLMs) have evolved significantly, achieving unprecedented linguistic capabilities that underpin a wide range of AI applications. However, they also pose risks and challenges such as ethical concerns, bias and computational sustainability. How to balance the high performance in revolutionising information processing with the risks they pose is critical to their future development. LLM is a type of NLP model and many of the LLM risks are also risks that NLP has experienced in the past. We, therefore, summarise these risks, focusing more on the underlying understanding of these risks/technical tools, rather than simply describing their occurrence in LLM. In this paper, we first discuss and compare the current state of research on the four main risks in the process of developing LLMs: data, system, pretraining and inference, and then, try to summarise the rationale, complexity, prospects and challenges of the key issues and challenges in each phase. Finally, this review concludes with a discussion of the fundamental issues that should be of most concern and risk and that should be addressed in the early stages of modelling research, including the correlated issues of privacy preservation and countering attacks and model robustness. Based on the LLM research and development (R&D) process perspective, this review summarises the actual risks and provides guidance for research directions, with the aim of helping researchers to identify these risk points and technology directions worth investigating, as well as helping to establish a safe and efficient R&D process.

Abstract Image

查看原文本刊更多论文

从数据到部署：大型语言模型研究与开发风险的综合分析

大型语言模型（llm）已经发生了重大变化，实现了前所未有的语言能力，为广泛的人工智能应用奠定了基础。然而，它们也带来了伦理问题、偏见和计算可持续性等风险和挑战。如何在革命性的信息处理的高性能与它们所带来的风险之间取得平衡，对它们的未来发展至关重要。LLM是NLP模型的一种，LLM的许多风险也是NLP过去经历过的风险。因此，我们总结了这些风险，更多地关注对这些风险/技术工具的潜在理解，而不是简单地描述它们在法学硕士中的发生。本文首先对法学硕士开发过程中的数据、系统、预训练和推理四个主要风险的研究现状进行了讨论和比较，然后试图总结每个阶段的关键问题和挑战的理论基础、复杂性、前景和挑战。最后，本综述总结了应该最关注和风险的基本问题，以及应该在建模研究的早期阶段解决的问题，包括隐私保护和对抗攻击以及模型鲁棒性的相关问题。本文基于法学硕士研发过程的视角，总结了实际存在的风险，并对研究方向进行了指导，旨在帮助研究人员识别这些风险点和值得研究的技术方向，帮助建立安全高效的研发过程。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IET Information Security 工程技术-计算机：理论方法

CiteScore

3.80

自引率

7.10%

发文量

审稿时长

8.6 months

期刊介绍： IET Information Security publishes original research papers in the following areas of information security and cryptography. Submitting authors should specify clearly in their covering statement the area into which their paper falls. Scope: Access Control and Database Security Ad-Hoc Network Aspects Anonymity and E-Voting Authentication Block Ciphers and Hash Functions Blockchain, Bitcoin (Technical aspects only) Broadcast Encryption and Traitor Tracing Combinatorial Aspects Covert Channels and Information Flow Critical Infrastructures Cryptanalysis Dependability Digital Rights Management Digital Signature Schemes Digital Steganography Economic Aspects of Information Security Elliptic Curve Cryptography and Number Theory Embedded Systems Aspects Embedded Systems Security and Forensics Financial Cryptography Firewall Security Formal Methods and Security Verification Human Aspects Information Warfare and Survivability Intrusion Detection Java and XML Security Key Distribution Key Management Malware Multi-Party Computation and Threshold Cryptography Peer-to-peer Security PKIs Public-Key and Hybrid Encryption Quantum Cryptography Risks of using Computers Robust Networks Secret Sharing Secure Electronic Commerce Software Obfuscation Stream Ciphers Trust Models Watermarking and Fingerprinting Special Issues. Current Call for Papers: Security on Mobile and IoT devices - https://digital-library.theiet.org/files/IET_IFS_SMID_CFP.pdf