Tianshu Zhang, Ruidan Su, Anli Zhong, Minwei Fang, Yu-dong Zhang
From Data to Deployment: A Comprehensive Analysis of Risks in Large Language Model Research and Development
IET Information Security, published 2025-06-23. DOI: 10.1049/ise2/7358963
https://onlinelibrary.wiley.com/doi/10.1049/ise2/7358963
Citations: 0
Abstract
Large language models (LLMs) have evolved significantly, achieving unprecedented linguistic capabilities that underpin a wide range of AI applications. However, they also pose risks and challenges, including ethical concerns, bias and computational sustainability. Balancing their high performance in revolutionising information processing against these risks is critical to their future development. LLMs are a type of NLP model, and many LLM risks are ones that NLP has already encountered. We therefore summarise these risks, focusing on the underlying understanding of the risks and the associated technical tools rather than simply describing their occurrence in LLMs. In this paper, we first discuss and compare the current state of research on the four main risk areas in the LLM development process (data, system, pretraining and inference). We then summarise the rationale, complexity, prospects and challenges of the key issues in each phase. Finally, the review discusses the fundamental issues that deserve the most concern and should be addressed in the early stages of modelling research, including the correlated issues of privacy preservation, countering attacks and model robustness. Taking the perspective of the LLM research and development (R&D) process, this review summarises the actual risks and provides guidance on research directions, with the aim of helping researchers identify risk points and technology directions worth investigating and establish a safe and efficient R&D process.
About the journal:
IET Information Security publishes original research papers in the following areas of information security and cryptography. Submitting authors should specify clearly in their covering statement the area into which their paper falls.
Scope:
Access Control and Database Security
Ad-Hoc Network Aspects
Anonymity and E-Voting
Authentication
Block Ciphers and Hash Functions
Blockchain, Bitcoin (Technical aspects only)
Broadcast Encryption and Traitor Tracing
Combinatorial Aspects
Covert Channels and Information Flow
Critical Infrastructures
Cryptanalysis
Dependability
Digital Rights Management
Digital Signature Schemes
Digital Steganography
Economic Aspects of Information Security
Elliptic Curve Cryptography and Number Theory
Embedded Systems Aspects
Embedded Systems Security and Forensics
Financial Cryptography
Firewall Security
Formal Methods and Security Verification
Human Aspects
Information Warfare and Survivability
Intrusion Detection
Java and XML Security
Key Distribution
Key Management
Malware
Multi-Party Computation and Threshold Cryptography
Peer-to-peer Security
PKIs
Public-Key and Hybrid Encryption
Quantum Cryptography
Risks of using Computers
Robust Networks
Secret Sharing
Secure Electronic Commerce
Software Obfuscation
Stream Ciphers
Trust Models
Watermarking and Fingerprinting
Special Issues. Current Call for Papers:
Security on Mobile and IoT devices - https://digital-library.theiet.org/files/IET_IFS_SMID_CFP.pdf