{"title":"A survey on chatbots and large language models: Testing and evaluation techniques","authors":"Sonali Uttam Singh, Akbar Siami Namin","doi":"10.1016/j.nlp.2025.100128","DOIUrl":null,"url":null,"abstract":"<div><div>Chatbots have been quite developed in the recent decades and evolved along with the field of Artificial Intelligence (AI), enabling powerful capabilities in tasks such as text generation and summarization, sentiment analysis, and many other interesting Natural Language Processing (NLP) based tasks. Advancements in language models (LMs), specifically LLMs, have played an important role in improving the capabilities of chatbots. This survey paper provides a comprehensive overview in chatbot with the integration of LLMs, primarily focusing on the testing, evaluation and performance techniques and frameworks associated with it. The paper discusses the foundational concepts of chatbots and their evolution, highlights the challenges and opportunities they present by reviewing the state-of-the-art papers associated with the chatbots design, testing and evaluation. The survey also delves into the key components of chatbot systems, including Natural Language Understanding (NLU), dialogue management, and Natural Language Generation (NLG), and examine how LLMs have influenced each of these components. Furthermore, the survey examines the ethical considerations and limitations associated with LLMs. The paper primarily focuses on investigating the evaluation techniques and metrics used to assess the performance and effectiveness of these language models. This paper aims to provide an overview of chatbots and highlights the need for an appropriate framework in regards to testing and evaluating these chatbots and the LLMs associated with it in order to provide efficient and proper knowledge to user and potentially improve its quality based on advancements in the field of machine learning.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"10 ","pages":"Article 100128"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Processing Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949719125000044","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Chatbots have been quite developed in the recent decades and evolved along with the field of Artificial Intelligence (AI), enabling powerful capabilities in tasks such as text generation and summarization, sentiment analysis, and many other interesting Natural Language Processing (NLP) based tasks. Advancements in language models (LMs), specifically LLMs, have played an important role in improving the capabilities of chatbots. This survey paper provides a comprehensive overview in chatbot with the integration of LLMs, primarily focusing on the testing, evaluation and performance techniques and frameworks associated with it. The paper discusses the foundational concepts of chatbots and their evolution, highlights the challenges and opportunities they present by reviewing the state-of-the-art papers associated with the chatbots design, testing and evaluation. The survey also delves into the key components of chatbot systems, including Natural Language Understanding (NLU), dialogue management, and Natural Language Generation (NLG), and examine how LLMs have influenced each of these components. Furthermore, the survey examines the ethical considerations and limitations associated with LLMs. The paper primarily focuses on investigating the evaluation techniques and metrics used to assess the performance and effectiveness of these language models. This paper aims to provide an overview of chatbots and highlights the need for an appropriate framework in regards to testing and evaluating these chatbots and the LLMs associated with it in order to provide efficient and proper knowledge to user and potentially improve its quality based on advancements in the field of machine learning.