The ROI of AI in lexicography
Erin McKean, Will Fitzgerald
Lexicography, published 2024-07-04. DOI: 10.1558/lexi.27569 (https://doi.org/10.1558/lexi.27569)
Large Language Models (LLMs) are being used for many language-based tasks, including translation, summarization and paraphrasing, and sentiment analysis, as well as for content-generation tasks such as generating code, answering search queries in natural language, and powering chatbots in customer service and other domains. Since much modern lexicography is based on the investigation and analysis of large-scale corpora analogous to the (much larger) corpora used to train LLMs, we hypothesize that LLMs could be used for typical lexicographic tasks. A commercially available LLM API (OpenAI’s ChatGPT gpt-3.5-turbo) was used to complete typical lexicographic tasks, such as headword expansion, phrase and form finding, and creation of definitions and examples. The results showed that the output of this LLM is not up to the standard of human editorial work and requires significant oversight because of errors and “hallucinations” (the tendency of LLMs to invent facts). In addition, the externalities of LLM use, including concerns about environmental impact and replication of bias, add to the overall cost.