Prompt Refinement or Fine-tuning? Best Practices for using LLMs in Computational Social Science Tasks
Anders Giovanni Møller, Luca Maria Aiello
arXiv - PHYS - Physics and Society, 2024-08-02 (arxiv-2408.01346)
Large Language Models are expressive tools that enable complex tasks of text
understanding within Computational Social Science. Their versatility, while
beneficial, poses a barrier to establishing standardized best practices within
the field. To bring clarity on the value of different strategies, we present
an overview of the performance of modern LLM-based classification methods on a
benchmark of 23 social knowledge tasks. Our results point to three best
practices: select models with larger vocabularies and pre-training corpora; avoid
simple zero-shot in favor of AI-enhanced prompting; fine-tune on task-specific
data, and consider more complex forms of instruction-tuning on multiple datasets
only when training data is more abundant.