Laurence-Olivier M. Foisy, Étienne Proulx, Hubert Cadieux, Jérémy Gilbert, Jozef Rivest, Alexandre Bouillon, Yannick Dufresne
Journal: Social Science Computer Review (JCR Q2, Computer Science, Interdisciplinary Applications; impact factor 3.0)
Publication type: Journal Article
Published: 2025-05-27
DOI: 10.1177/08944393251344865 (https://doi.org/10.1177/08944393251344865)
Prompting the Machine: Introducing an LLM Data Extraction Method for Social Scientists
This research note addresses a methodological gap in the study of large language models (LLMs) in social sciences: the absence of standardized data extraction procedures. While existing research has examined biases and the reliability of LLM-generated content, the establishment of transparent extraction protocols necessarily precedes substantive analysis. The paper introduces a replicable procedural framework for extracting structured political data from LLMs via API, designed to enhance transparency, accessibility, and reproducibility. Canadian federal and Quebec provincial politicians serve as an illustrative case to demonstrate the extraction methodology, encompassing prompt engineering, output processing, and error handling mechanisms. The procedure facilitates systematic data collection across multiple LLM versions, enabling inter-model comparisons while addressing extraction challenges such as response variability and malformed outputs. The contribution is primarily methodological—providing researchers with a foundational extraction protocol adaptable to diverse research contexts. This standardized approach constitutes an essential preliminary step for subsequent evaluation of LLM-generated content, establishing procedural clarity in this methodologically developing research domain.
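The abstract names three stages: prompt engineering, output processing, and error handling for malformed outputs. The sketch below illustrates one way such a loop could be organized; it is not the authors' published procedure. The schema (`name`, `party`), the function names, the retry limit, and the stand-in model are all hypothetical, and a real study would replace `make_fake_model` with a call to an actual LLM API.

```python
import json
from typing import Callable, Optional

# Hypothetical schema for illustration; the paper's actual fields are not given here.
REQUIRED_KEYS = {"name", "party"}


def build_prompt(politician: str) -> str:
    """Ask for a strict JSON object so the reply can be parsed mechanically."""
    return (
        "Return only a JSON object with the keys 'name' and 'party' "
        f"describing the politician {politician}."
    )


def parse_reply(reply: str) -> Optional[dict]:
    """Return the parsed record, or None when the reply is malformed."""
    try:
        record = json.loads(reply)
    except json.JSONDecodeError:
        return None
    # Reject valid JSON that is not an object or lacks a required key.
    if not isinstance(record, dict) or not REQUIRED_KEYS.issubset(record):
        return None
    return record


def extract(politician: str, call_model: Callable[[str], str],
            max_attempts: int = 3) -> dict:
    """Query the model, retrying on malformed output, and log the attempt count."""
    for attempt in range(1, max_attempts + 1):
        record = parse_reply(call_model(build_prompt(politician)))
        if record is not None:
            return {**record, "attempts": attempt, "failed": False}
    # Keep failures in the dataset instead of silently dropping them.
    return {"name": politician, "party": None,
            "attempts": max_attempts, "failed": True}


def make_fake_model() -> Callable[[str], str]:
    """Stand-in for an LLM API: one malformed reply, then one valid reply."""
    replies = iter([
        "Sure! Here is the data:",
        '{"name": "Jane Doe", "party": "Example Party"}',
    ])
    return lambda prompt: next(replies)


result = extract("Jane Doe", make_fake_model())
print(result)
```

Recording the attempt count and a failure flag for every query keeps response variability visible in the resulting dataset, which matters when the same prompts are run against several model versions for comparison.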
About the journal:
Unique Scope: Social Science Computer Review is an interdisciplinary journal covering social science instructional and research applications of computing, as well as societal impacts of information technology. Topics include: artificial intelligence, business, computational social science theory, computer-assisted survey research, computer-based qualitative analysis, computer simulation, economic modeling, electronic modeling, electronic publishing, geographic information systems, instrumentation and research tools, public administration, social impacts of computing and telecommunications, software evaluation, and world-wide web resources for social scientists.
Interdisciplinary Nature: Because the uses and impacts of computing are interdisciplinary, so is Social Science Computer Review. The journal is of direct relevance to scholars and scientists in a wide variety of disciplines. In its pages you'll find work in the following areas: sociology, anthropology, political science, economics, psychology, computer literacy, computer applications, and methodology.