Jeffrey L. Jensen, Daniel Karell, Cole Tanigawa-Lau, Nizar Habash, Mai Oudah, Dhia Fairus Shofia Fani
Sociological Methodology, vol. 52, no. 1, pp. 30-52. Published 2021-10-25. DOI: 10.1177/00811750211053370 (https://doi.org/10.1177/00811750211053370)
Language Models in Sociological Research: An Application to Classifying Large Administrative Data and Measuring Religiosity
Computational methods have become widespread in the social sciences, but probabilistic language models remain relatively underused. We introduce language models to a general social science readership. First, we offer an accessible explanation of language models, detailing how they estimate the probability of a piece of language, such as a word or sentence, on the basis of the linguistic context. Second, we apply language models in an illustrative analysis to demonstrate the mechanics of using these models in social science research. The example application uses language models to classify names in a large administrative database; the classifications are then used to measure a sociologically important phenomenon: the spatial variation of religiosity. This application highlights several advantages of language models, including their effectiveness in classifying text that contains variation around the base structures, as is often the case with localized naming conventions and dialects. We conclude by discussing language models’ potential to contribute to sociological research beyond classification through their ability to generate language.
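As a rough illustration of the idea described in the abstract, the sketch below trains one character-level bigram language model per class and assigns a name to the class whose model gives it the highest probability. It is a minimal sketch and not the authors' implementation: the model order, the add-one smoothing, the class labels `group_a` and `group_b`, and the toy training names are all assumptions made for this example.

```python
import math
from collections import Counter

BOS, EOS = "^", "$"  # start and end markers, so name boundaries are modeled too


class CharBigramLM:
    """Character bigram model: P(char | previous char), with additive smoothing."""

    def __init__(self, names, alpha=1.0):
        self.alpha = alpha          # smoothing constant (assumed: add-one)
        self.bigrams = Counter()    # counts of (previous char, char)
        self.contexts = Counter()   # counts of previous char
        self.vocab = {EOS}
        for name in names:
            chars = [BOS] + list(name.lower()) + [EOS]
            self.vocab.update(chars[1:])
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, name, vocab_size):
        """Log-probability of a whole name: sum of smoothed bigram log-probabilities."""
        chars = [BOS] + list(name.lower()) + [EOS]
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            num = self.bigrams[(prev, cur)] + self.alpha
            den = self.contexts[prev] + self.alpha * vocab_size
            total += math.log(num / den)
        return total


def classify(name, models):
    """Return the best-scoring label and the per-class log-probabilities."""
    # Shared vocabulary size (+1 for unseen characters) keeps scores comparable.
    vocab_size = len(set().union(*(m.vocab for m in models.values()))) + 1
    scores = {label: m.log_prob(name, vocab_size) for label, m in models.items()}
    return max(scores, key=scores.get), scores


# Toy, hypothetical training data: two classes with different spelling patterns.
training = {
    "group_a": ["anderson", "johnson", "peterson", "nilsson", "larson"],
    "group_b": ["petrova", "ivanova", "smirnova", "volkova", "popova"],
}
models = {label: CharBigramLM(names) for label, names in training.items()}

for unseen in ["karlson", "sokolova"]:
    label, scores = classify(unseen, models)
    print(unseen, label, {k: round(v, 2) for k, v in scores.items()})
```

Because each model scores character sequences rather than looking up whole names, a name that never appeared in training can still be classified from its spelling patterns, which is the kind of tolerance to variation around base structures that the abstract highlights.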
About the journal:
Sociological Methodology is a compendium of new and sometimes controversial advances in social science methodology. Contributions come from diverse areas and have something useful, and often surprising, to say about topics ranging from legal and ethical issues surrounding data collection to the methodology of theory construction. In short, Sociological Methodology holds something of value, along with an interesting mix of lively controversy, for nearly everyone who participates in the enterprise of sociological research.