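Using Machine Learning to Uncover the Semantics of Concepts: How Well Do Typicality Measures Extracted from a BERT Text Classifier Match Human Judgments of Genre Typicality?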

IF 2.7 · Zone 2, Sociology · Q1 SOCIOLOGY

Sociological Science · Published: 2023-03-03 · DOI: 10.15195/v10.a3

Gaël Le Mens, Balázs Kovács, Michael T. Hannan, Guillem Pros


Citations: 5

Abstract




Social scientists have long been interested in understanding the extent to which the typicalities of an object in concepts relate to its valuations by social actors. Answering this question has proven to be challenging because precise measurement requires a feature-based description of objects. Yet, such descriptions are frequently unavailable. In this article, we introduce a method to measure typicality based on text data. Our approach involves training a deep-learning text classifier based on the BERT language representation and defining the typicality of an object in a concept in terms of the categorization probability produced by the trained classifier. Model training allows for the construction of a feature space adapted to the categorization task and of a mapping between feature combination and typicality that gives more weight to feature dimensions that matter more for categorization. We validate the approach by comparing the BERT-based typicality measure of book descriptions in literary genres with average human typicality ratings. The obtained correlation is higher than 0.85. Comparisons with other typicality measures used in prior research show that our BERT-based measure better reflects human typicality judgments.
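The central step of the approach, defining an object's typicality in a concept as the categorization probability produced by a trained classifier and validating it against average human ratings, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the fine-tuned BERT classifier has already produced one raw score (logit) per genre for each book description. It also assumes the genres are mutually exclusive, so the scores are turned into probabilities with a softmax; a multi-label setup would instead apply a sigmoid to each genre separately. The genre names, logits, and human ratings are hypothetical toy values, and the Pearson correlation stands in for the validation step the abstract describes.

```python
import math

# Hypothetical genre labels (assumption); the paper's actual genre set is not reproduced here.
GENRES = ["mystery", "romance", "science_fiction"]


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def typicality(logits, genre):
    """Typicality of one description in `genre`, taken here as the
    classifier's predicted probability for that genre (softmax assumption)."""
    probs = softmax(logits)
    return probs[GENRES.index(genre)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy logits for four book descriptions (illustrative values only, not data from the paper).
    book_logits = [
        [3.0, 0.5, 0.1],
        [1.2, 1.0, 0.8],
        [0.2, 2.5, 0.3],
        [2.0, 0.1, 1.9],
    ]
    # Toy average human ratings of "mystery" typicality (illustrative values only).
    human_ratings = [9.0, 4.5, 1.0, 6.0]

    scores = [typicality(l, "mystery") for l in book_logits]
    print([round(s, 3) for s in scores])
    print(round(pearson(scores, human_ratings), 3))
```

In this sketch a description whose score for a genre dominates the others receives a typicality close to 1 in that genre. Because the probability comes from a classifier trained on the categorization task, the features that matter most for telling genres apart carry the most weight, which is the property the abstract attributes to the BERT-based measure.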


Source journal

Sociological Science · Social Sciences: Social Sciences (all)

CiteScore: 4.90

Self-citation rate: 2.90%

Review time: 6 weeks

Journal description: Sociological Science is an open-access, online, peer-reviewed, international journal for social scientists committed to advancing a general understanding of social processes. Sociological Science welcomes original research and commentary from all subfields of sociology, and does not privilege any particular theoretical or methodological approach.