A Quadrilogy for (Big) Data Reliabilities

Impact factor 3.7, Zone 1 (Literature), JCR Q1 (Communication)

Communication Methods and Measures, Pub Date: 2021-07-03, DOI: 10.1080/19312458.2020.1861592

K. Krippendorff

Volume/issue (as listed): 15 1; pages: 165-189; publication type: Journal Article. Article link: https://doi.org/10.1080/19312458.2020.1861592

Citations: 2

Abstract

This paper responds to the challenge of testing the reliabilities of really big data and proposes a quadrilogy of four measures of the reliability of data, applicable quite generally. These measures grew out of the recognition that crowd-coded data contest big data scientists’ conviction that the social contexts and meanings of data become irrelevant in the face of their sheer volumes. Bigness has also challenged available inter-coder agreement coefficients and available software, which either are too restricted regarding the forms of data they accept or exceed computational limits when data become very large. In the course of tailoring Krippendorff’s alpha to very large data, the possibility emerged of dividing the concept of reliability into four separate kinds, serving different methodological aims in social research. Respectively, they assess the replicability of the process of generating data; the accuracy of generating data; the surrogacy of proposed theories, coders, formulas, or algorithms as substitutes for human coders; and the decisiveness among several human judgements. Their mathematical relationships assure comparability. The paper develops this quadrilogy of agreement measures first for binary data, provides a link to software for computing it, and then extends it to nominal data, a first step towards further generalizations. It also proposes a computational path to estimate the confidence limits for each of these measures and the probabilities of accepting data as reliable when there is a chance of their being below a tolerable level. It ends with a discussion of how to select reliability benchmarks appropriate for the quadrilogy of agreement measures.
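
For orientation, the coefficient the abstract starts from can be illustrated with a minimal sketch of the standard Krippendorff's alpha for nominal (including binary) data, computed from a coincidence matrix. This is not the paper's quadrilogy of four measures or its software; the function name `nominal_alpha` and the input layout (one list of coder values per unit, `None` for a missing value) are assumptions made for illustration only.

```python
from collections import Counter


def nominal_alpha(units):
    """Standard Krippendorff's alpha for nominal data (illustrative sketch).

    units: iterable of lists, one list of coder-assigned categories per unit;
    None marks a missing value (hypothetical layout, not from the paper).
    """
    coincidences = Counter()  # (c, k) -> o_ck, the coincidence matrix
    for values in units:
        values = [v for v in values if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two values cannot be paired
        counts = Counter(values)
        for c, n_c in counts.items():
            for k, n_k in counts.items():
                # ordered pairs of values within the unit, weighted by 1/(m - 1)
                pairs = n_c * (n_c - 1) if c == k else n_c * n_k
                if pairs:
                    coincidences[(c, k)] += pairs / (m - 1)

    totals = Counter()  # marginal totals n_c of the coincidence matrix
    for (c, _k), o in coincidences.items():
        totals[c] += o
    n = sum(totals.values())

    observed = sum(o for (c, k), o in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k]
                   for c in totals for k in totals if c != k)
    if expected == 0:
        return float("nan")  # only one category in use: alpha is undefined
    # alpha = 1 - D_o / D_e with D_o = observed / n, D_e = expected / (n (n - 1))
    return 1.0 - (n - 1) * observed / expected


# Two coders, ten units, binary categories.
coder_a = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder_b = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
example = [list(pair) for pair in zip(coder_a, coder_b)]
print(round(nominal_alpha(example), 3))  # 0.095
```

The sketch keeps the whole coincidence matrix in a dictionary keyed by category pairs, which is adequate for a handful of categories; it makes no attempt at the very-large-data tailoring or the confidence limits the abstract describes.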


Source journal

Communication Methods and Measures (subject category: Communication)

CiteScore: 21.10

Self-citation rate: 1.80%

Journal description: Communication Methods and Measures brings developments in both qualitative and quantitative research methodologies to the attention of communication scholars and serves as a platform for researchers across the field to discuss and disseminate methodological tools and approaches. It seeks to improve research design and analysis practices and to introduce new methods of measurement that are valuable to communication scientists or that enhance existing methods. The journal encourages submissions on methods for enhancing research design and theory testing, using quantitative or qualitative approaches, and is open to articles on epistemological issues relevant to communication research methodologies. It welcomes well-written manuscripts that demonstrate the use of methods, as well as articles that highlight the advantages of lesser-known or newer methods over those traditionally used in communication.