{"title":"小组讨论提高了基于系统评价的定性数据的评级类别的可靠性和有效性。","authors":"Jutta Beher, Eric Treml, Brendan Wintle","doi":"10.1371/journal.pone.0326166","DOIUrl":null,"url":null,"abstract":"<p><p>The number of literature reviews in the fields of ecology and conservation has increased dramatically in recent years. Scientists conduct systematic literature reviews with the aim of drawing conclusions based on the content of a representative sample of publications. This requires subjective judgments on qualitative content, including interpretations and deductions. However, subjective judgments can differ substantially even between highly trained experts that are faced with the same evidence. Because classification of content into codes by one individual rater is prone to subjectivity and error, general guidelines recommend checking the produced data for consistency and reliability. Metrics on agreement between multiple people exist to assess the rate of agreement (consistency). These metrics do not account for mistakes or allow for their correction, while group discussions about codes that have been derived from classification of qualitative data have shown to improve reliability and accuracy. Here, we describe a pragmatic approach to reliability testing that gives insights into the error rate of multiple raters. Five independent raters rated and discussed categories for 23 variables within 21 peer-reviewed publications on conservation management plans. Mistakes, including overlooking information in the text, were the most common source of disagreement, followed by differences in interpretation and ambiguity around categories. Discussions could resolve most differences in ratings. We recommend our approach as a significant improvement on current review and synthesis approaches that lack assessment of misclassification.</p>","PeriodicalId":20189,"journal":{"name":"PLoS ONE","volume":"20 6","pages":"e0326166"},"PeriodicalIF":2.6000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12176165/pdf/","citationCount":"0","resultStr":"{\"title\":\"Group discussions improve reliability and validity of rated categories based on qualitative data from systematic review.\",\"authors\":\"Jutta Beher, Eric Treml, Brendan Wintle\",\"doi\":\"10.1371/journal.pone.0326166\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The number of literature reviews in the fields of ecology and conservation has increased dramatically in recent years. Scientists conduct systematic literature reviews with the aim of drawing conclusions based on the content of a representative sample of publications. This requires subjective judgments on qualitative content, including interpretations and deductions. However, subjective judgments can differ substantially even between highly trained experts that are faced with the same evidence. Because classification of content into codes by one individual rater is prone to subjectivity and error, general guidelines recommend checking the produced data for consistency and reliability. Metrics on agreement between multiple people exist to assess the rate of agreement (consistency). These metrics do not account for mistakes or allow for their correction, while group discussions about codes that have been derived from classification of qualitative data have shown to improve reliability and accuracy. 
Here, we describe a pragmatic approach to reliability testing that gives insights into the error rate of multiple raters. Five independent raters rated and discussed categories for 23 variables within 21 peer-reviewed publications on conservation management plans. Mistakes, including overlooking information in the text, were the most common source of disagreement, followed by differences in interpretation and ambiguity around categories. Discussions could resolve most differences in ratings. We recommend our approach as a significant improvement on current review and synthesis approaches that lack assessment of misclassification.</p>\",\"PeriodicalId\":20189,\"journal\":{\"name\":\"PLoS ONE\",\"volume\":\"20 6\",\"pages\":\"e0326166\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12176165/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS ONE\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pone.0326166\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS ONE","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1371/journal.pone.0326166","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Group discussions improve reliability and validity of rated categories based on qualitative data from systematic review.
The number of literature reviews in the fields of ecology and conservation has increased dramatically in recent years. Scientists conduct systematic literature reviews with the aim of drawing conclusions based on the content of a representative sample of publications. This requires subjective judgments on qualitative content, including interpretations and deductions. However, subjective judgments can differ substantially even between highly trained experts who are faced with the same evidence. Because classification of content into codes by one individual rater is prone to subjectivity and error, general guidelines recommend checking the produced data for consistency and reliability. Metrics exist to assess the rate of agreement (consistency) between multiple raters. These metrics do not account for mistakes or allow for their correction, whereas group discussions about codes derived from classification of qualitative data have been shown to improve reliability and accuracy. Here, we describe a pragmatic approach to reliability testing that gives insights into the error rate of multiple raters. Five independent raters rated and discussed categories for 23 variables within 21 peer-reviewed publications on conservation management plans. Mistakes, including overlooking information in the text, were the most common source of disagreement, followed by differences in interpretation and ambiguity around categories. Discussions could resolve most differences in ratings. We recommend our approach as a significant improvement on current review and synthesis approaches that lack assessment of misclassification.
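The abstract refers to agreement metrics for multiple raters without naming a specific one. As a purely illustrative sketch, not code or data from the paper, the snippet below computes Fleiss' kappa, a standard chance-corrected statistic for agreement among several raters who each assign items to one category. The category labels and ratings are hypothetical.

```python
# Illustrative only: Fleiss' kappa for multi-rater category agreement.
# Not taken from the paper; category names and ratings are invented.

from collections import Counter

def fleiss_kappa(ratings, categories):
    """ratings: list of per-item lists, one category label per rater."""
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # Count how many raters chose each category for each item.
    counts = [Counter(item) for item in ratings]

    # Per-item observed agreement P_i.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_items

    # Expected agreement from overall category proportions.
    p_j = [
        sum(c[cat] for c in counts) / (n_items * n_raters)
        for cat in categories
    ]
    p_e = sum(p ** 2 for p in p_j)

    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 5 raters classify 4 items into 3 categories.
cats = ["protect", "restore", "monitor"]
example = [
    ["protect", "protect", "protect", "restore", "protect"],
    ["monitor", "monitor", "restore", "monitor", "monitor"],
    ["restore", "restore", "restore", "restore", "restore"],
    ["protect", "monitor", "protect", "protect", "monitor"],
]
print(round(fleiss_kappa(example, cats), 3))  # about 0.474
```

Values near 1 indicate near-perfect agreement and values near 0 indicate agreement no better than chance. As the abstract notes, such metrics quantify consistency but do not identify or correct the mistakes that group discussion can resolve.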
Journal introduction:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage