The Quixotic Quest for Objectivity in Observation

IF 0.3 · Zone 4 (Education) · Q4 BIOLOGY
Douglas Allchin
American Biology Teacher, 2023-02-01. DOI: 10.1525/abt.2023.85.2.122
Citations: 0

What if the legendary character Don Quixote had been a scientist? Surely his quest would have been the noble pursuit of objectivity. Scientists endeavor to transcend mere opinion or individual interpretation. They strive for publicly confirmable facts. Accordingly, scientists appeal to empirical evidence, measurements, and observations—regarded as the bedrock for factual claims.Yet, at the same time, ordinary humans can be fallible observers. Their interpretations can be skewed by prior expectations or personal desires. Historians, philosophers, and sociologists of science thus now typically contend that observations are “theory laden”—easily reflecting the researchers’ assumptions. In the past, the ideal of science was expressed in the simple motto “I’ll believe it when I see it!” Now, some cynics contend, an honest scientist might admit the ironic converse: “I’ll see it when I believe it.”Are we inevitable puppets to our beliefs? To what degree are observations in science trustworthy? How else would we defend scientific claims? (How else would we resolve contentious facts in our society?) Most teachers, I think, endorse the conventional view—that scientists and their observations are inherently objective. And that this makes science privileged. Here I explore this revered view (this month’s “Sacred Bovine”). Ultimately, I maintain, we are not as perfect as in the quixotic image. Yet science has developed tools to accommodate our cognitive flaws and to rescue science’s claim to its much-vaunted objectivity.Objectivity is a hallmark principle of our justice system too. Think of the allegorical figure holding aloft the scales of justice, blindfolded and impartial. Courts need trustworthy evidence to decide whether someone is culpable or innocent. For example, they rely on witnesses.However, cognitive research has shown that observers’ perceptions can be shaped and reshaped by personal experience and prejudices. Memories are vulnerable to suggestion too. 
Eyewitness testimony is—counterintuitively perhaps—among the least reliable in a courtroom (see the provocative volume by Loftus et al., 2019). That is, witnesses are susceptible to observer bias. We might, therefore, turn to forensic science and physical evidence—fingerprints, blood, DNA—as more secure.But even here, observer bias can intrude. We know this because science has turned on itself, to investigate its own objectivity. Psychologists have tested forensic experts in historical crime scenarios. Their assessment of bullet and shoeprint evidence seemed pretty consistent. But when contextual information about a case was available, it could affect how they interpreted a crime scene, how they matched fingerprints, how they identified individuals from the DNA when a sample mixes DNA from multiple persons, how they interpreted bloodstain patterns, and how they assessed skin injuries, at least. Even what dog handlers believed about possible culprits could influence the behavior of their sniffer dogs (Colloff, 2018; Cooper & Meterko, 2019). What can be done to ensure justice?Managing observer bias is standard now in modern medical research. To prevent judgment about a patient’s condition being primed, the doctors are metaphorically blindfolded. They are not informed about who is receiving a new drug or treatment and who has been given an inert placebo. Bias is not possible, even unconsciously.Such practices emerged over a century ago. One landmark study was done by Adolf Bingel in 1912–1913 at the City General Hospital in Brunswick, Germany (Tröhler, 2011). For decades, diphtheria had been a major scourge across Europe. Serum therapy (recognized in the very first Nobel Prize in Physiology or Medicine in 1901) had certainly improved the situation. Bingel acknowledged its efficacy but questioned whether it worked because of a specific antitoxin in the serum. Might the serum itself—any serum—be equally effective? 
By this time, the notion of controls for experimental comparison was widely appreciated (Sacred Bovines, March 2020). So Bingel established two groups. Some patients received the conventional “antitoxin” serum, and others received ordinary horse serum. To avoid inadvertently biasing his sample, he methodically alternated newly admitted patients between the two groups.
Bingel was aware that, given the controversial nature of his idea, the physicians’ preconceptions posed a special danger. He reminded his readers that it is “extraordinarily difficult to evaluate the influences of therapy on disease unless they are obvious, as for example, the success of a surgical operation or cure of syphilis with mercury or Salvarsan. The therapeutic optimist very easily sees improvement, and the skeptic sees nothing.” He thus wanted “to achieve an objective overall assessment,” rather than rely on the doctor’s informal, possibly biased, “impressions.” So, “to make the trial as objective as possible,” he explained, “I have not relied on my own judgement alone but have sought the views of the [at least six] assistant physicians of the diphtheria ward, without informing them about the nature of the serum under test (namely the ordinary horse serum). Their judgement was thus completely without prejudice. I am keen to see my observations checked independently, and most warmly recommend this ‘blind’ method for the purpose” (Bingel, 1918, p. 288). Here Bingel used the term that is still common today: blinding. The method lent stronger credence to his contentious conclusion: the theoretical claims of the Nobel Prize winner were mistaken, and any serum was effective.
Documenting specific instances of observer bias can be difficult. However, one can gauge the magnitude of the general problem by a bulk comparison of blinded and non-blinded observations.
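A bulk comparison of this kind can be summarized as one relative difference. The Python sketch below is a deliberate simplification. It takes the effect sizes reported by a set of non-blinded studies and by a set of blinded studies. It then returns, as a fraction, how much larger the non-blinded average is. Published meta-analyses use more careful statistics, for instance weighting studies by their precision. This is therefore not the method of any study cited here. The function name `relative_exaggeration` and the example numbers are hypothetical.

```python
from statistics import mean


def relative_exaggeration(non_blinded, blinded):
    """Fraction by which the mean non-blinded effect size exceeds the mean blinded one.

    0.0 means no difference; 0.5 means non-blinded effects are 50% larger on average.
    Uses plain, unweighted means of the reported effect sizes.
    """
    if not non_blinded or not blinded:
        raise ValueError("need at least one study in each group")
    baseline = mean(blinded)
    if baseline == 0:
        raise ValueError("mean blinded effect is zero; a relative difference is undefined")
    return mean(non_blinded) / baseline - 1


# Hypothetical effect sizes, not data from any cited study:
print(relative_exaggeration([0.75, 0.25], [0.25, 0.25]))  # prints 1.0 (twice as large)
```

A value of 0 would mean blinding made no difference on average. Positive values mean the non-blinded studies reported larger effects.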
One such analysis looked at clinical studies of a range of medical treatments, from heart conditions to wounds to psychiatric disorders (Hróbjartsson et al., 2013; Hróbjartsson et al., 2014). In the non-blinded studies, the ones open to observer bias, the conclusions were (on average) more dramatic. Probabilities of benefit were 36% higher, and effect sizes were 68% larger. Similar discrepancies were found even in lab studies on animal models (Bello et al., 2014). Overall, blinded studies seemed to yield more modest results. Even in clinical trials with large, randomized samples, unwanted observer bias can intrude and yield misleading findings.
One might well imagine that observer bias would be limited to scientific studies where judgment is critical and prior beliefs are strong. Not so. Comparing blinded and non-blinded studies has helped us probe that assumption. The assumption is a further expression of this month’s Sacred Bovine: that one may assume by default that a scientist’s observations are immune to such influences.
For example, do ants recognize nestmates (their genetic kin)? According to the theory of kin selection, an individual’s behavior should tend to benefit its closest genetic relatives. So this apparently simple question of insect behavior has significant implications for understanding evolutionary biology. A standard way to measure such kin-oriented behavior is to observe meetings between ants from the same colony and from different colonies, and to tally the various types of encounters between them. To what degree do the ants behave aggressively toward kin (nestmates) or toward “others”? Even with the relevant behaviors clearly defined, those assessments can be subtle, it turns out. Identifying instances of “mandible flaring” or “recoil” from a tactile encounter, for example, requires some experimenter judgment. In one recent meta-analysis, investigators found 156 experiments on nestmate versus non-kin behavior (van Wilgenburg & Elgar, 2013). Of those, 53 met the criteria for analysis of observer bias, and 15 of those used blinded behavioral analysis. As in the clinical studies, the non-blinded studies tended to provide stronger evidence for the predominant theory. First, “aggression among nestmates was three times more likely to be reported in blinded than non-blinded experiments.” Second, “the effect size—the differences between the level of aggression among nestmates and that among non-nestmates—in non-blind experiments was twice that of blind experiments.” Here, blinded experiments seem to have escaped bias from theoretical expectations.
Another unlikely topic for observational error might be plant herbivory: how much tree foliage do insects consume? One might envision a fairly straightforward task of sampling leaves and measuring the amount of loss. One could scan their surface area, weigh them, or count the proportion of leaves with damage. Or one could estimate defoliation visually from photos of whole trees (and cross-check this method with some direct sampling). Are these simple measurements, manageable even by introductory students?
This topic, too, has been examined for evidence of observer bias, based on 42 publications on insect herbivory in Brazil (Kozlov et al., 2014). Again, blinded and non-blinded studies were compared. Estimates of plant damage differed by a factor of five to ten, depending on the methods used. Non-blinded studies reported significantly more damage than blinded studies. That is, they matched the widespread assumption that such rates are very high in the tropics. In addition, studies that focused on only one to three species found twice as much damage as those studying 10 or more species. Thus, the researcher’s choice of individual species seems to have been a biasing factor. Perhaps one chooses a species because its damage is more noticeable (or “typical”) to an observer who is seeking to measure it? Or perhaps the species is more abundant, which makes it easier to sample. But the selected species apparently did not fairly represent all species, and this error has led to misleading claims about insect herbivory in the tropics in general.
In a follow-up analysis (based on 125 publications), the same team identified other seemingly insignificant choices that appear to bias such research unconsciously (Zvereva & Kozlov, 2019):
- selection of study site
- selection of timing (season and duration)
- selection of individual branches or leaves to be sampled
Casual (technically, “haphazard”) sampling can open the way to observer bias. In addition, when primary authors participated in the sampling or measurement, or when others knew where the samples had originated, the reported magnitude of the results tended to be inflated. The reviewers concluded sadly, “Our ecological and environmental knowledge is considerably biased due to an unconscious tendency of researchers to lend support for their hypotheses and expectations, which generally leads to overestimation of the effects under study.” Blinding matters.
These studies (of serum therapy, forensic analysis, clinical trials, ant behavior, and insect herbivory) document the widespread occurrence of unconscious observer bias in biology. Ironically, they equally show that blinding is effective in reducing its effects. Objectivity in science may be threatened by the infelicities of human observation, but it can also be salvaged by appropriate countermeasures. Accordingly, the custom of blinding, familiar to medical and psychological researchers for over a century now, is gradually spreading to more fields of science. (Note, too, its relevance to the third NGSS Science and Engineering Practice: Planning and Carrying Out Investigations.)
Observer bias is surely insidious: unconscious and easily hidden. It can severely threaten the quixotic ideal of objectivity in science. Yet turning a “blind eye” to such flaws only compounds the problem, allowing bias to fester at a yet deeper level. Fortunately, perhaps, while observer bias is unintentional, it can nonetheless be managed intentionally, through the strategy of blinding. In a society where facts are disputed and allegations of prejudiced observations are rampant, such tools for reclaiming objectivity might well be more widely known, and perhaps fruitfully applied even by nonscientists.
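The herbivory studies discussed above point to “haphazard” selection of leaves as one way expectations can inflate damage estimates. The toy Python simulation below illustrates that mechanism. Its assumptions are invented and are not parameters from Kozlov et al. or Zvereva & Kozlov. It builds a hypothetical canopy in which most leaves are lightly damaged and samples it in two ways. The first sample is taken at random. The second is taken by an observer whose eye is drawn to damaged leaves, modeled by letting a leaf’s chance of being picked rise with its damage. All function names and parameter values are illustrative.

```python
import random
from statistics import mean


def make_canopy(n_leaves, rng, mean_damage=0.05):
    """Hypothetical canopy: fraction of each leaf's area eaten, mostly small, occasionally large."""
    return [min(1.0, rng.expovariate(1 / mean_damage)) for _ in range(n_leaves)]


def random_sample(leaves, k, rng):
    """Unbiased sampling: every leaf is equally likely to be chosen (without replacement)."""
    return rng.sample(leaves, k)


def haphazard_sample(leaves, k, rng, attraction=0.01):
    """Biased sampling: a leaf's chance of being picked grows with its visible damage.

    Assumed model: selection weight = attraction + damage; drawn with replacement
    for simplicity.
    """
    weights = [attraction + damage for damage in leaves]
    return rng.choices(leaves, weights=weights, k=k)


rng = random.Random(2024)
canopy = make_canopy(20_000, rng)
true_damage = mean(canopy)
random_estimate = mean(random_sample(canopy, 1_000, rng))
haphazard_estimate = mean(haphazard_sample(canopy, 1_000, rng))
print(f"true {true_damage:.3f}  random {random_estimate:.3f}  haphazard {haphazard_estimate:.3f}")
```

Under these assumed settings, the random sample lands close to the true mean damage. The damage-attracted sample overshoots it substantially, even though every individual leaf is measured correctly. In this toy model the bias lies entirely in which leaves get measured. That is why randomizing the selection, and not only the measurement, matters.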
Source journal: American Biology Teacher