Is It Time to Leave the Shopping Mall Behind? Measurement Flaws, Plausibility, and External Validity of False Memory Research

IF 1.8; Zone 3, Psychology; Q2 PSYCHOLOGY, EXPERIMENTAL

Applied Cognitive Psychology; Pub Date: 2025-06-11; DOI: 10.1002/acp.70083

Zsofia A. Szojka, Stephanie Block, David La Rooy

{"title":"Is It Time to Leave the Shopping Mall Behind? Measurement Flaws, Plausibility, and External Validity of False Memory Research","authors":"Zsofia A. Szojka, Stephanie Block, David La Rooy","doi":"10.1002/acp.70083","DOIUrl":null,"url":null,"abstract":"This commentary discusses the recently published article by Andrews and Brewin (2024) that reanalyzed data collected by Murphy et al. (2023) to replicate the well-known “lost in the mall” study first published by Loftus and Pickrell (1995). We begin by outlining initial and more recent findings that brought the “lost in the mall” paradigm to the forefront of false memory research before considering the thought-provoking results of the reanalysis by Andrews and Brewin (2024). We then highlight some of the implications of the reanalysis for child sexual abuse investigations, and more broadly, for the reliability and validity of psychological research that relies on researchers' coding and interpretation of information provided by participants about the content of their memories. We ask whether the definition and measurement of false memories within laboratory experiments can be meaningfully applied to real-life debates concerning justice for alleged victims and perpetrators of sexual abuse. In the 1970s, Elizabeth Loftus and her team conducted a series of highly influential experiments demonstrating that misleading information received after a personal experience can lead people to make mistakes when they later try to describe what happened (Loftus and Palmer 1974; Loftus 1975). After the impact of misinformation on memory for personal experiences had been established, an innovative research paradigm was designed to demonstrate that memories of entire events that never occurred could be implanted in people's minds with relative ease. Loftus and Pickrell (1995) misled 24 adult participants into believing that their family members had provided descriptions of four true past events; unbeknownst to the participants, one of the supposedly true events, being “lost in the mall”, was made up by the researchers. 
After participants were told that they had been lost in the mall many years earlier, they were asked to recall what they could remember, in writing and verbally, and to rate the clarity of their memories. The results showed that a quarter of the participants were successfully induced to claim that they remembered the false event, although their average clarity ratings for the false memory were substantially lower than scores assigned to true events. (1) The “lost in the mall” study resulted in a “veritable explosion of cognitive research on the topic of false memory” (Pezdek and Lam 2007), (2), and led to the establishment of a new view of human memory as being particularly fragile and easily manipulated. However, while most memory researchers accept that false memory implantation is possible, the proportion of people who can be induced to develop false memories has been the subject of fierce debate (Wade et al. 2002). Scrutiny of false memory implantation experiments identified two main challenges concerning the definition of false memories: (1) differentiating between false beliefs and false memories, and (2) differentiating between flawed memories and false memories. The first challenge stems from the difficulty of determining whether participants in memory implantation studies genuinely remember the false event or simply believe the researchers' assertion that the false event had occurred. Failure to differentiate the former (false memories) from the latter (false beliefs) could easily inflate the rate of false memories reported in memory implantation studies (Wade et al. 2002). The second issue results from the potential confusion between false memories, defined as entire false events that have been implanted into memory, and flawed memories, referring to incorrect details in otherwise true memories (Pezdek and Lam 2007). 
This is a particular concern for versions of the memory implantation paradigm that rely on relatively common experiences, such as being lost in the mall. In these studies, there is a non-negligible probability that some of the participants have truly experienced situations like the false events suggested, leading again to inflated rates of false memories reported by researchers. The debate about defining and categorizing false memories has been reignited by the recent publication of a much larger scale replication of the original “lost in the mall” study by Murphy et al. (2023). Aiming to address prior criticism of the research paradigm, the researchers introduced a novel coding scheme that explicitly differentiates between complete false memories (substantial remembering of all central details relating to the false memory) and partial false memories (partial recall of the false memory). Results showed a slightly higher rate of successful memory implantation than the 25% reported in the original “lost in the mall” study (Loftus and Pickrell 1995), with Murphy et al. (2023) classifying 8% of participants as having full false memories, and a further 27% as having partial false memories. However, consistent with the findings of the original study, participants' clarity ratings of the false memories were relatively low, and consistently lower than those of true memories. Importantly, fewer than half (14%) of the participants judged by researchers as having false memories self-reported remembering the event. In addition to publishing their findings, Murphy et al. (2023) made their data file and raw data public, allowing other researchers to re-analyse their data. Andrews and Brewin's (2024) reanalysis of Murphy et al.'s 
(2023) data explicitly aimed to address the aforementioned criticisms of the “lost in the mall” paradigm: (1) that researcher-identified “false memories” may not reflect genuine remembering, and (2) that researcher-identified “false memories” may actually be distortions of participants' true memories. To investigate these concerns, the authors devised a more systematic coding approach that relies on counting the number of core details participants report about the suggested event and assessing the clarity of each of those details on a scale from “no mention” to “explicit recall”. The researchers also attempted to identify potentially true experiences by coding for mentions of being lost in different circumstances to the fake event, being lost on more than one occasion, or experiences similar to the target event that did not actually involve being lost. The findings of the reanalysis cast doubt on claims that 25%–35% of people can be “led to remember entire events that never actually happened to them” (Loftus and Pickrell 1995, 725). The re-coded data indicated that on average, participants judged by Murphy et al.'s (2023) research team as having a false memory explicitly recalled only 1.47 of the 6 core details. Even the participants judged to have full false memories tended to recall fewer than half of the core details, and 20% did not explicitly recall the most fundamental detail of actually “being lost”. As noted by Murphy et al. (2023), the rate of participants who self-reported remembering the target event (14%) was substantially lower than the rate of participants judged by the researchers as having developed a false memory (35%). 
Andrews and Brewin (2024) showed that participants' own criteria for remembering were related to the clarity of core details; participants who self-reported having developed a false memory explicitly mentioned significantly more core details than those who did not believe they remembered the event. The results of the re-analysis also validate prior concerns that some participants judged by researchers as having developed false memories may be referring to potentially true experiences. According to Andrews and Brewin (2024), 31% of participants produced descriptions of past experiences that were similar to the fake event but distinguished by key differences in core details, such as being lost in a different shopping location or being abandoned rather than being lost. The impact of these potentially true experiences on false memory rates was not negligible, as they were present in the accounts of 50% of those judged by Murphy et al. (2023) as having full false memories and 52% of those with partial false memories. Based on the results of their reanalysis, Andrews and Brewin (2024) concluded that previous studies using the “lost in the mall” paradigm have substantially overestimated the proportion of people who have developed false memories. The authors suggest three steps to improve the methodology of memory implantation studies: the exclusion of participants with potentially true experiences of the target event, the use of core details as minimum criteria for false memories, and the consideration of self-report measures alongside researcher identification of false memories. Using a step-by-step exclusion approach, Andrews and Brewin (2024) demonstrated that applying these methodological improvements to the data of Murphy et al. (2023) resulted in a drastically lowered false memory rate of only 4%. In their recent commentary, Wade et al. 
(2025) question the validity of this figure, suggesting that the authors' criteria exclude genuine false memories constructed from a combination of suggested details and memory traces from other sources. Nonetheless, Andrews and Brewin's (2024) argument that entirely false memories are less frequent than memory implantation research would lead us to believe has implications both for the real-world application of the concept of false memories, and for the use of researcher-coded data in the field of memory research. In the 30 years since the publication of the original “lost in the mall” study, the results of false memory research have been applied far beyond “recovered memory” cases, with the apparent ease of memory implantation and high rate of participants who develop false memories leading to a widespread view that false memories of child sexual abuse are common and have likely resulted in an unknown number of miscarriages of justice (Blizard and Shaw 2019; Crook and McEwen 2019). Concerns about miscarriages of justice have also been expressed in Wade et al.'s (2025) commentary on Andrews and Brewin (2024), which states that “false memory rates in the lab might underestimate those in real cases, where factors are present that research has shown can exaggerate the likelihood that false memories are formed” (3). We argue that when it comes to claims of child sexual abuse, the opposite is true, with memory implantation experiments giving the impression that accusations based on false memories are more common than they really are. 
This inflated false memory rate occurs partly because lab studies rely on a specific set of highly suggestive techniques to induce false memories, and partly because lab studies fail to account for factors that reduce the likelihood of false allegations in real cases. To evaluate the claim that laboratory research provides a conservative estimate of the frequency of false memories in real cases, it is helpful to break down the numerous methods of suggestion and deceit that the “lost in the mall” paradigm and other memory implantation designs rely on to convince participants that they experienced a fake event. Firstly, the researchers provide the participant with the core details of the event in order to “remind them” of what happened, including the main action, the time and location of the event, the participant's emotional reaction, and the resolution of the crisis (Loftus and Ketcham 1994; Loftus and Pickrell 1995; Murphy et al. 2023). These researcher-provided elements supply a coherent narrative framework that serves as a script or schema, making it easy for the participants to “fill in” the details even if they have not personally experienced them. Secondly, the participant is led to believe that these core details were provided by a trusted family member who was present when the event occurred. Given that the experimental paradigm involves no stakes for the participant, the relative, or anyone else, there is no reason for participants to suspect that their relatives would mislead them. Moreover, the description of the false event is presented after the participant has read the summaries of true memories, eliminating potential doubts about the veracity of the accounts. Not all memory implantation studies involve these “tricks” of suggestion, but those that alter the proven formula tend to add a different element of deception, such as showing participants fake photographs or materials related to the target event (e.g., Braun et al. 2002; Wade et al. 2002). 
Furthermore, false memory studies also rely on repeated recall attempts (three including the booklet and consecutive interviews) to maximize the likelihood that participants will acquiesce to suggestion. Although adults and children can recall events they truly experienced accurately across multiple interviews, a wealth of research has demonstrated an increase in erroneous details resulting from a combination of suggestive techniques and repeated recall occasions (La Rooy et al. 2009). Thus, despite some claims to the opposite, implanting false memories is no simple matter and requires the use of a specific set of highly suggestive techniques under laboratory conditions. Perhaps even more importantly, real cases involve factors that research has shown can reduce the likelihood of false allegations of child sexual abuse, including the implausibility of the event, children's reluctance to disclose abuse, and the presence of procedural safeguards to prevent miscarriages of justice based on false memories. Andrews and Brewin (2024) highlight that one criticism of the “lost in the mall” paradigm is that being “lost” is a common or plausible event. Even if we have not experienced being lost ourselves, most of us have a schema for it; that is, being lost is a theme of many books, television shows and other media, so it is easy to imagine what it would be like to be lost. Most children, however, without exposure to extreme suggestion, do not have a schema for child sexual abuse. Pezdek and Hodge (1999) conducted a study where they looked at younger (5–7 years) and older (9–12 years) children's susceptibility to accepting a false memory for plausible (i.e., being lost in the mall) versus implausible events (i.e., receiving a rectal enema). Most children in this study did not report remembering either false event, but those who did were far more likely to recall the plausible event than the implausible event. 
The study's conclusion that false memories are not likely to be implanted for less plausible events is consistent with research showing very low rates of acquiescence to false suggestions of genital touch during a real medical examination, even among the youngest children (Saywitz et al. 1991). This is important as we consider how the “lost in the mall” paradigm has been, in our opinion, inappropriately applied to child sexual abuse cases in the courtroom. In the “lost in the mall” paradigm, children are tested about a plausible memory event that they are told their parents said was true, and that is embedded in other true events. Beyond the implausibility of sexual abuse narratives for children who have not experienced abuse, we also know that children who have experienced abuse are reluctant to disclose to adults (Lyon et al. n.d.), implying that children are far less likely to report false narratives of sexual abuse than false narratives pertaining to innocuous events. Motivational barriers to disclosing sexual abuse include children's concerns about their parents' reactions and the perceived negative consequences of the allegations for the child and the family (Lemaigre et al. 2017). Furthermore, children who were abused by a family member or groomed by a trusted adult often express feelings of love and care toward the perpetrator and are reluctant to report the abuse due to concerns about the consequences the abuser may face (Christensen et al. 2015; Lemaigre et al. 2017). Finally, many victims feel shame, guilt, and self-blame about their abuse (Alaggia et al. 2019; Goodman-Brown et al. 2003; Hershkowitz et al. 2007), further increasing their reluctance to disclose. Indeed, these barriers to disclosure are so strong that an estimated 50% of substantiated victims initially deny abuse when questioned (Lyon et al. 
n.d.), suggesting that false denials of abuse likely pose a much greater obstacle to justice than allegations based on false memories. In addition to internal and external factors that reduce the likelihood of real victims making allegations based on false memories of sexual abuse, Andrews and Brewin (2024) note the presence of procedural safeguards against the impact of false memories in real cases, such as the extra scrutiny of jury trials in adversarial legal systems. Perhaps anticipating this criticism, Murphy et al. (2023) extended the original memory implantation design with a mock jury experiment in which 1024 lay “jurors” were asked to read participants' descriptions of the “lost in the mall” event and provide a yes/no judgment regarding whether they reflect genuine memories. Mock jurors believed that memories of the fake event were real even more frequently (39%) than the researchers (35%), demonstrating that lay observers may find it difficult to distinguish between true and false memories. The authors explain their decision not to warn mock jurors that some memories may be false by stating their aim to “mirror the experience of real jurors listening to a witness describe events from their past” (822). However, we disagree with the claim that the researchers created a realistic trial scenario, as real-life cases typically involve opening and closing statements, cross-examination of witnesses, and specific jury instructions around credibility and the burden of proof. Real jurors are often told by defense attorneys in the opening statement that witnesses may lie, and indeed the goal of the defense throughout the trial is to cast doubt on the truth of the prosecution's evidence. Furthermore, before a case even reaches the courtroom, the credibility of the child's disclosure is scrutinized many times by many different professionals. Having examined 500 reported cases of child sexual abuse in the United States, Block et al. 
(2023) found that only 53% of cases were investigated and as few as 17% progressed to court, demonstrating the rigorous criteria cases must meet to move forward in the criminal legal system. Thus, we are inclined to agree with Andrews and Brewin's (2024) conclusion that memory implantation studies overestimate the proportion of participants who develop false memories that observers would judge genuine in a legal context. In conclusion, we argue that the external validity of laboratory research on false memory implantation is too low to meaningfully inform real investigations involving allegations of child sexual abuse. Although, as demonstrated, laboratory research relies on multiple highly suggestive techniques to induce false memories, Wade et al. (2025) are correct in pointing out that real cases may involve additional suggestive influences that are not present in the laboratory. However, the impact of these is likely outweighed by the presence of factors that reduce the likelihood of false allegations, and procedural safeguards that mitigate the risk of false memories resulting in miscarriages of justice. In addition to the implications for real-life sexual abuse cases, the results of Andrews and Brewin's (2024) re-analysis also raise questions about the reliability and validity of coding approaches used widely within some areas of psychological research. If re-coding the same dataset with a different coding approach leads to wildly different conclusions with regard to the main hypotheses of a study, how can we trust the results of any research relying on researcher-coded data? Firstly, researchers relying on manually coded raw data must ensure that their coding is reliable, meaning that their coding guide contains clear rules that the coders follow objectively and accurately. In psychology, reliability is generally assessed through measuring inter-rater agreement among multiple coders, most commonly by calculating Cohen's Kappa. 
Although there are variations in the field, it is generally accepted that Kappa values at or above 0.8 (sometimes 0.7) reflect almost perfect agreement between coders, signifying a reliable coding approach. Both Murphy et al. (2023) and Andrews and Brewin (2024) fall short of this established standard, with inter-rater agreement as low as k = 0.60 (Murphy et al. 2023) and k = 0.49 (Andrews and Brewin 2024). These figures reflect at best moderate agreement between coders and are approaching the lower limit of what could be considered “reliable” coding. Given the low inter-rater agreement figures, it is questionable whether the results of Murphy et al. (2023) and Andrews and Brewin (2024) could be replicated even if a research team used the exact same coding guide as the original studies. Low reliability figures limit the strength of conclusions one can draw from quantitative analysis, so it is unfortunate that neither Murphy et al. (2023) nor Andrews and Brewin (2024) highlight the relatively low inter-rater agreement achieved by coders (k = 0.60 and 0.49 at the lowest, respectively) when discussing their findings. Difficulties with achieving reliability in coding may be indicative of more pervasive problems with the study design, such as challenges with operationalizing vague concepts like “partial memory” (Murphy et al. 2023). When objective definitions of psychological concepts prove elusive, an alternative approach is to deconstruct the variables in question into smaller, more easily circumscribed components. Despite Andrews and Brewin's (2024) detail-focused coding guide aiming to capitalize on this approach, there was still substantial inter-rater disagreement with regard to two out of the six core details identified by the authors. 
One potential contributor to low inter-rater reliability is human error, which might play a significant role in research designs relying on manual coding of large amounts of complex data even when the coding categories are clearly defined. In this respect, the development of machine-assisted coding approaches is a promising avenue for increasing the reliability of research studies, as machine learning models have been found to outperform manual coders in accuracy when coding interview transcripts (Szojka et al. 2025). Although training machine models still requires an initial dataset of reliably coded data, the trained model can then be applied to new datasets that rely on the same coding categories. This method has the advantage of providing a standardized coding approach that can be used across studies addressing the same research question, as is the case for the numerous direct and quasi-replications of the “lost in the mall” experiment. However, as Andrews and Brewin (2024) demonstrate, questions about the results of the “lost in the mall” study and its replications go beyond the reliability of the coding guide and concern its validity: the extent to which researcher-defined false memories reflect genuine remembering on the part of the participant. Even if coders in Murphy et al. (2023) achieved perfect reliability, re-analysing the data with a new operational definition of false memories may produce different results. The Andrews and Brewin (2024) re-analysis is not the first study to raise questions about the validity of the definition and measurement of false memories in memory implantation designs. In 2015, a team of researchers criticized the astonishing findings of a study that reported successful induction of false memories of committing a crime in 70% of participants (Shaw and Porter 2015), arguing that the false memory rate was inflated by the authors' failure to differentiate between false beliefs and false memories (Wade et al. 2018). Wade et al. 
(2018) recoded the data of Shaw and Porter (2015) using two separate coding schemes that distinguish false beliefs from false memories (Lindsay et al. 2004; Scoboria et al. 2017) and obtained a more conservative false memory rate of 26%–30%. Highlighting the stark difference in results between the two analyses, the authors suggest that eschewing established coding approaches in favor of new definitions leads to imprecision that “fuels skepticism of memory research and detracts from the understanding of real-world behavior” (Wade et al. 2018, 474). Andrews and Brewin's (2024) re-analysis of the data collected by Murphy et al. (2023) is based on the premise that the relatively well-established approach of categorizing partial and full false memories itself lacks validity, necessitating the introduction of a new conceptualisation of false memories. In line with Wade et al.'s (2018) suggestions, Andrews and Brewin (2024) ensured that their approach is clearly positioned in relation to previous research by (1) providing a comprehensive description of how the data was coded, (2) explaining their motivation for developing an alternative coding approach, and (3) reporting their findings alongside the results obtained with a different coding guide by Murphy et al. (2023). While these steps certainly contribute to making the dialog about the definition and measurement of false memories in the field more transparent, they ultimately cannot answer the question of which of the many definitions of false memories is correct. To determine whether the concept of false memories suggested by Andrews and Brewin (2024) improves on previous definitions used by Murphy et al. 
(2023) or indeed Loftus and Pickrell (1995), researchers need to examine the world beyond controlled experimental conditions and investigate the meaning and usefulness of the concept of false memories in real cases. In conclusion, if the question is whether the findings of Loftus and Pickrell's (1995) “lost in the mall” study, one of the most influential and surprising experiments in the history of memory research, can be replicated, the answer has to be a confident “yes”. Over a period of 30 years, a multitude of studies, including a meta-analysis (Scoboria et al. 2017) and a direct replication (Murphy et al. 2023), confirmed that under strictly controlled experimental conditions, a substantial minority of participants can be misled to report details of fake events as if they remembered them. The contribution of Andrews and Brewin (2024) is to move the debate away from the reliability of the false memory phenomenon and challenge its validity. Is it meaningful for researchers to state that an individual has developed a false memory if the participant herself does not think she remembers it? Would a researcher-identified false memory constitute credible evidence at court? Renewed interest in these questions supports our view that it remains inappropriate to interpret false memory rates reported by laboratory studies using a memory implantation paradigm as evidence that a substantial proportion of real-life allegations of child sexual abuse are based on false memories. Even if further replications of the “lost in the mall” study were able to introduce reliable and valid methods of measuring false memories, the paradigm itself will still fail to account for the real-life context of child abuse investigations. We argue that the aim of preventing miscarriages of justice is better served by observational studies and new, ecologically valid research paradigms than by the continued deconstruction and reconstruction of a 30-year-old experiment. Zsofia A. 
Szojka: writing – original draft, writing – review and editing, conceptualization. Stephanie Block: writing – review and editing, conceptualization. David La Rooy: writing – review and editing, conceptualization. Ethics approval was not required, as this study is based exclusively on published research. The authors declare no conflicts of interest.","PeriodicalId":48281,"journal":{"name":"Applied Cognitive Psychology","volume":"39 3","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/acp.70083","citationCount":"0","resultStr":"{\"title\":\"Is It Time to Leave the Shopping Mall Behind? Measurement Flaws, Plausibility, and External Validity of False Memory Research\",\"authors\":\"Zsofia A. Szojka, Stephanie Block, David La Rooy\",\"doi\":\"10.1002/acp.70083\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This commentary discusses the recently published article by Andrews and Brewin (2024) that reanalyzed data collected by Murphy et al. (2023) to replicate the well-known “lost in the mall” study first published by Loftus and Pickrell (1995). We begin by outlining initial and more recent findings that brought the “lost in the mall” paradigm to the forefront of false memory research before considering the thought-provoking results of the reanalysis by Andrews and Brewin (2024). We then highlight some of the implications of the reanalysis for child sexual abuse investigations, and more broadly, for the reliability and validity of psychological research that relies on researchers' coding and interpretation of information provided by participants about the content of their memories. 
We ask whether the definition and measurement of false memories within laboratory experiments can be meaningfully applied to real-life debates concerning justice for alleged victims and perpetrators of sexual abuse.In the 1970s Elizabeth Loftus and her team conducted a series of highly influential experiments demonstrating that misleading information received after a personal experience can lead people to make mistakes when they later try to describe what happened (Loftus and Palmer 1974; Loftus 1975). After establishing the impact of misinformation on memory for personal experiences, an innovative research paradigm was designed to demonstrate that memories of entire events that never occurred could be implanted in people's minds with relative ease. Loftus and Pickrell (1995) misled 24 adult participants to believe that their family members provided descriptions of four true past events, but unbeknownst to the participants, one of the supposed true events, being “lost in the mall”, was made up by the researchers. After participants were told that they had been lost in the mall many years earlier they were then asked to recall what they could remember in writing and verbally and rate the clarity of their memories. The results showed that a quarter of the participants were successfully induced to claim that they remembered the false event, although their average clarity ratings for the false memory were substantially lower than scores assigned to true events. (1) The “lost in the mall” study resulted in a “veritable explosion of cognitive research on the topic of false memory” (Pezdek and Lam 2007), (2), and led to the establishment of a new view of human memory as being particularly fragile and easily manipulated.However, while most memory researchers accept that false memory implantation is possible, the proportion of people who can be induced to develop false memories has been the subject of fierce debate (Wade et al. 2002). 
Scrutiny of false memory implantation experiments identified two main challenges concerning the definition of false memories: (1) differentiating between false beliefs and false memories, and (2) differentiating between flawed memories and false memories. The first challenge stems from the difficulty of determining whether participants in memory implantation studies genuinely remember the false event or simply believe the researchers' assertion that the false event had occurred. Failure to differentiate the former (false memories) from the latter (false beliefs) could easily inflate the rate of false memories reported in memory implantation studies (Wade et al. 2002). The second issue results from the potential confusion between false memories, defined as entire false events that have been implanted into memory, and flawed memories, referring to incorrect details in otherwise true memories (Pezdek and Lam 2007). This is a particular concern for versions of the memory implantation paradigm that rely on relatively common experiences, such as being lost in the mall. In these studies, there is a non-negligible probability that some of the participants have truly experienced situations like the false events suggested, leading again to inflated rates of false memories reported by researchers.The debate about defining and categorizing false memories has been reignited by the recent publication of a much larger scale replication of the original “lost in the mall” study by Murphy et al. (2023). Aiming to address prior criticism of the research paradigm, the researchers introduced a novel coding scheme that explicitly differentiates complete false memories (substantial remembering of all central details relating to the false memory), and partial false memories (partial recall of the false memory). 
Results showed a slightly higher rate of successful memory implantation than the 25% reported in the original “lost in the mall” study (Loftus and Pickrell 1995), with Murphy et al. (2023) classifying 8% of participants as having full false memories, and a further 27% as having partial false memories. However, consistent with the findings of the original study, participants' clarity ratings of the false memories were relatively low, and consistently lower than those of true memories. Importantly, only 14% of participants self-reported remembering the event, fewer than half of the 35% whom researchers judged to have false memories. In addition to publishing their findings, Murphy et al. (2023) made their data file and raw data public, allowing other researchers to re-analyse them.

Andrews and Brewin's (2024) reanalysis of Murphy et al.'s (2023) data explicitly aimed to address the aforementioned criticisms of the “lost in the mall” paradigm: (1) that researcher-identified “false memories” may not reflect genuine remembering, and (2) that researcher-identified “false memories” may actually be distortions of participants' true memories. To investigate these concerns, the authors devised a more systematic coding approach that relies on counting the number of core details participants report about the suggested event and assessing the clarity of each of those details on a scale from “no mention” to “explicit recall”. The researchers also attempted to identify potentially true experiences by coding for mentions of being lost in different circumstances to the fake event, being lost on more than one occasion, or experiences similar to the target event that did not actually involve being lost.

The findings of the reanalysis cast doubt on claims that 25%–35% of people can be “led to remember entire events that never actually happened to them” (Loftus and Pickrell 1995, 725).
The re-coded data indicated that on average, participants judged by Murphy et al.'s (2023) research team as having a false memory explicitly recalled only 1.47 of the 6 core details. Even the participants judged to have full false memories tended to recall fewer than half of the core details, and 20% did not explicitly recall the most fundamental detail of actually “being lost”. As noted by Murphy et al. (2023), the rate of participants who self-reported remembering the target event (14%) was substantially lower than the rate of participants judged by the researchers as having developed a false memory (35%). Andrews and Brewin (2024) showed that participants' own criteria for remembering were related to the clarity of core details; participants who self-reported having developed a false memory explicitly mentioned significantly more core details than those who did not believe they remembered the event.

The results of the re-analysis also validate prior concerns that some participants judged by researchers as having developed false memories may be referring to potentially true experiences. According to Andrews and Brewin (2024), 31% of participants produced descriptions of past experiences that were similar to the fake event but distinguished by key differences in core details, such as being lost in a different shopping location or being abandoned rather than being lost. The impact of these potentially true experiences on false memory rates was not negligible, as they were present in the accounts of 50% of those judged by Murphy et al. (2023) as having full false memories and 52% of those with partial false memories.

Based on the results of their reanalysis, Andrews and Brewin (2024) concluded that previous studies using the “lost in the mall” paradigm have substantially overestimated the proportion of people who have developed false memories.
The authors suggest three steps to improve the methodology of memory implantation studies: the exclusion of participants with potentially true experiences of the target event, the use of core details as minimum criteria for false memories, and the consideration of self-report measures alongside researcher identification of false memories. Using a step-by-step exclusion approach, Andrews and Brewin (2024) demonstrated that applying these methodological improvements to the data of Murphy et al. (2023) resulted in a drastically lowered false memory rate of only 4%. In their recent commentary, Wade et al. (2025) question the validity of this figure, suggesting that the authors' criteria exclude genuine false memories constructed from a combination of suggested details and memory traces from other sources. Nonetheless, Andrews and Brewin's (2024) argument that entirely false memories are rarer than memory implantation research would lead us to believe has implications both for the real-world application of the concept of false memories, and for the use of researcher-coded data in the field of memory research.

In the 30 years since the publication of the original “lost in the mall” study, the results of false memory research have been applied far beyond “recovered memory” cases, with the apparent ease of memory implantation and high rate of participants who develop false memories leading to a widespread view that false memories of child sexual abuse are common and have likely resulted in an unknown number of miscarriages of justice (Blizard and Shaw 2019; Crook and McEwen 2019). Concerns about miscarriages of justice have also been expressed in Wade et al.'s (2025) commentary on Andrews and Brewin (2024), which states that “false memory rates in the lab might underestimate those in real cases, where factors are present that research has shown can exaggerate the likelihood that false memories are formed” (3).
We argue that when it comes to claims of child sexual abuse, the opposite is true, with memory implantation experiments giving the impression that accusations based on false memories are more common than they really are. This inflated false memory rate occurs partly because lab studies rely on a specific set of highly suggestive techniques to induce false memories, and partly because lab studies fail to account for factors that reduce the likelihood of false allegations in real cases.

To evaluate the claim that laboratory research provides a conservative estimate of the frequency of false memories in real cases, it is helpful to break down the numerous methods of suggestion and deceit that the “lost in the mall” paradigm and other memory implantation designs rely on to convince participants that they experienced a fake event. Firstly, the researchers provide the participant with the core details of the event in order to “remind them” of what happened, including the main action, the time and location of the event, the participant's emotional reaction, and the resolution of the crisis (Loftus and Ketcham 1994; Loftus and Pickrell 1995; Murphy et al. 2023). These researcher-provided elements form a coherent narrative framework that serves as a script or schema, making it easy for the participants to “fill in” the details even if they have not personally experienced them. Secondly, the participant is led to believe that these core details were provided by a trusted family member who was present when the event occurred. Given that the experimental paradigm involves no stakes for the participant, the relative, or anyone else, there is no reason for participants to suspect that their relatives would mislead them. Moreover, the description of the false event is presented after the participant has read the summaries of true memories, eliminating potential doubts about the veracity of the accounts.
Not all memory implantation studies involve these “tricks” of suggestion, but those that alter the proven formula tend to add a different element of deception, such as showing participants fake photographs or materials related to the target event (e.g., Braun et al. 2002; Wade et al. 2002). Furthermore, false memory studies also rely on repeated recall attempts (three including the booklet and consecutive interviews) to maximize the likelihood that participants will acquiesce to suggestion. Although adults and children can recall events they truly experienced accurately across multiple interviews, a wealth of research has demonstrated an increase in erroneous details resulting from a combination of suggestive techniques and repeated recall occasions (La Rooy et al. 2009). Thus, despite some claims to the contrary, implanting false memories is no simple matter and requires the use of a specific set of highly suggestive techniques under laboratory conditions.

Perhaps even more importantly, real cases involve factors that research has shown can reduce the likelihood of false allegations of child sexual abuse, including the implausibility of the event, children's reluctance to disclose abuse, and the presence of procedural safeguards to prevent miscarriages of justice based on false memories. Andrews and Brewin (2024) highlight that one criticism of the “lost in the mall” paradigm is that being “lost” is a common or plausible event. Even if we have not experienced being lost ourselves, most of us have a schema for it; that is, being lost is a theme of many books, television shows and other media, so it is easy to imagine what it would be like to be lost. Most children, however, without exposure to extreme suggestion, do not have a schema for child sexual abuse.
Pezdek and Hodge (1999) conducted a study where they looked at younger (5–7 years) and older (9–12 years) children's susceptibility to accept a false memory for plausible (i.e., being lost in the mall) versus implausible events (i.e., receiving a rectal enema). Most children in this study did not report remembering either false event, but those who did were far more likely to recall the plausible event than the implausible event. The study's conclusion that false memories are not likely to be implanted for less plausible events is consistent with research showing very low rates of acquiescence to false suggestions of genital touch during a real medical examination, even among the youngest children (Saywitz et al. 1991). This is important as we consider how the “lost in the mall” paradigm has been, in our opinion, inappropriately applied to child sexual abuse cases in the courtroom. In the “lost in the mall” paradigm, children are tested about a plausible memory event that they are told their parents said was true, and that is embedded in other true events.

Beyond the implausibility of sexual abuse narratives for children who have not experienced abuse, we also know that children who have experienced abuse are reluctant to disclose to adults (Lyon et al. n.d.), implying that children are far less likely to report false narratives of sexual abuse than false narratives pertaining to innocuous events. Motivational barriers to disclosing sexual abuse include children's concerns about their parents' reactions and the perceived negative consequences of the allegations for the child and the family (Lemaigre et al. 2017). Furthermore, children who were abused by a family member or groomed by a trusted adult often express feelings of love and care toward the perpetrator and are reluctant to report the abuse due to concerns about the consequences the abuser may face (Christensen et al. 2015; Lemaigre et al. 2017).
Finally, many victims feel shame, guilt, and self-blame about their abuse (Alaggia et al. 2019; Goodman-Brown et al. 2003; Hershkowitz et al. 2007), further increasing their reluctance to disclose. Indeed, these barriers to disclosure are so strong that an estimated 50% of substantiated victims initially deny abuse when questioned (Lyon et al. n.d.), suggesting that false denials of abuse likely pose a much greater obstacle to justice than allegations based on false memories.

In addition to internal and external factors that reduce the likelihood of real victims making allegations based on false memories of sexual abuse, Andrews and Brewin (2024) note the presence of procedural safeguards against the impact of false memories in real cases, such as the extra scrutiny of jury trials in adversarial legal systems. Perhaps anticipating this criticism, Murphy et al. (2023) extended the original memory implantation design with a mock jury experiment in which 1024 lay “jurors” were asked to read participants' descriptions of the “lost in the mall” event and provide a yes/no judgment regarding whether they reflect genuine memories. Mock jurors believed that memories of the fake event were real even more frequently (39%) than the researchers (35%), demonstrating that lay observers may find it difficult to distinguish between true and false memories. The authors explain their decision not to warn mock jurors that some memories may be false by stating their aim to “mirror the experience of real jurors listening to a witness describe events from their past” (822). However, we disagree with the claim that the researchers created a realistic trial scenario, as real-life cases typically involve opening and closing statements, cross-examination of witnesses, and specific jury instructions around credibility and the burden of proof.
Real jurors are often told by defense attorneys in the opening statement that witnesses may lie, and indeed the goal of the defense throughout the trial is to cast doubt on the truth of the prosecution's evidence. Furthermore, before a case even reaches the courtroom, the credibility of the child's disclosure is scrutinized many times by many different professionals. Having examined 500 reported cases of child sexual abuse in the United States, Block et al. (2023) found that only 53% of cases were investigated and as few as 17% progressed to court, demonstrating the rigorous criteria cases must meet to move forward in the criminal legal system. Thus, we are inclined to agree with Andrews and Brewin's (2024) conclusion that memory implantation studies overestimate the proportion of participants who develop false memories that observers would judge genuine in a legal context.

In conclusion, we argue that the external validity of laboratory research on false memory implantation is too low to meaningfully inform real investigations involving allegations of child sexual abuse. Although, as demonstrated, laboratory research relies on multiple highly suggestive techniques to induce false memories, Wade et al. (2025) are correct in pointing out that real cases may involve additional suggestive influences that are not present in the laboratory. However, the impact of these is likely outweighed by the presence of factors that reduce the likelihood of false allegations, and procedural safeguards that mitigate the risk of false memories resulting in miscarriages of justice.

In addition to the implications for real-life sexual abuse cases, the results of Andrews and Brewin's (2024) re-analysis also raise questions about the reliability and validity of coding approaches used widely within some areas of psychological research.
If re-coding the same dataset with a different coding approach leads to wildly different conclusions with regard to the main hypotheses of a study, how can we trust the results of any research relying on researcher-coded data?

Firstly, researchers relying on manually coded raw data must ensure that their coding is reliable, meaning that their coding guide contains clear rules that the coders follow objectively and accurately. In psychology, reliability is generally assessed through measuring inter-rater agreement among multiple coders, most commonly by calculating Cohen's Kappa. Although there are variations in the field, it is generally accepted that Kappa values at or above 0.8 (sometimes 0.7) reflect almost perfect agreement between coders, signifying a reliable coding approach. Both Murphy et al. (2023) and Andrews and Brewin (2024) fall short of this established standard, with inter-rater agreement as low as k = 0.60 (Murphy et al. 2023) and k = 0.49 (Andrews and Brewin 2024). These figures reflect at best moderate agreement between coders and are approaching the lower limit of what could be considered “reliable” coding. Given the low inter-rater agreement figures, it is questionable whether the results of Murphy et al. (2023) and Andrews and Brewin (2024) could be replicated even if a research team used the exact same coding guide as the original studies.

Low reliability figures limit the strength of conclusions one can draw from quantitative analysis, so it is unfortunate that neither Murphy et al. (2023) nor Andrews and Brewin (2024) highlight the relatively low inter-rater agreement achieved by coders (k = 0.60 and 0.49 at the lowest, respectively) when discussing their findings. Difficulties with achieving reliability in coding may be indicative of more pervasive problems with the study design, such as challenges with operationalizing vague concepts like “partial memory” (Murphy et al. 2023).
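
As a rough illustration of why raw percentage agreement can flatter a coding scheme, the sketch below computes Cohen's Kappa for two coders. The category labels and the ten codes are hypothetical, invented purely for illustration, and are not drawn from Murphy et al. (2023) or Andrews and Brewin (2024); `cohens_kappa` is likewise an illustrative helper, not code used by either team.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's Kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of each coder's
    # marginal proportion for that label.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: both coders used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for ten transcripts (illustrative only, not study data).
coder_1 = ["none", "none", "partial", "full", "none",
           "partial", "none", "partial", "none", "none"]
coder_2 = ["none", "partial", "partial", "full", "none",
           "none", "none", "partial", "none", "partial"]

kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))  # 0.47
```

In this toy example the coders agree on 7 of 10 transcripts, yet Kappa is only about 0.47 once chance agreement (0.43 here) is discounted, a value in the same range as the lowest figures reported above.
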
When objective definitions of psychological concepts prove elusive, an alternative approach is to deconstruct the variables in question into smaller, more easily circumscribed components. Despite Andrews and Brewin's (2024) detail-focused coding guide aiming to capitalize on this approach, there was still substantial inter-rater disagreement with regard to two out of the six core details identified by the authors. One potential contributor to low inter-rater reliability is human error, which might play a significant role in research designs relying on manual coding of large amounts of complex data even when the coding categories are clearly defined. In this respect, the development of machine-assisted coding approaches is a promising avenue for increasing the reliability of research studies, as machine learning models have been found to outperform manual coders in accuracy when coding interview transcripts (Szojka et al. 2025). Although training machine models still requires an initial dataset of reliably coded data, the trained model can then be applied to new datasets that rely on the same coding categories. This method has the advantage of providing a standardized coding approach that can be used across studies addressing the same research question, as is the case for the numerous direct and quasi-replications of the “lost in the mall” experiment.

However, as Andrews and Brewin (2024) demonstrate, questions about the results of the “lost in the mall” study and its replications go beyond the reliability of the coding guide and concern its validity: the extent to which researcher-defined false memories reflect genuine remembering on the part of the participant. Even if coders in Murphy et al. (2023) achieved perfect reliability, re-analysing the data with a new operational definition of false memories may produce different results.
The Andrews and Brewin (2024) re-analysis is not the first study to raise questions about the validity of the definition and measurement of false memories in memory implantation designs. In 2015, a team of researchers criticized the astonishing findings of a study that reported successful induction of false memories of committing a crime in 70% of participants (Shaw and Porter 2015), arguing that the false memory rate was inflated by the authors' failure to differentiate between false beliefs and false memories (Wade et al. 2018). Wade et al. (2018) recoded the data of Shaw and Porter (2015) using two separate coding schemes that distinguish false beliefs from false memories (Lindsay et al. 2004; Scoboria et al. 2017) and obtained a more conservative false memory rate of 26%–30%. Highlighting the stark difference in results between the two analyses, the authors suggest that eschewing established coding approaches in favor of new definitions leads to imprecision that “fuels skepticism of memory research and detracts from the understanding of real-world behavior” (Wade et al. 2018, 474).

Andrews and Brewin's (2024) re-analysis of the data collected by Murphy et al. (2023) is based on the premise that the relatively well-established approach of categorizing partial and full false memories itself lacks validity, necessitating the introduction of a new conceptualisation of false memories. In line with Wade et al.'s (2018) suggestions, Andrews and Brewin (2024) ensured that their approach is clearly positioned in relation to previous research by (1) providing a comprehensive description of how the data was coded, (2) explaining their motivation for developing an alternative coding approach, and (3) reporting their findings alongside the results obtained with a different coding guide by Murphy et al. (2023).
While these steps certainly contribute to making the dialog about the definition and measurement of false memories in the field more transparent, they ultimately cannot answer the question of which of the many definitions of false memories is correct. To determine whether the concept of false memories suggested by Andrews and Brewin (2024) improves on previous definitions used by Murphy et al. (2023) or indeed Loftus and Pickrell (1995), researchers need to examine the world beyond controlled experimental conditions and investigate the meaning and usefulness of the concept of false memories in real cases.

In conclusion, if the question is whether the findings of Loftus and Pickrell's (1995) “lost in the mall” study, one of the most influential and surprising experiments in the history of memory research, can be replicated, the answer has to be a confident “yes”. Over a period of 30 years, a multitude of studies, including a meta-analysis (Scoboria et al. 2017) and a direct replication (Murphy et al. 2023), confirmed that under strictly controlled experimental conditions, a substantial minority of participants can be misled to report details of fake events as if they remembered them. The contribution of Andrews and Brewin (2024) is to move the debate away from the reliability of the false memory phenomenon and challenge its validity. Is it meaningful for researchers to state that an individual has developed a false memory if the participant herself does not think she remembers it? Would a researcher-identified false memory constitute credible evidence at court? Renewed interest in these questions supports our view that it remains inappropriate to interpret false memory rates reported by laboratory studies using a memory implantation paradigm as evidence that a substantial proportion of real-life allegations of child sexual abuse are based on false memories.
Even if further replications of the “lost in the mall” study were able to introduce reliable and valid methods of measuring false memories, the paradigm itself would still fail to account for the real-life context of child abuse investigations. We argue that the aim of preventing miscarriages of justice is better served by observational studies and new, ecologically valid research paradigms than by the continued deconstruction and reconstruction of a 30-year-old experiment.

Zsofia A. Szojka: writing – original draft, writing – review and editing, conceptualization. Stephanie Block: writing – review and editing, conceptualization. David La Rooy: writing – review and editing, conceptualization.

Ethics approval was not required, as this study is based exclusively on published research.

The authors declare no conflicts of interest.

Applied Cognitive Psychology, volume 39, issue 3. Published 2025-06-11. https://onlinelibrary.wiley.com/doi/10.1002/acp.70083




Is It Time to Leave the Shopping Mall Behind? Measurement Flaws, Plausibility, and External Validity of False Memory Research

This commentary discusses the recently published article by Andrews and Brewin (2024) that reanalyzed data collected by Murphy et al. (2023) to replicate the well-known “lost in the mall” study first published by Loftus and Pickrell (1995). We begin by outlining initial and more recent findings that brought the “lost in the mall” paradigm to the forefront of false memory research before considering the thought-provoking results of the reanalysis by Andrews and Brewin (2024). We then highlight some of the implications of the reanalysis for child sexual abuse investigations, and more broadly, for the reliability and validity of psychological research that relies on researchers' coding and interpretation of information provided by participants about the content of their memories. We ask whether the definition and measurement of false memories within laboratory experiments can be meaningfully applied to real-life debates concerning justice for alleged victims and perpetrators of sexual abuse.

In the 1970s, Elizabeth Loftus and her team conducted a series of highly influential experiments demonstrating that misleading information received after a personal experience can lead people to make mistakes when they later try to describe what happened (Loftus and Palmer 1974; Loftus 1975). After the impact of misinformation on memory for personal experiences had been established, an innovative research paradigm was designed to demonstrate that memories of entire events that never occurred could be implanted in people's minds with relative ease. Loftus and Pickrell (1995) misled 24 adult participants to believe that their family members had provided descriptions of four true past events, but unbeknownst to the participants, one of the supposed true events, being “lost in the mall”, was made up by the researchers. After participants were told that they had been lost in the mall many years earlier, they were asked to recall what they could remember, both in writing and verbally, and to rate the clarity of their memories. The results showed that a quarter of the participants were successfully induced to claim that they remembered the false event, although their average clarity ratings for the false memory were substantially lower than the scores assigned to true events. The “lost in the mall” study resulted in a “veritable explosion of cognitive research on the topic of false memory” (Pezdek and Lam 2007, 2) and led to the establishment of a new view of human memory as being particularly fragile and easily manipulated.

However, while most memory researchers accept that false memory implantation is possible, the proportion of people who can be induced to develop false memories has been the subject of fierce debate (Wade et al. 2002). Scrutiny of false memory implantation experiments identified two main challenges concerning the definition of false memories: (1) differentiating between false beliefs and false memories, and (2) differentiating between flawed memories and false memories. The first challenge stems from the difficulty of determining whether participants in memory implantation studies genuinely remember the false event or simply believe the researchers' assertion that the false event had occurred. Failure to differentiate the former (false memories) from the latter (false beliefs) could easily inflate the rate of false memories reported in memory implantation studies (Wade et al. 2002). The second issue results from the potential confusion between false memories, defined as entire false events that have been implanted into memory, and flawed memories, referring to incorrect details in otherwise true memories (Pezdek and Lam 2007). This is a particular concern for versions of the memory implantation paradigm that rely on relatively common experiences, such as being lost in the mall. In these studies, there is a non-negligible probability that some of the participants have truly experienced situations like the false events suggested, leading again to inflated rates of false memories reported by researchers.

The debate about defining and categorizing false memories has been reignited by the recent publication of a much larger scale replication of the original “lost in the mall” study by Murphy et al. (2023). Aiming to address prior criticism of the research paradigm, the researchers introduced a novel coding scheme that explicitly differentiates between complete false memories (substantial remembering of all central details relating to the false memory) and partial false memories (partial recall of the false memory). Results showed a slightly higher rate of successful memory implantation than the 25% reported in the original “lost in the mall” study (Loftus and Pickrell 1995), with Murphy et al. (2023) classifying 8% of participants as having full false memories, and a further 27% as having partial false memories. However, consistent with the findings of the original study, participants' clarity ratings of the false memories were relatively low, and consistently lower than those of true memories. Importantly, fewer than half (14%) of the participants judged by researchers as having false memories self-reported remembering the event. In addition to publishing their findings, Murphy et al. (2023) made their data file and raw data public, allowing other researchers to re-analyse their data.

Andrews and Brewin's (2024) reanalysis of Murphy et al.'s (2023) data explicitly aimed to address the aforementioned criticisms of the “lost in the mall” paradigm: (1) that researcher-identified “false memories” may not reflect genuine remembering, and (2) that researcher-identified “false memories” may actually be distortions of participants' true memories. To investigate these concerns, the authors devised a more systematic coding approach that relies on counting the number of core details participants report about the suggested event and assessing the clarity of each of those details on a scale from “no mention” to “explicit recall”. The researchers also attempted to identify potentially true experiences by coding for mentions of being lost in different circumstances to the fake event, being lost on more than one occasion, or experiences similar to the target event that did not actually involve being lost.

The findings of the reanalysis cast doubt on claims that 25%–35% of people can be “led to remember entire events that never actually happened to them” (Loftus and Pickrell 1995, 725). The re-coded data indicated that on average, participants judged by Murphy et al.'s (2023) research team as having a false memory explicitly recalled only 1.47 of the 6 core details. Even the participants judged to have full false memories tended to recall fewer than half of the core details, and 20% did not explicitly recall the most fundamental detail of actually “being lost”. As noted by Murphy et al. (2023), the rate of participants who self-reported remembering the target event (14%) was substantially lower than the rate of participants judged by the researchers as having developed a false memory (35%). Andrews and Brewin (2024) showed that participants' own criteria for remembering were related to the clarity of core details; participants who self-reported having developed a false memory explicitly mentioned significantly more core details than those who did not believe they remembered the event.

The results of the re-analysis also validate prior concerns that some participants judged by researchers as having developed false memories may be referring to potentially true experiences. According to Andrews and Brewin (2024), 31% of participants produced descriptions of past experiences that were similar to the fake event but distinguished by key differences in core details, such as being lost in a different shopping location or being abandoned rather than being lost. The impact of these potentially true experiences on false memory rates was not negligible, as they were present in the accounts of 50% of those judged by Murphy et al. (2023) as having full false memories and 52% of those with partial false memories.

Based on the results of their reanalysis, Andrews and Brewin (2024) concluded that previous studies using the “lost in the mall” paradigm have substantially overestimated the proportion of people who have developed false memories. The authors suggest three steps to improve the methodology of memory implantation studies: the exclusion of participants with potentially true experiences of the target event, the use of core details as minimum criteria for false memories, and the consideration of self-report measures alongside researcher identification of false memories. Using a step-by-step exclusion approach, Andrews and Brewin (2024) demonstrated that applying these methodological improvements to the data of Murphy et al. (2023) resulted in a drastically lowered false memory rate of only 4%. In their recent commentary, Wade et al. (2025) question the validity of this figure, suggesting that the authors' criteria exclude genuine false memories constructed from a combination of suggested details and memory traces from other sources. Nonetheless, Andrews and Brewin's (2024) argument that entirely false memories are more infrequent than memory implantation research would lead us to believe has implications both for the real-world application of the concept of false memories, and for the use of researcher-coded data in the field of memory research.

In the 30 years since the publication of the original “lost in the mall” study, the results of false memory research have been applied far beyond “recovered memory” cases, with the apparent ease of memory implantation and high rate of participants who develop false memories leading to a widespread view that false memories of child sexual abuse are common and have likely resulted in an unknown number of miscarriages of justice (Blizard and Shaw 2019; Crook and McEwen 2019). Concerns about miscarriages of justice have also been expressed in Wade et al.'s (2025) commentary on Andrews and Brewin (2024), which states that “false memory rates in the lab might underestimate those in real cases, where factors are present that research has shown can exaggerate the likelihood that false memories are formed” (3). We argue that when it comes to claims of child sexual abuse, the opposite is true, with memory implantation experiments giving the impression that accusations based on false memories are more common than they really are. This inflated false memory rate occurs partly because lab studies rely on a specific set of highly suggestive techniques to induce false memories, and partly because lab studies fail to account for factors that reduce the likelihood of false allegations in real cases.

To evaluate the claim that laboratory research provides a conservative estimate of the frequency of false memories in real cases, it is helpful to break down the numerous methods of suggestion and deceit that the “lost in the mall” paradigm and other memory implantation designs rely on to convince participants that they experienced a fake event. Firstly, the researchers provide the participant with the core details of the event in order to “remind them” of what happened, including the main action, the time and location of the event, the participant's emotional reaction, and the resolution of the crisis (Loftus and Ketcham 1994; Loftus and Pickrell 1995; Murphy et al. 2023). These researcher-provided elements form a coherent narrative framework that serves as a script or schema, making it easy for the participants to “fill in” the details even if they have not personally experienced them. Secondly, the participant is led to believe that these core details were provided by a trusted family member who was present when the event occurred. Given that the experimental paradigm involves no stakes for the participant, the relative, or anyone else, there is no reason for participants to suspect that their relatives would mislead them. Moreover, the description of the false event is presented after the participant has read the summaries of true memories, eliminating potential doubts about the veracity of the accounts. Not all memory implantation studies involve these “tricks” of suggestion, but those that alter the proven formula tend to add a different element of deception, such as showing participants fake photographs or materials related to the target event (e.g., Braun et al. 2002; Wade et al. 2002). Furthermore, false memory studies also rely on repeated recall attempts (three including the booklet and consecutive interviews) to maximize the likelihood that participants will acquiesce to suggestion. Although adults and children can recall events they truly experienced accurately across multiple interviews, a wealth of research has demonstrated an increase in erroneous details resulting from a combination of suggestive techniques and repeated recall occasions (La Rooy et al. 2009). Thus, despite some claims to the contrary, implanting false memories is no simple matter and requires the use of a specific set of highly suggestive techniques under laboratory conditions.

Perhaps even more importantly, real cases involve factors that research has shown can reduce the likelihood of false allegations of child sexual abuse, including the implausibility of the event, children's reluctance to disclose abuse, and the presence of procedural safeguards to prevent miscarriages of justice based on false memories. Andrews and Brewin (2024) highlight that one criticism of the “lost in the mall” paradigm is that being “lost” is a common or plausible event. Even if we have not experienced being lost ourselves, most of us have a schema for it; that is, being lost is a theme of many books, television shows and other media, so it is easy to imagine what it would be like to be lost. Most children, however, without exposure to extreme suggestion, do not have a schema for child sexual abuse. Pezdek and Hodge (1999) conducted a study in which they examined younger (5–7 years) and older (9–12 years) children's susceptibility to accepting a false memory for plausible (i.e., being lost in the mall) versus implausible (i.e., receiving a rectal enema) events. Most children in this study did not report remembering either false event, but those who did were far more likely to recall the plausible event than the implausible event. The study's conclusion that false memories are not likely to be implanted for less plausible events is consistent with research showing very low rates of acquiescence to false suggestions of genital touch during a real medical examination, even among the youngest children (Saywitz et al. 1991). This is important as we consider how the “lost in the mall” paradigm has been, in our opinion, inappropriately applied to child sexual abuse cases in the courtroom. In the “lost in the mall” paradigm, children are tested about a plausible memory event that they are told their parents said was true, and that is embedded in other true events.

Beyond the implausibility of sexual abuse narratives for children who have not experienced abuse, we also know that children who have experienced abuse are reluctant to disclose to adults (Lyon et al. n.d.), implying that children are far less likely to report false narratives of sexual abuse than false narratives pertaining to innocuous events. Motivational barriers to disclosing sexual abuse include children's concerns about their parents' reactions and the perceived negative consequences of the allegations for the child and the family (Lemaigre et al. 2017). Furthermore, children who were abused by a family member or groomed by a trusted adult often express feelings of love and care toward the perpetrator and are reluctant to report the abuse due to concerns about the consequences the abuser may face (Christensen et al. 2015; Lemaigre et al. 2017). Finally, many victims feel shame, guilt, and self-blame about their abuse (Alaggia et al. 2019; Goodman-Brown et al. 2003; Hershkowitz et al. 2007), further increasing their reluctance to disclose. Indeed, these barriers to disclosure are so strong that an estimated 50% of substantiated victims initially deny abuse when questioned (Lyon et al. n.d.), suggesting that false denials of abuse likely pose a much greater obstacle to justice than allegations based on false memories.

In addition to internal and external factors that reduce the likelihood of real victims making allegations based on false memories of sexual abuse, Andrews and Brewin (2024) note the presence of procedural safeguards against the impact of false memories in real cases, such as the extra scrutiny of jury trials in adversarial legal systems. Perhaps anticipating this criticism, Murphy et al. (2023) extended the original memory implantation design with a mock jury experiment in which 1024 lay “jurors” were asked to read participants' descriptions of the “lost in the mall” event and provide a yes/no judgment regarding whether they reflect genuine memories. Mock jurors believed that memories of the fake event were real even more frequently (39%) than the researchers (35%), demonstrating that lay observers may find it difficult to distinguish between true and false memories. The authors explain their decision not to warn mock jurors that some memories may be false by stating their aim to “mirror the experience of real jurors listening to a witness describe events from their past” (822). However, we disagree with the claim that the researchers created a realistic trial scenario, as real-life cases typically involve opening and closing statements, cross-examination of witnesses, and specific jury instructions around credibility and the burden of proof. Real jurors are often told by defense attorneys in the opening statement that witnesses may lie, and indeed the goal of the defense throughout the trial is to cast doubt on the truth of the prosecution's evidence. Furthermore, before a case even reaches the courtroom, the credibility of the child's disclosure is scrutinized many times by many different professionals. Having examined 500 reported cases of child sexual abuse in the United States, Block et al. (2023) found that only 53% of cases were investigated and as few as 17% progressed to court, demonstrating the rigorous criteria cases must meet to move forward in the criminal legal system. Thus, we are inclined to agree with Andrews and Brewin's (2024) conclusion that memory implantation studies overestimate the proportion of participants who develop false memories that observers would judge genuine in a legal context.

In conclusion, we argue that the external validity of laboratory research on false memory implantation is too low to meaningfully inform real investigations involving allegations of child sexual abuse. Although, as demonstrated, laboratory research relies on multiple highly suggestive techniques to induce false memories, Wade et al. (2025) are correct in pointing out that real cases may involve additional suggestive influences that are not present in the laboratory. However, the impact of these is likely outweighed by the presence of factors that reduce the likelihood of false allegations, and procedural safeguards that mitigate the risk of false memories resulting in miscarriages of justice.

In addition to the implications for real-life sexual abuse cases, the results of Andrews and Brewin's (2024) re-analysis also raise questions about the reliability and validity of coding approaches used widely within some areas of psychological research. If re-coding the same dataset with a different coding approach leads to wildly different conclusions with regard to the main hypotheses of a study, how can we trust the results of any research relying on researcher-coded data?

Firstly, researchers relying on manually coded raw data must ensure that their coding is reliable, meaning that their coding guide contains clear rules that the coders follow objectively and accurately. In psychology, reliability is generally assessed through measuring inter-rater agreement among multiple coders, most commonly by calculating Cohen's Kappa. Although there are variations in the field, it is generally accepted that Kappa values at or above 0.8 (sometimes 0.7) reflect almost perfect agreement between coders, signifying a reliable coding approach. Both Murphy et al. (2023) and Andrews and Brewin (2024) fall short of this established standard, with inter-rater agreement as low as k = 0.60 (Murphy et al. 2023) and k = 0.49 (Andrews and Brewin 2024). These figures reflect at best moderate agreement between coders and are approaching the lower limit of what could be considered “reliable” coding. Given the low inter-rater agreement figures, it is questionable whether the results of Murphy et al. (2023) and Andrews and Brewin (2024) could be replicated even if a research team used the exact same coding guide as the original studies.
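To make the agreement statistic discussed above concrete, the following is a minimal sketch of how Cohen's Kappa can be computed for two coders. The function name `cohens_kappa` and the category labels are illustrative assumptions, and the ten codes are invented for demonstration; they are not data from Murphy et al. (2023) or Andrews and Brewin (2024).

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters who each assign one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions,
    # summed over every category that either rater used.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten transcripts (invented for illustration only).
coder_1 = ["full", "partial", "none", "none", "partial",
           "none", "full", "none", "partial", "none"]
coder_2 = ["full", "none", "none", "none", "partial",
           "partial", "partial", "none", "partial", "none"]

kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))  # 0.51
```

In this invented example the coders agree on 7 of 10 transcripts (70%), yet Kappa is only about 0.51 because 39% agreement would be expected by chance alone. This illustrates why raw percentage agreement can look more reassuring than a chance-corrected figure.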

Low reliability figures limit the strength of conclusions one can draw from quantitative analysis, so it is unfortunate that neither Murphy et al. (2023) nor Andrews and Brewin (2024) highlight the relatively low inter-rater agreement achieved by coders (k = 0.60 and 0.49 at the lowest, respectively) when discussing their findings. Difficulties with achieving reliability in coding may be indicative of more pervasive problems with the study design, such as challenges with operationalizing vague concepts like “partial memory” (Murphy et al. 2023). When objective definitions of psychological concepts prove elusive, an alternative approach is to deconstruct the variables in question into smaller, more easily circumscribed components. Although Andrews and Brewin's (2024) detail-focused coding guide aimed to capitalize on this approach, there was still substantial inter-rater disagreement with regard to two of the six core details identified by the authors. One potential contributor to low inter-rater reliability is human error, which might play a significant role in research designs relying on manual coding of large amounts of complex data even when the coding categories are clearly defined. In this respect, the development of machine-assisted coding approaches is a promising avenue for increasing the reliability of research studies, as machine learning models have been found to outperform manual coders in accuracy when coding interview transcripts (Szojka et al. 2025). Although training machine models still requires an initial dataset of reliably coded data, the trained model can then be applied to new datasets that rely on the same coding categories. This method has the advantage of providing a standardized coding approach that can be used across studies addressing the same research question, as is the case for the numerous direct and quasi-replications of the “lost in the mall” experiment.

However, as Andrews and Brewin (2024) demonstrate, questions about the results of the “lost in the mall” study and its replications go beyond the reliability of the coding guide and concern its validity: the extent to which researcher-defined false memories reflect genuine remembering on the part of the participant. Even if coders in Murphy et al. (2023) had achieved perfect reliability, re-analysing the data with a new operational definition of false memories could still produce different results. The Andrews and Brewin (2024) re-analysis is not the first study to raise questions about the validity of the definition and measurement of false memories in memory implantation designs. In 2015, a team of researchers criticized the astonishing findings of a study that reported successful induction of false memories of committing a crime in 70% of participants (Shaw and Porter 2015), arguing that the false memory rate was inflated by the authors' failure to differentiate between false beliefs and false memories (Wade et al. 2018). Wade et al. (2018) recoded the data of Shaw and Porter (2015) using two separate coding schemes that distinguish false beliefs from false memories (Lindsay et al. 2004; Scoboria et al. 2017) and obtained a more conservative false memory rate of 26%–30%. Highlighting the stark difference in results between the two analyses, the authors suggest that eschewing established coding approaches in favor of new definitions leads to imprecision that “fuels skepticism of memory research and detracts from the understanding of real-world behavior” (Wade et al. 2018, 474).

Andrews and Brewin's (2024) re-analysis of the data collected by Murphy et al. (2023) is based on the premise that the relatively well-established approach of categorizing partial and full false memories itself lacks validity, necessitating the introduction of a new conceptualisation of false memories. In line with Wade et al.'s (2018) suggestions, Andrews and Brewin (2024) ensured that their approach is clearly positioned in relation to previous research by (1) providing a comprehensive description of how the data were coded, (2) explaining their motivation for developing an alternative coding approach, and (3) reporting their findings alongside the results obtained with a different coding guide by Murphy et al. (2023). While these steps certainly contribute to making the dialog about the definition and measurement of false memories in the field more transparent, they ultimately cannot answer the question of which of the many definitions of false memories is correct. To determine whether the concept of false memories suggested by Andrews and Brewin (2024) improves on previous definitions used by Murphy et al. (2023) or indeed Loftus and Pickrell (1995), researchers need to examine the world beyond controlled experimental conditions and investigate the meaning and usefulness of the concept of false memories in real cases.

In conclusion, if the question is whether the findings of Loftus and Pickrell's (1995) “lost in the mall” study, one of the most influential and surprising experiments in the history of memory research, can be replicated, the answer has to be a confident “yes”. Over a period of 30 years, a multitude of studies, including a meta-analysis (Scoboria et al. 2017) and a direct replication (Murphy et al. 2023), confirmed that under strictly controlled experimental conditions, a substantial minority of participants can be misled to report details of fake events as if they remembered them. The contribution of Andrews and Brewin (2024) is to move the debate away from the reliability of the false memory phenomenon and challenge its validity. Is it meaningful for researchers to state that an individual has developed a false memory if the participant herself does not think she remembers it? Would a researcher-identified false memory constitute credible evidence at court? Renewed interest in these questions supports our view that it remains inappropriate to interpret false memory rates reported by laboratory studies using a memory implantation paradigm as evidence that a substantial proportion of real-life allegations of child sexual abuse are based on false memories. Even if further replications of the “lost in the mall” study were able to introduce reliable and valid methods of measuring false memories, the paradigm itself will still fail to account for the real-life context of child abuse investigations. We argue that the aim of preventing miscarriages of justice is better served by observational studies and new, ecologically valid research paradigms than by the continued deconstruction and reconstruction of a 30-year-old experiment.

Zsofia A. Szojka: writing – original draft, writing – review and editing, conceptualization. Stephanie Block: writing – review and editing, conceptualization. David La Rooy: writing – review and editing, conceptualization.

Ethics approval was not required, as this study is based exclusively on published research.

The authors declare no conflicts of interest.

Source journal: Applied Cognitive Psychology (PSYCHOLOGY, EXPERIMENTAL)

CiteScore: 4.30

Self-citation rate: 8.30%

Articles published: 111

Journal description: Applied Cognitive Psychology seeks to publish the best papers dealing with psychological analyses of memory, learning, thinking, problem solving, language, and consciousness as they occur in the real world. Applied Cognitive Psychology will publish papers on a wide variety of issues and from diverse theoretical perspectives. The journal focuses on studies of human performance and basic cognitive skills in everyday environments including, but not restricted to, studies of eyewitness memory, autobiographical memory, spatial cognition, skill training, expertise and skilled behaviour. Articles will normally combine realistic investigations of real world events with appropriate theoretical analyses and proper appraisal of practical implications.