JOM Forum: Theory Testing Is Theory Generation

IF 10.4 | CAS Zone 2 (Management) | JCR Q1 MANAGEMENT
Journal of Operations Management | Pub Date: 2026-04-02 | Epub Date: 2026-03-09 | DOI: 10.1002/joom.70039
Mikko Ketokivi, Saku Mantere, Herman Aguinis, Richard Makadok, Morgan Swink, Elliot Bendoly, Rogelio Oliva
In this paper, we propose that theory-testing research offers just as much potential for generating theory as theory-building and theory-elaborating research, the two variants typically associated with theory generation (Ketokivi and Choi 2014; Lee et al. 1999). Responding to Bendoly and Oliva's (2025) call to search for meaningful theoretical pathways for research contributions, we suggest that theory-testing research has always constituted a meaningful pathway to theoretical contributions when it extends beyond merely applying theory to challenging, expanding, and elaborating it. These extensions can lead to significant adjustments in bodies of knowledge over time as research programs progress.

To understand the generative aspect of theory testing, we must distinguish it from theory application. When we apply theory, the objective is usually to address a practical problem, without any intention of contributing to an ongoing theoretical conversation. In empirical operations management (OM) research, the application of factory physics offers an illustrative example: Researchers apply concepts such as Little's Law and the laws of variability to improve factory productivity (Schmenner and Swink 1998). In this context, theory consists of the relevant applicable laws, which are treated as given; this makes theory effectively axiomatic from an epistemological point of view (Popper 1935/2005, 51).[1]

In stark contrast to theory application, the fundamental idea in theory-testing research is to place the theory itself under empirical scrutiny.
Accordingly, theory is no longer treated as self-evident and certain but as propositional and conjectural, subject to revision (Lakatos 1970; Popper 1963).

As an example of theory-testing research, consider Williamson's (1971) question "Why do firms integrate vertically?" This question gave birth to transaction cost economics (TCE), one of the most influential and established research programs on organizational boundaries (Santos and Eisenhardt 2005). The theoretical essence of TCE is succinctly captured by the discriminating alignment hypothesis: "Transactions, which differ in their attributes, are aligned with governance structures, which differ in their costs and competencies, in a discriminating (mainly transaction cost economizing) way" (Williamson 1996, 46–47). Importantly, this statement is not meant as axiomatic but as conjectural, as the word "hypothesis" implies: Whether actual governance decisions align transactions and governance structures in a "mainly transaction cost economizing way" is to be settled empirically.

Consider Walker and Weber's (1984) seminal TCE-based study of the make-or-buy decision in the final assembly of automobiles. TCE-as-conjecture becomes salient in the discussion section, where several of TCE's central propositions are called into question based on the empirical analysis. For example, the finding that "the effect of transaction costs on make-or-buy decisions was substantially overshadowed by comparative production costs" (Walker and Weber 1984, 387) is inconsistent with TCE's original central proposition that transactions will be aligned with governance structures in a "mainly transaction cost economizing" way (Williamson 1996, 47, emphasis added). When the qualifier "mainly" is interpreted as conjectural and malleable, empirical research not only tests but also informs theory.
Walker and Weber's (1984) findings suggest that while transaction costs are relevant, they constitute only a portion of the total costs that are decisive in make-or-buy decisions. Such findings, and many others, have expanded TCE's focus over time from transaction costs to total costs. In another, more recent development, researchers have moved beyond costs to incorporate the revenue side into the comparative analysis as well (Ketokivi and Mahoney 2020). More generally, reviews of the empirical TCE literature (e.g., Macher and Richman 2008) demonstrate how TCE as a theory has developed significantly over time, mainly through the broadening of its scope.

TCE illustrates a general and essential characteristic of theory-testing research: When theory is taken as conjectural, testing theory also generates theory through marginal adjustments. Such adjustments link individual theory-testing research efforts to a broader theoretical conversation and, consequently, enable the accumulation of theoretical knowledge and theory progress. We do not witness similar accumulation in knowledge communities where theories are merely applied.[2]

Theory-testing research is often described as hypothetico-deductive (Mantere and Ketokivi 2013). We submit that the label "deductive" is accurate for theory application but inaccurate for theory testing; for the latter, the descriptively accurate term is hypothetico-abductive. In this section, we seek to establish this by comparing reasoning in theory testing versus theory application.

To understand the role of abduction, we need to distinguish between two central reasoning tasks in theory-testing research: connecting theoretical and observational statements (the theorist's concern) and connecting observational statements with data (the statistician's concern) (Meehl 1990, 116).
The statistician's concern is comparatively straightforward, and here there is no difference between theory application and theory testing: It is addressed using the established tools of statistical inference, that is, a combination of deductive and inductive reasoning. The differences lie in how the researcher addresses the theorist's concern (Figure 1).

In theory application, the theorist's concern is methodologically simpler. When theory is merely applied, there is no feedback arrow from observational predictions to theory. Furthermore, if the theory consists of empirically salient concepts, observational predictions can be deduced from the theoretical foundation (Schmenner and Swink 1998)—hence the term hypothetico-deductive.

The case of theory testing is more complex, as adjustments to theoretical conjectures do not follow a deductive, computational logic (Mantere and Ketokivi 2013). Rather, adjustments are iterative steps of abductive inference that revise conjectures in light of often surprising findings (Peirce 1877). As an example, let us revisit TCE's discriminating alignment hypothesis. Its central terms (e.g., transaction, governance structure, competence) are theoretical and must be translated from the language of theory into the language of empirical observation. Given that translation involves several possible, non-obvious interpretations (Quine 1951), the reasoning process cannot possibly be deductive. Similarly, since translation does not involve generalization of any kind, it cannot be inductive either. The only remaining form of reasoning is abduction, which is indeed the reasoning tool by which theory-testing researchers bridge the theoretical to the empirical.

The abductive translation process is generative because it creates new meaning for theoretical concepts (Gadamer 1975).
In their make-or-buy study, Walker and Weber (1984) translated TCE's general concept of uncertainty into volume uncertainty, and further into unpredictable fluctuations in demand for components in automobile final assembly. This translation created specific and contextualized—in a word, new—meaning for the concept of uncertainty.

The other complicating factor has to do with the feedback arrow to theory (Figure 1). Specifically, testing hypotheses is ultimately a means to the end of testing theoretical conjectures. Empirical evidence that is consistent with the hypothesis constitutes an instance of positive corroboration, whereas inconsistency means negative corroboration (Popper 1935/2005, 264–266). Both kinds not only inform theory but may also lead to adjustments and elaborations.

The feedback arrow to theory makes the reasoning process in theory-testing research significantly more complex than in theory-application research because it involves the use of modus tollens.[3] The use of modus tollens becomes particularly complex in the case of negative corroboration: What conclusions do we draw about theory if the evidence is inconsistent with a theoretical prediction?

In his seminal contribution to the literature on theory testing, Lakatos (1970, 133) maintained that in the case of negative corroboration, we are not permitted to direct the modus tollens at the "hard core" of the theory, only at its "protective belt" (i.e., measurement issues, data quality, contextual issues, and other problems or oversights that might have given rise to the failed prediction).
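Lakatos's injunction follows from elementary logic. As a toy formalization of our own (not from the essay), let H denote the hard core, A the conjunction of protective-belt assumptions, and B the observational prediction:

```latex
% Negative corroboration: the premises entail only that *something*
% in the conjunction failed, not which conjunct.
(H \wedge A) \to B, \qquad \neg B \qquad \therefore \qquad \neg(H \wedge A) \;\equiv\; \neg H \vee \neg A
```

Logic alone never licenses the step from ¬(H ∧ A) to ¬H; deciding where to direct the modus tollens is therefore an abductive rather than deductive move.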
This is particularly relevant when the theory under scrutiny has amassed a high degree of positive corroboration from past research—or, as Meehl (1990, 108) put it, has "money in the bank." To suggest that all this money be forfeited on the basis of a single instance of negative corroboration is both unreasonable and methodologically dubious: There are no defensible methodological principles that permit us to direct the modus tollens immediately at the hard core of the theory.

Reasoning about corroboration is an abductive process. The specific form of abduction used in back-translating the empirical to the theoretical differs from the abduction used in translating the theoretical to the empirical; consistent with Bendoly and Oliva's (2025, 7) terminology, we label these "abduction a posteriori" and "abduction a priori," respectively.[4] Understanding how theory testing is theory generation hinges specifically on understanding these two variants of abduction. The connection from abduction to theory generation stems from the fact that abduction is the only form of reasoning that allows the introduction of new ideas in the conclusion of a reasoning process (Locke et al. 2008).

Bendoly and Oliva's (2025, 7) observation that abduction is a form of sensemaking offers a useful starting point for establishing that theory testing generates theory. Because both the practices and the objectives of our sensemaking are diverse (Weick 1995), so are the forms of abduction: some forms are selective, others creative; some are theoretical, others empirical; some are explanatory, others non-explanatory; some incorporate only observables while others include unobservables; and so on.
Given that there are literally dozens of variants of abduction (Hoffmann 2011; Minnameier 2017; Schurz 2008), one must be explicit about the specific form used. In the following, we discuss the use of abduction in the two stages of theory-testing research.

Theory application and theory testing both play an indispensable role in empirical research. In this paper, we have sought to establish that the latter has always had the generative potential to shape our theoretical thinking. To realize this potential, we must strengthen our abductive reasoning practices in both the a priori and a posteriori stages of research. Stated in reasoning terms, this involves theoretical-model abductions that extend theories to new empirical contexts on the one hand, and strong inference-to-the-best-explanation (IBE) abductions that modify theory based on negative corroboration on the other. This elaborates the process by which abductive sensemaking enables the creation of theoretical arguments (cf. Bendoly and Oliva 2025, 7), thus offering an important pathway to theory.

Herman Aguinis ([email protected]), The George Washington University, Washington, DC, USA.

As noted in the original discussion above, theory testing is generative because it necessarily involves abduction. Researchers must translate abstract theoretical ideas into concrete, context-specific predictions, a step that is never automatic and often reshapes what those ideas actually mean. They then have to work back from the evidence to theory, asking which explanation best accounts for what they have observed. Apparent support should be handled carefully, because the same evidence can often be explained in more than one way, and apparent failures rarely justify abandoning a theory's core once it has accumulated substantial support. More often, such failures point to problems with assumptions, measures, or scope.
Over time, these kinds of adjustments accumulate across related studies, extending what a theory can explain and sharpening its logic. Seen this way, progress in OM theory comes less from inventing new theories and more from systematically improving existing ones through disciplined theory testing (e.g., Aguinis and Cronin 2026). This reality matters for the vitality of the field and, frankly, for scholars working in a publication system that demands a clear theoretical contribution as a requirement for career success.

But a question I am asked frequently, especially by junior researchers, is: "These general principles about how to make contributions to theory make sense, but… how do I put them into practice, specifically? What actionable recommendations can you give me to implement these principles in my own research?"

To answer this question, decades of methodological research allow me to offer a concise 8-step theory contributions playbook (for details, see Aguinis 2025, 2026; Aguinis and Cronin 2026). Importantly, I demonstrate the practical feasibility and effectiveness of this 8-step playbook by continuing with the illustrative case of TCE as discussed earlier. Specifically, I describe how Crook et al.
(2013), which received the Academy of Management Perspectives best article of the year award, made meaningful theory contributions by implementing each of the playbook's steps (albeit some of them implicitly).

Richard Makadok ([email protected]), The Ohio State University, Columbus, Ohio, USA.

When I was 7 years old, my father explained science by opening his old college physics textbook from the atomic age of the 1950s, when folks revered Science with a capital "S." In the introductory chapter, he showed me a closed-cycle flow chart labeled "The Scientific Method" with four stages: (1) propose a theory, (2) design a study to test the theory, (3) execute the study to collect data, and (4) interpret the study's results to confirm, modify, or reject the theory, leading back to the first step to repeat the cycle anew. Simple language that a seven-year-old could understand, without fancy terms like deduction, induction, or abduction.

In their forum essay, Ketokivi and Mantere focus mainly on the second and fourth stages of that old scientific-method cycle—that is, designing a study to test a theory, which they label "abduction a priori," and interpreting the study's results to judge the theory, which they label "abduction a posteriori." I doubt their fancy new labels are needed when existing terms like "design" and "interpretation" are readily available, but their instinct to problematize (another fancy term) these two stages seems promising, since both designing studies and interpreting their results are more subtle and less straightforward than they seem at first glance. By admitting these problems, Ketokivi and Mantere take a helpful first step toward finding realistic solutions.

But then what are the next steps?
First, even if the eventual goal is normative analysis—that is, articulating how studies should be designed and how their results should be interpreted—it may still be useful to begin with some positive analysis by investigating what choices, tradeoffs, and even errors real-world researchers make in their daily work of designing studies and interpreting their results. After all, it is usually helpful to clarify a problem before attempting to solve it. Such investigation may reveal hidden pitfalls—that is, dimensions of the design and interpretation problems we do not yet fully recognize—as well as identify which aspects of current practice are working well or poorly. This investigation could begin by scrutinizing publications for possible disconnects between theory and study design, as the forum essay does with Walker and Weber (1984), and for possible disconnects between results and theoretical interpretations of those results. However, extracting practical implications from such disconnects may require interviewing researchers themselves, to understand the rationale behind their choices and the tradeoffs underlying those choices.

The next step of deriving normative implications demands a deeper analysis of tradeoffs. Thomas Sowell (1987) quipped, "There are no solutions, only tradeoffs." In a world of scarce resources and limited observability, there is no perfect study design, for at least three reasons. First, financial or material tradeoffs occur wherever barriers to observation can only be reduced by deploying more resources (e.g., CERN's Large Hadron Collider), in which case the study's design may be constrained by the interests, concerns, and wishes of whatever entities bankroll it.
Second, under legal, regulatory, or ethical barriers to observation, institutional review boards may manage the tradeoffs, or study design may be constrained by confidentiality requirements (e.g., census, taxation, education, or health records), or perhaps by piggybacking on whatever public data authorities, practitioners, or intermediaries happen to collect for their own purposes. Third, the barriers and tradeoffs are sometimes inherent in the theory itself, owing to imprecisely defined conceptual constructs like the forum essay's example of "mainly transaction cost economizing." Indeed, even physicists still struggle to define fundamental concepts like time and space.

Thus, the realities of scarce resources and limited observability demand some humility about what is possible in study design, and some caution from readers, reviewers, and editors in second-guessing researchers' choices under tradeoffs. One may object to imperfections in the methods Walker and Weber (1984) chose when operationalizing and contextualizing TCE for a specific company in a specific industry, but perfection is an inappropriate standard; inferiority is a more appropriate concern than imperfection. Sowell's quip about tradeoffs trumping solutions dovetails with his favorite question for utopian-minded critics: "Compared to what alternative?" Was the imperfect TCE operationalization inferior compared to not measuring the concept at all? Was it inferior to the realistic alternatives available at the time of the study, given available access to information and resources? Thus, it is unrealistic to hope that study design is a problem that will ever be "solved" in some absolute sense.
Perhaps the best we can hope for is a practical "engineering science" of study design that focuses more on identifying situational pitfalls and helpful special-purpose tools than on seeking a universal code of best practices.

The same is true of interpreting results, where tradeoffs also abound. Here the modus tollens example in the forum essay's footnote 3 suggests an "engineering science" in which the success or failure of empirical prediction B can be interpreted via Bayesian updating of priors for the set of conditions A = {A1, A2, A3, A4, … An}, where A1 is the validity of the theory itself, A2 is the validity of the measurements, A3 is the validity of the theory's background assumptions, A4 is the validity of the empirical identification strategy, and the remaining Ai are the rest of the protective belt. Of course, this approach would require not only methods for determining priors for the elements of A, but also knowledge (or at least defensible assumptions) about the correlations, interactions, or other dependencies among those elements, so that credit for B's success or blame for its failure gets allocated plausibly.

This approach is already more complicated than current practice[6] and is further complicated by two problems—the calibration problem of "known unknowns" due to limited observability, and the ignorance problem of "unknown unknowns" due to limited awareness.
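Before turning to those two problems, the Bayesian allocation scheme described above can be illustrated numerically. This is a minimal sketch of our own, not from the essay: it assumes, purely for simplicity, that prediction B succeeds exactly when every condition holds and that the conditions are independent (precisely the kind of dependency knowledge that, as noted above, would really be required), and all prior values are illustrative.

```python
from math import prod

def update_on_failure(priors):
    """Posterior validity of each condition A_i after prediction B fails.

    Simplifying assumptions (ours, for illustration only): B succeeds
    exactly when every condition holds, and the conditions are independent.
    """
    p_failure = 1.0 - prod(priors.values())  # P(B fails)
    posteriors = {}
    for name, p in priors.items():
        # P(B fails | A_i holds) = 1 - product of the *other* priors
        p_fail_given = 1.0 - prod(q for other, q in priors.items() if other != name)
        posteriors[name] = p * p_fail_given / p_failure  # Bayes' rule
    return posteriors

# Labels follow the essay: A1 = validity of the theory itself, A2 = the
# measurements, A3 = background assumptions, A4 = identification strategy.
# The prior values themselves are purely illustrative.
priors = {"A1_theory": 0.95, "A2_measures": 0.80,
          "A3_assumptions": 0.85, "A4_identification": 0.75}
posteriors = update_on_failure(priors)
```

With these numbers, the well-corroborated hard core (prior 0.95, posterior about 0.90) loses little credibility, while the weakest belt element (the identification strategy, prior 0.75, posterior about 0.52) absorbs most of the blame, mirroring Lakatos's injunction to direct the modus tollens at the protective belt.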
First, at least some of the known protective belt elements {A2, A3, A4, … An} may themselves require calibration studies to determine their priors and/or their correlations or dependencies with each other, in which case many of the tradeoffs from study design—for example, financial, material, legal, regulatory, ethical—apply here too.[7] Second, due to inherent tradeoffs between simplicity, generality, and accuracy (Weick 1979), researchers may be ignorant of some protective belt elements {An+1, An+2, … An+m}, especially unrecognized background assumptions or boundary conditions, like unawareness of relativity in Newtonian physics. Since discovery of these is often serendipitous, part of interpreting results may always remain more of an art than an engineering science. As my father says, discoveries are either impossible or obvious—impossible until they are made, and then they are obvious.

Morgan Swink ([email protected]), Texas Christian University, Fort Worth, Texas, USA.

Many OM researchers present their work as "theory testing," either explicitly or by virtue of the paper's structure (i.e., hypotheses before research method). Yet much of what is labeled theory-testing research in OM (and likely in other fields) is not really testing at all—it is framing.
In my experience (admittedly anecdotal), researchers often bring to a study expectations and even candidate explanations for observed relationships well before any theory is systematically interrogated. Most OM research originates in observations of practice—or in literature describing practice—and is thus motivated primarily by phenomena rather than theory.

Researchers then consult existing theories to identify suitable conceptual frames: to articulate research questions, develop arguments, communicate expectations, and interpret results. This represents a different form of "theory application" than that described by Ketokivi and Mantere. They characterize theory application as using a theory's laws to solve an operational problem; in practice, researchers often "apply theory" to solve a different problem—namely, satisfying reviewers' expectations for theoretical grounding.

Rarely do OM researchers begin with a theory and an explicit motivation to confirm, disconfirm, extend, or constrain its core tenets. In practical terms, OM is a phenomenon-driven field, and many would argue that this orientation is appropriate. After all, the "science" of OM traces its roots to early empiricists such as Taylor and the Gilbreths, and the field has advanced most dramatically through industry-driven innovations such as Fordism (mass production), the Toyota Production System, Agile Manufacturing, and related paradigms. Rather than developing indigenous theories, OM researchers have largely borrowed theories from other disciplines (e.g., organizational theory, sociology, economics).

While some view this reliance on external theories as a weakness, it is arguably a reasonable outcome of historical timing. Many of the theories we have adopted predate OM as a distinct discipline—and certainly predate supply chain management.
Until the 1980s, "OM" in most business and engineering schools was narrowly defined as production management, industrial engineering, or management science. Research in these areas was dominated by mathematically tractable theories (e.g., inventory theory, queuing theory) amenable to formal proof. Only in the past 30–40 years has OM—and more recently SCM—expanded into sociological and empirical research domains, where proof is inherently elusive.

My conjecture is that increasing global integration and competition, driven by geopolitical and technological change, encouraged OM researchers to pursue broader, observation-based questions about operational practice. In doing so, the field has returned—perhaps inadvertently—to the empiricist roots of its earliest scholars. At the same time, in an effort to maintain scientific legitimacy, we have collectively emphasized theory development as a central research objective.

More than over-dependence on theories borrowed from other fields, what constrains theoretical progress in OM is the lack of competing theories. Our dominant approach—practical research framed through imported theories—has produced a long list of theories, each explaining phenomena within a particular domain, but rarely standing in direct opposition to another. There have been notable exceptions. The 1990s debate over "trade-offs" versus "synergies" in operational improvement provides one example. The rise of behavioral operations in the early 2000s offered another, challenging assumptions of full rationality (though partially reconciled through bounded rationality). Still, most theories in OM persist with little sustained challenge—they rarely die.
Without meaningful theoretical competition, the kind of theory "generation" described by Ketokivi and Mantere—largely extension and elaboration—is likely the upper bound of what our field can achieve under current conditions.

Can we reasonably expect OM researchers to generate fundamentally new theories? Given the incentives embedded in our publication review process, probably not. Theory-testing studies (including extensions and elaborations) are generally easier to publish than theory-building efforts. The broadening of established theories—such as transaction cost economics, the resource-based view, or the theory of swift, even flow—is undeniably valuable. Such expansions have enriched these theories by incorporating new constructs, domains, and behavioral considerations. However, it may be too much to expect theory testing to yield fundamentally new theories. Importantly, negative corroboration is often the most powerful catalyst for theory development. When methodological flaws can be ruled out, surprising or non-supportive findings stimulate the kind of a posteriori abductive theorizing emphasized by Ketokivi and Mantere.

Yet, as Ketokivi and Mantere also point out, theory development requires a priori abduction as well. Two approaches may be particularly promising. First, researchers could be encouraged—especially during hypothesis development—to abductively generate plausible competing explanations and competing hypotheses, rather than relying solely on the logic of a chosen theory (or theories). This would yield stronger and more informative hypotheses, in the sense that empirical results could adjudicate among rival explanations rather than merely support or fail to support a narrow theoretical argument.

A second approach is to postpone hypothesizing altogether. This would require editors in our field to acknowledge that rigorous description can make a legitimate contribution as a precursor to theory development.
Editors and reviewers would need to allow greater space for informed speculation—typically discouraged in our field—about interesting, anomalous, or counter-intuitive phenomena uncovered through careful data analysis prior to formal hypothesis formulation. This logic underpins the JOM special issue on "nascent theory,"[8] which seeks to create space for research motivated by observation rather than by allegiance to an established theoretical framework.

Efforts such as these may encourage the development of genuinely new theories in OM. At the same time, the field should remain committed to its empirical heritage and continue to leverage the strengths of both "practice-oriented" and "practical" research traditions. Doing so will help us avoid the risk of "too much theory, not enough understanding" (Schmenner et al. 2009).

Elliot Bendoly ([email protected]), The Ohio State University, Columbus, Ohio, USA.

Rogelio Oliva ([email protected]), Texas A&M University, College Station, Texas, USA.

Ketokivi and Mantere make a compelling case that research designed with the primary intention of testing theory has always carried generative potential. We agree. Their central argument—that abductive reasoning in both the a priori and a posteriori stages of theory testing enables the creation of new theoretical meaning—is fully consistent with our view that theories are never finished products but exist along a continuum of sensemaking, from vague hunches to detailed accounts of causal mechanisms. In our earlier terms (Bendoly and Oliva 2025), their proposal speaks primarily to research that begins on what we labeled Path A—studies motivated by existing theoretical conjectures and designed to test them—rather than to Path B work that originates in anomalous or intriguing empirical observations.
Our commentary therefore concentrates on how their analysis enriches and reshapes our understanding of Path A, while leaving open important questions about the role of Path B in theory development. We are particularly gratified to see how their elaboration of abduction a priori and a posteriori develops the connection between abductive sensemaking and the creation of theoretical arguments that we identified as a meaningful pathway for research contributions. Their analysis of how Walker and Weber (1984) translated TCE's abstract concept of uncertainty into context-specific meaning illustrates precisely the kind of generative reasoning we had in mind.

That said, we see opportunities to sharpen, deepen, and extend the argument in ways that matter for OM specifically.

Journal of Operations Management, 72(3), 356–365.
引用次数: 0

Abstract

In this paper, we propose that theory-testing research offers just as much potential for generating theory as theory-building and theory-elaborating research, the two variants typically associated with theory generation (Ketokivi and Choi 2014; Lee et al. 1999). Responding to Bendoly and Oliva's (2025) call to search for meaningful theoretical pathways to research contributions, we suggest that theory-testing research has always constituted a meaningful pathway to theoretical contributions when it extends beyond merely applying theory to challenging, expanding, and elaborating it. These extensions can lead to significant adjustments in bodies of knowledge over time as research programs progress.

To understand the generative aspect of theory testing, we must distinguish it from theory application. When we apply theory, the objective is usually to address a practical problem, without any intent to contribute to an ongoing theoretical conversation. In empirical operations management (OM) research, the application of factory physics offers an illustrative example: Researchers apply concepts such as Little's Law and laws of variability to improve factory productivity (Schmenner and Swink 1998). In this context, theory consists of the relevant applicable laws that are treated as given, which makes theory effectively axiomatic from an epistemological point of view (Popper 1935/2005, 51). 1

In stark contrast to theory application, the fundamental idea in theory-testing research is to place the theory itself under empirical scrutiny. Accordingly, theory is no longer treated as self-evident and certain but propositional and conjectural, subject to revisions (Lakatos 1970; Popper 1963).

As an example of theory-testing research, consider Williamson's (1971) question “Why do firms integrate vertically?” This question gave birth to transaction cost economics (TCE), one of the most influential and established research programs on organizational boundaries (Santos and Eisenhardt 2005). The theoretical essence of TCE is succinctly captured by the discriminating alignment hypothesis: “Transactions, which differ in their attributes, are aligned with governance structures, which differ in their costs and competencies, in a discriminating (mainly transaction cost economizing) way” (Williamson 1996, 46–47). Importantly, this statement is not meant as axiomatic but conjectural, as the word ‘hypothesis’ implies: Whether actual governance decisions align transactions and governance structures in a “mainly transaction cost economizing way” is to be settled empirically.

Consider Walker and Weber's (1984) seminal TCE-based study that examined the make-or-buy decision in the final assembly of automobiles. TCE-as-conjecture becomes salient in the discussion section, where several of TCE's central propositions are called into question based on the empirical analysis. For example, the finding that "the effect of transaction costs on make-or-buy decisions was substantially overshadowed by comparative production costs" (Walker and Weber 1984, 387) is inconsistent with TCE's original central proposition that transactions will be aligned with governance structures in a "mainly transaction cost economizing" way (Williamson 1996, 47, emphasis added). When the qualifier "mainly" is interpreted as conjectural and malleable, empirical research not only tests but also informs theory. Walker and Weber's (1984) findings suggest that while transaction costs are relevant, they constitute only a portion of the total costs that are decisive in make-or-buy decisions. Such findings, and many others, have expanded TCE's focus over time from transaction costs to total costs. Another, more recent development is that instead of focusing on costs alone, researchers have incorporated the revenue side into the comparative analysis as well (Ketokivi and Mahoney 2020). More generally, reviews of the empirical TCE literature (e.g., Macher and Richman 2008) demonstrate how TCE as a theory has developed significantly over time, mainly through the broadening of its scope.

TCE illustrates a general and essential characteristic of theory-testing research: When theory is taken as conjectural, testing theory also generates theory through marginal adjustments. Such adjustments link individual theory-testing research efforts to a broader theoretical conversation and, consequently, enable the accumulation of theoretical knowledge and theory progress. We do not witness similar accumulation in knowledge communities where theories are merely applied. 2

Theory-testing research is often described as hypothetico-deductive (Mantere and Ketokivi 2013). We submit that the label “deductive” is accurate for theory application but inaccurate for theory testing; for the latter, the descriptively accurate term is hypothetico-abductive. In this section, we seek to establish this by comparing reasoning in theory testing versus theory application.

To understand the role of abduction, we need to distinguish between two central reasoning tasks in theory-testing research: connecting theoretical and observational statements (the theorist's concern) and connecting observational statements with data (the statistician's concern) (Meehl 1990, 116). The statistician's concern is comparatively straightforward, and there is no difference between theory application and theory testing: The statistician's concern is addressed using the established tools of statistical inference, that is, a combination of deductive and inductive reasoning. Differences are found in how the researcher addresses the theorist's concern (Figure 1).

In theory application, the theorist's concern is methodologically simpler. When theory is merely applied, there is no feedback arrow from observational predictions to theory. Furthermore, if theory consists of empirically salient concepts, observational predictions can be deduced from the theoretical foundation (Schmenner and Swink 1998)—hence the term hypothetico-deductive.

The case of theory testing is comparatively more complex, as adjustments to theoretical conjectures do not follow a deductive, computational logic (Mantere and Ketokivi 2013). Rather, adjustments are iterative steps of abductive inferences which adjust conjectures based on often surprising findings (Peirce 1877). As an example, let us revisit TCE's discriminating alignment hypothesis. Its central terms (e.g., transaction, governance structure, competence) are theoretical and must be translated from the language of theory into the language of empirical observation. Given that translation involves several possible, non-obvious interpretations (Quine 1951), the reasoning process cannot possibly be deductive. Similarly, since translation does not involve generalization of any kind, it cannot be inductive either. The only remaining form of reasoning is abduction, which is indeed the reasoning tool by which theory-testing researchers bridge the theoretical to the empirical.

The abductive translation process is generative because it creates new meaning for theoretical concepts (Gadamer 1975). In their make-or-buy study, Walker and Weber (1984) translated TCE's general concept of uncertainty into volume uncertainty and further into unpredictable fluctuations in demand for components in automobile final assembly. This translation created specific and contextualized—in a word, new—meaning for the concept of uncertainty.

The other complicating factor has to do with the feedback arrow to theory (Figure 1). Specifically, testing hypotheses is ultimately a means to the end of testing theoretical conjectures. Empirical evidence that is consistent with the hypothesis constitutes an instance of positive corroboration, whereas inconsistency means negative corroboration (Popper 1935/2005, 264–266). Both kinds not only inform theory but may also lead to adjustments and elaborations.

The feedback arrow to theory makes the reasoning process in theory-testing significantly more complex than in theory-application research because it involves the use of modus tollens. 3 The use of modus tollens becomes particularly complex in the case of negative corroboration: What conclusions do we draw about theory if the evidence is inconsistent with a theoretical prediction?

In his seminal contribution to the literature on theory testing, Lakatos (1970, 133) maintained that in the case of negative corroboration, we are not permitted to direct the modus tollens to the “hard core” of the theory but to its “protective belt” (i.e., measurement issues, data quality, contextual issues, and other problems or oversights that might have given rise to the failed prediction). This is particularly relevant when the theory under scrutiny has amassed a high degree of positive corroboration from past research, or, as Meehl (1990, 108) put it, has “money in the bank.” To suggest that all this money would be forfeited based on just one instance of negative corroboration is both unreasonable and methodologically dubious: There are no defensible methodological principles that permit us to immediately direct the modus tollens to the hard core of the theory.
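Lakatos's point can be rendered schematically (a standard Duhem–Quine style formulation, not notation from the essay itself). A prediction P is deduced not from the hard core T alone but from its conjunction with the protective-belt auxiliaries, so modus tollens on a failed prediction refutes only the conjunction:

```latex
\underbrace{(T \wedge A_1 \wedge \cdots \wedge A_n)}_{\text{hard core} \,+\, \text{protective belt}} \rightarrow P,
\qquad
\neg P \;\therefore\; \neg\,(T \wedge A_1 \wedge \cdots \wedge A_n)
```

The conclusion is a negated conjunction, which is silent about which conjunct is false; directing the blame at some auxiliary $A_i$ rather than at $T$ is therefore logically permissible, and—on Lakatos's argument—methodologically required when the theory has substantial positive corroboration behind it.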

Reasoning about corroboration is an abductive process. The specific form of abduction used in back-translating the empirical to the theoretical differs from the abduction used in translating the theoretical to the empirical; consistent with Bendoly and Oliva's (2025, 7) terminology, we label these “abduction a posteriori” and “abduction a priori,” respectively. 4 Understanding how theory testing is theory generation hinges specifically on understanding these two variants of abduction. The connection from abduction to theory generation stems from the fact that abduction is the only form of reasoning that allows the introduction of new ideas in the conclusion of a reasoning process (Locke et al. 2008).

Bendoly and Oliva's (2025, 7) observation that abduction is a form of sensemaking offers a useful starting point for establishing that theory testing generates theory. Because both the practices and the objectives of our sensemaking are diverse (Weick 1995), so are the forms of abduction: some forms are selective, others creative; some are theoretical, others empirical; some are explanatory, others non-explanatory; some incorporate only observables while others include unobservables; and so on. Given that there are literally dozens of variants of abduction (Hoffmann 2011; Minnameier 2017; Schurz 2008), one must be explicit about the specific form used. In the following, we discuss the use of abduction in the two stages of theory-testing research.

Theory application and theory testing both play an indispensable role in empirical research. In this paper, we have sought to establish that the latter has always had generative potential to shape our theoretical thinking. To realize this potential, we must strengthen our abductive reasoning practices both in the a priori and a posteriori stages of research. Stated in reasoning terms, this involves theoretical-model abductions that extend theories to new empirical contexts on the one hand, and strong inference-to-the-best-explanation (IBE) abductions to modify theory based on negative corroboration on the other. This elaborates the process by which abductive sensemaking enables the creation of theoretical arguments (cf. Bendoly and Oliva 2025, 7), thus offering an important pathway to theory.

Herman Aguinis ([email protected]), The George Washington University, Washington, DC, USA.

As noted in the original discussion above, theory testing is generative because it necessarily involves abduction. Researchers must translate abstract theoretical ideas into concrete, context-specific predictions, a step that is never automatic and often reshapes what those ideas actually mean. They then have to work back from the evidence to theory, asking which explanation best accounts for what they have observed. Apparent support should be handled carefully, because the same evidence can often be explained in more than one way, and apparent failures rarely justify abandoning a theory's core once it has accumulated substantial support. More often, such failures point to problems with assumptions, measures, or scope. Over time, these kinds of adjustments build across related studies, extending what a theory can explain and sharpening its logic. Seen this way, progress in OM theory comes less from inventing new theories and more from systematically improving existing ones through disciplined theory testing (e.g., Aguinis and Cronin 2026). This is a reality that matters for the vitality of the field and, frankly, for scholars working in a publication system that demands a clear theoretical contribution as a requirement for career success.

But, a question I am asked frequently, especially by junior researchers, is: “These general principles about how to make contributions to theory make sense, but… how do I put them into practice, specifically? What actionable recommendations can you give me to implement these principles in my own research?”

To answer this question, decades of methodological research allow me to offer a concise 8-step theory contributions playbook (for details, see Aguinis 2025, 2026; Aguinis and Cronin 2026). Importantly, I demonstrate the practical feasibility and effectiveness of this 8-step playbook. I do this by continuing with the illustrative case of TCE theory as discussed earlier. Specifically, I describe how Crook et al. (2013), which received the Academy of Management Perspectives best article of the year award, made meaningful theory contributions by implementing each of the playbook's steps (albeit some of them implicitly).

Richard Makadok ([email protected]), The Ohio State University, Columbus, Ohio, USA.

When I was 7 years old, my father explained science by opening his old college physics textbook from the atomic age of the 1950s, when folks revered Science with a capital “S.” In the introductory chapter, he showed me a closed-cycle flow chart labeled “The Scientific Method” with four stages: (1) propose a theory, (2) design a study to test the theory, (3) execute the study to collect data, and (4) interpret the study's results to either confirm, modify, or reject the theory, leading back to the first step to repeat the cycle anew. Simple language that a seven-year-old could understand, without fancy terms like deduction, induction, or abduction.

In their forum essay, Ketokivi and Mantere focus mainly on the second and fourth stages in that old scientific-method cycle—that is, designing a study to test a theory, which they label "abduction a priori," and interpreting the study's results to judge the theory, which they label "abduction a posteriori." I doubt their fancy new labels are needed when existing terms like "design" and "interpretation" are readily available, but their instinct to problematize (another fancy term) these two stages seems promising, since both designing studies and interpreting their results are more subtle and less straightforward than they seem at first glance. By admitting these problems, Ketokivi and Mantere take a helpful first step toward finding realistic solutions.

But then what are the next steps? First, even if the eventual goal is normative analysis—that is, articulating how studies should be designed and how their results should be interpreted—it may still be useful to begin with some positive analysis by investigating what choices and tradeoffs, and even errors, real-world researchers make in their daily work of designing studies and interpreting their results. After all, it is usually helpful to clarify a problem before attempting to solve it. Such investigation may reveal hidden pitfalls—that is, dimensions of the design and interpretation problems we do not yet fully recognize—as well as identify which aspects of current practice are working well or poorly. This investigation could begin by scrutinizing publications for possible disconnects between theory and study design, as the forum essay does with Walker and Weber (1984), and for possible disconnects between results and theoretical interpretations of those results. However, extracting practical implications from such disconnects may require interviewing researchers themselves, to understand the rationale behind their choices and the tradeoffs underlying those choices.

The next step of deriving normative implications demands deeper analysis of tradeoffs. Thomas Sowell (1987) quipped, “There are no solutions, only tradeoffs.” In a world of scarce resources and limited observability, there is no perfect study design, for at least three reasons: First, financial or material tradeoffs occur wherever barriers to observation can only be reduced by deploying more resources (e.g., CERN's Large Hadron Collider), in which case the study's design may be constrained by interests, concerns, and wishes of whatever entities bankroll it. Second, under legal, regulatory, or ethical barriers to observation, institutional review boards may manage tradeoffs, or study design may be constrained by confidentiality requirements (e.g., census, taxation, education, or health records), or perhaps by piggybacking on whatever public data authorities, practitioners, or intermediaries happen to collect for their own purposes. Third, the barriers and tradeoffs are sometimes inherent in the theory itself, due to imprecisely defined conceptual constructs like the forum essay's example of “mainly transaction cost economizing.” Indeed, even physicists still struggle to define fundamental concepts like time and space.

Thus, the realities of scarce resources and limited observability demand some humility about what is possible in study design, and some caution from readers, reviewers, and editors in second-guessing researchers' choices under tradeoffs. One may object to imperfections in the methods Walker and Weber (1984) chose when operationalizing and contextualizing TCE theory for a specific company in a specific industry, but perfection is an inappropriate standard. Inferiority is a more appropriate concern than imperfection. Sowell's quip about tradeoffs trumping solutions dovetails with his favorite question for utopian-minded critics, “Compared to what alternative?” Was imperfect TCE operationalization inferior compared to not measuring the concept at all? Was it inferior to realistic alternatives available at the time of the study, given available access to information and resources? Thus, it is unrealistic to hope that study design is a problem that will ever be “solved” in some absolute sense. Perhaps the best we can hope for is a practical “engineering science” of study design that focuses more on identifying situational pitfalls and helpful special-purpose tools than seeking a universal code of best practices.

The same is also true of interpreting results, where tradeoffs also abound. Here the modus tollens example in the forum essay's footnote 3 suggests an "engineering science" in which the success or failure of empirical prediction B can be interpreted via Bayesian updating of priors for the set of conditions A = {A1, A2, A3, A4, …, An}, where A1 is the validity of the theory itself, A2 is the validity of the measurements, A3 is the validity of the theory's background assumptions, A4 is the validity of the empirical identification strategy, and the remaining Ai are the rest of the protective belt. Of course, this approach would require not only methods to determine priors for the elements of A, but also knowledge (or at least defensible assumptions) about the correlations, interactions, or other dependencies among those elements, so that credit for B's success or blame for its failure gets allocated plausibly.
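A minimal sketch of this Bayesian blame allocation may make the idea concrete. All priors below are made-up illustrative numbers, the elements of A are assumed independent, and the prediction B is assumed to succeed only when every element holds—deliberate simplifications that sidestep exactly the dependency problems noted above:

```python
# Illustrative Bayesian blame allocation after a failed prediction B.
# Assumptions (for simplicity only): elements of A are independent, and
# B succeeds iff every A_i holds. All priors are hypothetical values.

priors = {
    "A1_theory_core": 0.95,            # theory has "money in the bank"
    "A2_measurement": 0.80,
    "A3_background_assumptions": 0.85,
    "A4_identification_strategy": 0.75,
}

# P(B) = product of P(A_i) under the independence assumption
p_success = 1.0
for p in priors.values():
    p_success *= p
p_failure = 1.0 - p_success

# Posterior that A_i still holds, given that B failed:
#   P(A_i | not B) = P(not B | A_i) * P(A_i) / P(not B),
# where P(not B | A_i) = 1 - product of the OTHER priors.
posteriors = {}
for name, p in priors.items():
    others = 1.0
    for other_name, other_p in priors.items():
        if other_name != name:
            others *= other_p
    posteriors[name] = (1.0 - others) * p / p_failure

for name in priors:
    print(f"{name}: prior {priors[name]:.2f} -> "
          f"posterior {posteriors[name]:.2f}")
```

Running this sketch shows the blame flowing disproportionately to the element with the weakest prior (here the identification strategy), while the well-corroborated hard core loses little credibility—a quantitative echo of Lakatos's protective-belt rule.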

This approach is more complicated than current practice 6 but is further complicated by two problems—the calibration problem of "known unknowns" due to limited observability, and the ignorance problem of "unknown unknowns" due to limited awareness. First, at least some of the known protective belt elements {A2, A3, A4, …, An} may themselves require calibration studies to determine their priors and/or their correlations or dependencies with each other, in which case many of the tradeoffs from study design—for example, financial, material, legal, regulatory, ethical—apply here too. 7 Second, due to inherent tradeoffs between simplicity, generality, and accuracy (Weick 1979), researchers may be ignorant of some protective belt elements {An+1, An+2, …, An+m}, especially unrecognized background assumptions or boundary conditions, like unawareness of relativity in Newtonian physics. Since the discovery of these is often serendipitous, part of interpreting results may always remain more of an art than an engineering science. As my father says, discoveries are either impossible or obvious—impossible until they are made, and then they are obvious.

Morgan Swink ([email protected]), Texas Christian University, Fort Worth, Texas, USA.

Many OM researchers present their work as “theory testing,” either explicitly or by virtue of the paper's structure (i.e., hypotheses before research method). Yet much of what is labeled theory testing research in OM (and likely in other fields) is not really testing at all—it is framing. In my experience (admittedly anecdotal), researchers often bring to a study expectations and even candidate explanations for observed relationships well before any theory is systematically interrogated. Most OM research originates in observations of practice—or in literature describing practice—and is thus motivated primarily by phenomena rather than theory.

Researchers then consult existing theories to identify suitable conceptual frames: to articulate research questions, develop arguments, communicate expectations, and interpret results. This approach represents a different form of “theory application” than that described by Ketokivi and Mantere. They characterize theory application as using a theory's laws to solve an operational problem; in practice, researchers often “apply theory” to solve a different problem—namely, satisfying reviewers' expectations for theoretical grounding.

Rarely do OM researchers begin with a theory and an explicit motivation to confirm, disconfirm, extend, or constrain its core tenets. In practical terms, OM is a phenomenon-driven field, and many would argue that this orientation is appropriate. After all, the “science” of OM traces its roots to early empiricists such as Taylor and the Gilbreths, and the field has advanced most dramatically through industry-driven innovations such as Fordism (mass production), the Toyota Production System, Agile Manufacturing, and related paradigms. Rather than developing indigenous theories, OM researchers have largely borrowed theories from other disciplines (e.g., organizational theory, sociology, economics).

While some view this reliance on external theories as a weakness, it is arguably a reasonable outcome of historical timing. Many of the theories we have adopted predate OM as a distinct discipline—and certainly predate supply chain management. Until the 1980s, “OM” in most business and engineering schools was narrowly defined as production management, industrial engineering, or management science. Research in these areas was dominated by mathematically tractable theories (e.g., inventory theory, queuing theory) amenable to formal proof. Only in the past 30–40 years has OM—and more recently SCM—expanded into sociological and empirical research domains, where proof is inherently elusive.

My conjecture is that increasing global integration and competition, driven by geopolitical and technological change, encouraged OM researchers to pursue broader, observation-based questions about operational practice. In doing so, the field has returned—perhaps inadvertently—to the empiricist roots of its earliest scholars. At the same time, in an effort to maintain scientific legitimacy, we have collectively emphasized theory development as a central research objective.

More than over-dependence on theories borrowed from other fields, what constrains theoretical progress in OM is the lack of competing theories. Our dominant approach—practical research framed through imported theories—has produced a long list of theories, each explaining phenomena within a particular domain, but rarely standing in direct opposition to another. There have been notable exceptions. The 1990s debate over “trade-offs” versus “synergies” in operational improvement provides one example. The rise of behavioral operations in the early 2000s offered another, challenging assumptions of full rationality (though partially reconciled through bounded rationality). Still, most theories in OM persist with little sustained challenge—they rarely die. Without meaningful theoretical competition, the kind of theory “generation” described by Ketokivi and Mantere—largely extension and elaboration—is likely the upper bound of what our field can achieve under current conditions.

Can we reasonably expect OM researchers to generate fundamentally new theories? Given the incentives embedded in our publication review process, probably not. Theory-testing studies (including extensions and elaborations) are generally easier to publish than theory-building efforts. The broadening of established theories—such as transaction cost economics, the resource-based view, or the theory of swift, even flow—is undeniably valuable. Such expansions have enriched these theories by incorporating new constructs, domains, and behavioral considerations. However, it may be too much to expect theory testing to yield new theory generation. Importantly, negative corroboration is often the most powerful catalyst for theory development. When methodological flaws can be ruled out, surprising or non-supportive findings stimulate the kind of a posteriori abductive theorizing emphasized by Ketokivi and Mantere.

Yet, as Ketokivi and Mantere also point out, theory development also requires a priori abduction. Two approaches may be particularly promising. First, researchers could be encouraged—especially during hypothesis development—to abductively generate plausible competing explanations and competing hypotheses, rather than relying solely on the logic of a chosen theory (or theories). This would yield stronger and more informative hypotheses, in the sense that empirical results could adjudicate among rival explanations rather than merely support or fail to support a narrow theoretical argument.
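The idea of designing studies so that the evidence adjudicates among rival explanations can be illustrated with a simple model-comparison sketch. The data below are synthetic and the variable names are illustrative stand-ins (loosely echoing the transaction-cost versus total-cost rivalry discussed earlier), not a reanalysis of any real study:

```python
# Sketch: letting data adjudicate between two rival explanations.
# Synthetic data; the "true" process weights production costs heavily.
import numpy as np

rng = np.random.default_rng(42)
n = 200
transaction_costs = rng.normal(size=n)
production_costs = rng.normal(size=n)
outsourcing = (0.3 * transaction_costs + 1.0 * production_costs
               + rng.normal(scale=0.5, size=n))

def aic(y, X):
    """OLS fit; AIC under a Gaussian likelihood (lower is better)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)
    k = X.shape[1] + 1  # coefficients plus the error variance
    log_lik = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k - 2 * log_lik

# Rival 1: transaction costs alone drive the decision (narrow reading).
# Rival 2: total (transaction + production) costs drive it.
aic_tce_only = aic(outsourcing, transaction_costs)
aic_total = aic(outsourcing,
                np.column_stack([transaction_costs, production_costs]))
print(f"AIC, transaction costs only: {aic_tce_only:.1f}")
print(f"AIC, total costs:            {aic_total:.1f}")
# The lower-AIC model is the better-supported rival explanation.
```

The design choice here is the point: because both rivals are fit to the same data and compared on a common criterion, the result discriminates between explanations rather than merely supporting or failing to support a single theoretical argument.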

A second approach is to postpone hypothesizing altogether. This would require editors in our field to acknowledge that rigorous description can make a legitimate contribution as a precursor to theory development. Editors and reviewers would need to allow greater space for informed speculation—typically discouraged in our field—about interesting, anomalous, or counter-intuitive phenomena uncovered through careful data analysis prior to formal hypothesis formulation. This logic underpins the JOM special issue on “nascent theory,” 8 which seeks to create space for research motivated by observation rather than by allegiance to an established theoretical framework.

Efforts such as these may encourage the development of genuinely new theories in OM. At the same time, the OM field should remain committed to our empirical heritage and continue to leverage the strengths of both “practice-oriented” and “practical” research traditions. Doing so will help us avoid the risk of “too much theory, not enough understanding” (Schmenner et al. 2009).

Elliot Bendoly ([email protected]), The Ohio State University, Columbus, Ohio, USA.

Rogelio Oliva ([email protected]), Texas A&M University, College Station, Texas, USA.

Ketokivi and Mantere make a compelling case that research designed with the primary intention of testing theory has always carried generative potential. We agree. Their central argument—that abductive reasoning in both the a priori and a posteriori stages of theory testing enables the creation of new theoretical meaning—is fully consistent with our view that theories are never finished products but exist along a continuum of sensemaking, from vague hunches to detailed accounts of causal mechanisms. In our earlier terms (Bendoly and Oliva 2025), their proposal speaks primarily to research that begins on what we labeled Path A—studies motivated by existing theoretical conjectures and designed to test them—rather than to Path B work that originates in anomalous or intriguing empirical observations. Our commentary therefore concentrates on how their analysis enriches and reshapes our understanding of Path A, while leaving open important questions about the role of Path B in theory development. We are particularly gratified to see how their elaboration of abduction a priori and a posteriori develops the connection between abductive sensemaking and the creation of theoretical arguments that we identified as a meaningful pathway for research contributions. Their analysis of how Walker and Weber (1984) translated TCE's abstract concept of uncertainty into context-specific meaning illustrates precisely the kind of generative reasoning we had in mind.

That said, we see opportunities to sharpen, deepen, and extend the argument in ways that matter for OM specifically.

Abstract Image

Abstract Image

JOM论坛:理论检验即理论生成
在本节中,我们试图通过比较理论测试和理论应用中的推理来建立这一点。为了理解溯因的作用,我们需要区分理论检验研究中的两个中心推理任务:连接理论和观察陈述(理论家的关注点)和连接观察陈述与数据(统计学家的关注点)(Meehl 1990,116)。统计学家关注的问题相对直接,理论应用和理论检验之间没有区别:统计学家关注的问题是使用已建立的统计推断工具来解决的,即演绎推理和归纳推理的结合。不同之处在于研究者如何处理理论家的关注点(图1)。在理论应用上,理论家的关注在方法论上比较简单。当理论仅仅被应用时,没有从观测预测到理论的反馈箭头。此外,如果理论由经验上显著的概念组成,则观测预测可以从理论基础中推导出来(Schmenner和Swink 1998)——因此被称为假设演绎。理论检验的情况相对更复杂,因为对理论推测的调整不遵循演绎的计算逻辑(Mantere和Ketokivi 2013)。更确切地说,调整是溯因推理的迭代步骤,它根据经常令人惊讶的发现调整猜想(Peirce 1877)。作为一个例子,让我们重新审视TCE的判别对齐假设。它的中心术语(例如,交易、治理结构、能力)是理论性的,必须从理论语言翻译成经验观察语言。鉴于翻译涉及几种可能的、不明显的解释(Quine 1951),推理过程不可能是演绎的。同样,由于翻译不涉及任何形式的概括,它也不能归纳。唯一剩下的推理形式是溯因法,它确实是理论检验研究人员将理论与经验联系起来的推理工具。溯因翻译过程是生成性的,因为它为理论概念创造了新的意义(伽达默尔,1975)。Walker和Weber(1984)在他们的制造或购买研究中,将TCE的一般不确定性概念转化为体积不确定性,并进一步转化为汽车最终装配中零部件需求的不可预测波动。这种翻译为不确定性的概念创造了具体的、语境化的——一句话,新的含义。另一个复杂的因素与理论的反馈箭头有关(图1)。具体来说,检验假设最终是检验理论猜想的一种手段。与假设一致的经验证据构成积极佐证的实例,而不一致则意味着消极佐证(Popper 1935/2005, 264-266)。这两种类型不仅提供理论信息,而且可能导致调整和细化。理论的反馈箭头使得理论检验中的推理过程比理论应用研究中的推理过程要复杂得多,因为它涉及到模量的使用。在否定确证的情况下,计算方法的使用变得特别复杂:如果证据与理论预测不一致,我们对理论得出什么结论?在他对理论检验文献的开创性贡献中,Lakatos(1970, 133)坚持认为,在否定确证的情况下,我们不允许将模型指向理论的“核心”,而是指向其“保护带”(即测量问题、数据质量、上下文问题以及其他可能导致预测失败的问题或疏忽)。当被审查的理论已经从过去的研究中积累了高度的积极佐证,或者正如Meehl(1990,108)所说的那样,“在银行里有钱”时,这一点尤为重要。仅仅基于一个否定确证的例子就认为所有这些钱都将被没收,这不仅是不合理的,而且在方法上也是可疑的:没有任何可辩护的方法原则允许我们立即将模型指向理论的核心。关于确证的推理是一个溯因过程。在将经验反译为理论时所使用的溯因法的具体形式不同于将理论反译为经验时所使用的溯因法;与Bendoly和Oliva(2025, 7)的术语一致,我们将这两种现象分别称为“后天绑架”和“先天绑架”。理解理论检验是如何产生理论的,具体取决于理解溯因法的这两种变体。 
In their forum article, Ketokivi and Mantere focus primarily on the second and fourth stages of the age-old cycle of the scientific method: designing a study to test a theory, which they call "a priori abduction," and interpreting its findings to judge that theory, which they call "a posteriori abduction." I doubt they need fancy new labels when existing terms such as "design" and "interpretation" are readily at hand, but their instinct to problematize (another fancy term) these two stages seems promising, because both designing studies and interpreting their results are subtler and less straightforward than they first appear. By acknowledging these problems, Ketokivi and Mantere take a helpful first step toward finding realistic solutions. But what are the next steps?

First, even if the ultimate goal is normative analysis, that is, articulating how studies should be designed and how findings should be interpreted, it may still be useful to begin with some positive analysis: investigating the choices, trade-offs, and even mistakes that real-world researchers make in the everyday work of designing studies and interpreting results. After all, it usually helps to clarify a problem before attempting to solve it. Such an investigation might reveal hidden pitfalls, that is, dimensions of the design and interpretation problems we do not yet fully recognize, and identify which aspects of current practice work well or poorly. It could begin by scrutinizing published work for possible disconnects between theory and research design, as the forum article does with Walker and Weber (1984), and between results and their theoretical interpretation. Extracting practical implications from such disconnects, however, may require interviewing researchers themselves to understand the rationales behind their choices and the trade-offs those choices embody.

Deriving normative implications would then require a deeper analysis of trade-offs. Thomas Sowell (1987) quipped that there are no solutions, only trade-offs. In a world of scarce resources and limited observability, there is no perfect research design, for at least three reasons. First, some barriers to observation can be reduced only by deploying greater resources (e.g., CERN's Large Hadron Collider), in which case the design of a study may be constrained by the interests, concerns, and wishes of whatever entity funds it. Second, under legal, regulatory, or ethical barriers to observation, an institutional review board may weigh the costs and benefits, or the design may be constrained by confidentiality requirements (e.g., census, tax, education, or health records), or it may have to exploit whatever public data authorities, practitioners, or intermediaries happen to have collected for their own purposes. Third, some barriers and trade-offs are inherent in the theory itself, owing to imprecisely defined conceptual constructs, such as the "mainly transaction-cost economizing" example in the forum article. Indeed, even physicists are still struggling to define basic concepts such as time and space.

The realities of scarce resources and limited observability therefore call for a measure of humility about what research designs can accomplish, and a measure of caution from readers, reviewers, and editors when passing ex post judgment on the choices researchers make among trade-offs. One might object to the imperfections of the approach Walker and Weber (1984) chose in operationalizing and contextualizing TCE theory for particular firms in a particular industry, but perfection is an inappropriate standard. Inferiority is a more fitting worry than imperfection. Sowell's quip about trade-offs over solutions dovetails with his favorite question for utopian critics: compared to what alternative? Is an imperfect operationalization of TCE worse than not measuring the concept at all? Is it inferior to the realistic alternatives that existed, given the information and resources available at the time of the study? It is therefore unrealistic to hope that research design is a problem that will ever be "solved" in some absolute sense. Perhaps the best we can hope for is a pragmatic "engineering science" of research design, one focused more on identifying contextual pitfalls and useful special-purpose tools than on seeking a universal code of best practice.

The same holds for the interpretation of results, which also involves many trade-offs. Here, the model example in footnote 3 of the forum article suggests one such engineering science, in which the success or failure of an empirical prediction B would be interpreted by Bayesian updating of priors over a set of conditions A = {a1, a2, a3, a4, ..., an}, where a1 is the validity of the theory itself, a2 the validity of the measurement, a3 the validity of the theory's background assumptions, a4 the validity of the empirical identification strategy, and the remaining elements the rest of the protective belt. Of course, this approach would require not only a method for establishing priors over the elements of A, but also knowledge (or at least defensible assumptions) about the correlations, interactions, or other dependencies among those elements, so that credit for B's success or failure can be apportioned sensibly. Such an approach would be more sophisticated than current practice, but it is further complicated by two problems: a calibration problem of "known unknowns" arising from limited observability, and an ignorance problem of "unknown unknowns" arising from limited awareness. First, at least some of the known protective-belt elements {a2, a3, a4, ..., an} may themselves require calibration studies to establish their priors and/or the correlations or dependencies among them, in which case the many trade-offs of research design (financial, material, legal, regulatory, ethical) apply here as well. Second, owing to the inherent trade-off among simplicity, generality, and accuracy (Weick 1979), researchers may have overlooked some protective-belt elements {an+1, an+2, ..., an+m}, especially unrecognized background assumptions or boundary conditions, much as Newtonian physics failed to recognize relativity. Because such discoveries tend to be serendipitous, interpreting results may always remain partly an art rather than an engineering science. As my father used to say, discoveries are either impossible or obvious: impossible until they are made, and obvious ever after.

Morgan Swink ([email protected]), Texas Christian University, Fort Worth, Texas, USA.

Many OM researchers describe their work as "theory testing," either explicitly or through the structure of the paper (i.e., hypotheses preceding the research methods). In management research, however (and likely in other fields), many studies labeled as theory testing are not really tests at all; they are framings. In my experience (anecdotal, to be sure), researchers often form expectations, and even candidate explanations, for observed relationships before systematically interrogating any theory. Most OM research originates in observations of practice, or in literature describing practice, and is therefore driven primarily by phenomena rather than theory. Researchers then consult existing theories to identify a suitable conceptual frame: to articulate the research question, develop arguments, communicate expectations, and interpret results. This represents a form of "theory application" different from the one Ketokivi and Mantere describe. They characterize theory application as using theoretical laws to solve operational problems; in practice, researchers often "apply theory" to solve a different problem, namely satisfying reviewers' expectations for theoretical grounding. OM researchers rarely begin with a theory and an explicit motivation to confirm, disconfirm, extend, or delimit its core principles.

In truth, OM is a phenomenon-driven field, and many would argue that this orientation is appropriate. After all, the "science" of OM traces back to early empiricists such as Taylor and the Gilbreths, and the field has advanced most notably through industry-driven innovations such as Fordism (mass production), the Toyota Production System, agile manufacturing, and related paradigms. Rather than developing indigenous theory, OM researchers have largely borrowed theories from other disciplines (e.g., organization theory, sociology, economics). While some view this reliance on external theory as a weakness, it is arguably a reasonable consequence of historical timing. Many of the theories we have adopted predate OM as a distinct discipline, and certainly predate supply chain management. Until the 1980s, "OM" was narrowly defined in most business and engineering schools as production management, industrial engineering, or management science. Research in those areas centered on mathematically tractable theories (e.g., inventory theory, queuing theory) that could be formally proven. Only in the past 30-40 years have OM, and more recently SCM, expanded into sociological and empirical territory, where evidence is inherently more elusive. My conjecture is that growing global integration and competition, propelled by geopolitical and technological change, encouraged OM researchers to pursue broader, observation-based questions about operational practice. In doing so, the field has perhaps inadvertently returned to the empiricist roots of its earliest scholars. At the same time, to preserve scientific legitimacy, we have collectively emphasized theory development as the central goal of research.

Beyond overreliance on theories from other fields, theoretical progress in OM is also constrained by a scarcity of competing theories. Our dominant approach, studying practice through imported theoretical frames, has produced a long list of theories, each explaining phenomena within a particular domain but rarely pitted directly against another. There are notable exceptions. The 1990s debate over "trade-offs" versus "synergies" in operational improvement offers one example. The rise of behavioral operations in the early 2000s offers another, challenging assumptions of full rationality (though partially reconciled through bounded rationality). Still, most theories in OM persist with little sustained challenge; they rarely die. Without meaningful theoretical competition, the kind of theory "generation" Ketokivi and Mantere describe, consisting mostly of extension and elaboration, may well be the ceiling our field can reach under current conditions.

Can we reasonably expect OM researchers to generate fundamentally new theory? Given the incentives built into our publication review processes, probably not. Theory-testing research (including extension and elaboration) is generally easier to publish than theory-building work. Extensions of existing theories, such as transaction cost economics, the resource-based view, or the theory of swift, even flow, are undeniably valuable: they enrich these theories by incorporating new constructs, domains, and behavioral considerations. Expecting theory testing to produce new theory, however, may be asking too much. Importantly, negative corroboration is often the most powerful catalyst for theory development. When methodological flaws can be ruled out, surprising or unsupportive findings spark exactly the kind of a posteriori abductive theorizing that Ketokivi and Mantere emphasize. Yet, as they also note, theory development requires a priori abduction as well. Two approaches seem especially promising. First, researchers could be encouraged, particularly during hypothesis development, to abductively generate plausible competing explanations and competing hypotheses, rather than relying solely on the logic of one (or more) chosen theories. This would yield stronger, more informative hypotheses, in the sense that empirical results could adjudicate among competing explanations rather than merely support or fail to support a single narrow theoretical argument. The second approach is to defer hypotheses altogether. This would require editors in our field to acknowledge that rigorous description can make a legitimate contribution as a precursor to theory development.
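Makadok's footnote-3 proposal above, interpreting the success or failure of a prediction B by Bayesian updating over the conditions A = {a1, ..., an}, can be sketched numerically. The following is our own illustrative toy model, not the forum article's formalization: the priors are hypothetical, and we crudely assume the elements of A are independent and that B fails whenever any element is invalid.

```python
# Toy model of Bayesian blame assignment for a failed prediction B.
# All priors are hypothetical; independence and the "B fails iff any
# element is invalid" rule are simplifying assumptions of this sketch.

priors = {
    "a1_theory_valid":       0.95,  # hard core: "money in the bank"
    "a2_measurement_valid":  0.80,
    "a3_background_assumps": 0.85,
    "a4_identification_ok":  0.75,
}

# P(B fails) = 1 - P(all elements hold), by the independence assumption.
p_all_hold = 1.0
for p in priors.values():
    p_all_hold *= p
p_fail = 1.0 - p_all_hold

# P(element i is invalid | B failed) = P(element i is invalid) / P(B failed),
# since in this crude model any single invalid element guarantees failure.
blame = {name: (1.0 - p) / p_fail for name, p in priors.items()}

for name, b in sorted(blame.items(), key=lambda kv: -kv[1]):
    print(f"{name}: posterior culprit probability {b:.2f}")
```

Under these assumptions, the weakly supported identification strategy (a4) absorbs the most blame and the well-corroborated theory (a1) the least, which is the Lakatosian prescription in miniature. Note that the culprit probabilities need not sum to one, since several elements can be invalid simultaneously.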
Editors and reviewers would need to allow greater room for grounded speculation (typically discouraged in our field) about interesting, anomalous, or counterintuitive phenomena uncovered through careful data analysis before formal hypotheses are formed. This logic underpinned JOM's special issue on "nascent theory," which sought to create space for research guided by observation rather than by fidelity to established theoretical frames. Efforts such as these may encourage the development of genuinely new theory in OM. In the meantime, the field should continue to honor its empirical heritage and to draw on the strengths of the "practice-oriented" and "practice-based" research traditions. Doing so can help us avoid the risk of having "too much theory, not enough understanding" (Schmenner et al. 2009).

Elliot Bendoly ([email protected]), The Ohio State University, Columbus, Ohio, USA. Rogelio Oliva ([email protected]), Texas A&M University, College Station, Texas, USA.

Ketokivi and Mantere make a compelling case that research designed primarily to test theory always carries generative potential. We agree. Their central argument, that abductive reasoning in both the a priori and a posteriori phases of theory testing can create new theoretical meaning, aligns squarely with our view that theory is never a finished product but exists in a continuous process of meaning-making, from vague hunches to detailed accounts of causal mechanisms. In our earlier terminology (Bendoly and Oliva 2025), their proposal speaks primarily to what we called Path A research, studies motivated by existing theoretical conjectures and designed to test them, rather than to Path B work, which originates in anomalous or intriguing empirical observations. Our commentary therefore focuses on how their analysis enriches and reshapes our understanding of Path A, while leaving open important questions about the role of Path B in theory development. We are especially pleased that their elaboration of a priori and a posteriori abduction develops the link between abductive semantics and theoretical argumentation, which we regard as a meaningful pathway to research contributions. Their analysis of how Walker and Weber (1984) translated TCE's abstract concept of uncertainty into context-specific meaning illustrates precisely the kind of generative reasoning we have in mind. That said, we see opportunities to strengthen, deepen, and extend the argument in ways that matter particularly to OM.