Situatedness in educational research

British Journal of Educational Psychology, 95(S1), S337–S342. Published 2025-07-18. DOI: 10.1111/bjep.70010

Kai S. Cortina

Abstract

In educational psychology, emphasizing the situational context is clearly ‘du jour’. This is arguably most apparent in the renaming of Eccles' expectancy-value model to ‘Situated Expectancy-Value Theory’ (SEVT), outlined in several papers she coauthored (Eccles & Wigfield, 2020, 2024; Gladstone et al., 2022). According to Eccles and Wigfield (2024), the programmatic shift was necessary to reflect how the theory has expanded from its beginnings as a framework to explain gender differences in students' learning motivation and educational choices into a full-fledged socio-cognitive developmental theory. As such, the model is explicit about the recursive nature of the underlying processes and acknowledges the idiosyncratic circumstances of each behavioural moment, be it a student's decision about which classes to take or a teacher's decision about the feedback they give each student. While this makes a lot of sense conceptually, the new framing of the model comes with two challenges. The first is epistemological: the emphasis on ‘situatedness’ weakens the generalizability of empirical findings to other, even very similar, contexts. The second lies in translating the expanded model into empirical research strategies that reflect its new complexity or, put more simply: How do we overcome the limitations of questionnaires, the most commonly used data-collection tool in this line of research? It now feels inadequate to pack ‘situatedness’ into the item stem, for example, ‘When doing your math homework…’ or ‘In general, I love being a science teacher’. This might make the response somewhat context-specific, but not situation-specific enough in the sense of SEVT.

Overcoming this limitation is the common theme of the six papers in this volume, each of which, in its own way, pushes towards a more convincing empirical approach to illustrating and understanding the relevance of the situational context and to identifying those aspects of it that allow us to carefully generalize findings to a similar class of situations. The latter is important because ‘situatedness’ in the SEVT model is not meant to be merely a new label for otherwise unexplained variance in an analysis that uses stable teacher and student characteristics as predictors. Instead, it calls for characterizing the context so that its relevant features can be integrated into a predictive model.

For example, Stark, Camburn and Kaler (this volume) demonstrate that teacher motivation varies across different but typical work activities. Instead of the ‘classic approach’ of relying on item construction for a cross-sectional study (‘When I teach in the classroom…’, ‘When I interact with colleagues…’, ‘When I grade papers…’, etc.), they use the ‘day reconstruction method’ (DRM). This yields not only a more valid measurement of teachers' motivational state in a given context but also a precise account of how often teachers encounter these qualitatively different, yet nevertheless typical, professional situations. Clearly, a teacher's motivational state during actual teaching will not be predictive of, for example, their long-term experience of burnout if this context represents only a small fraction of the professional contexts the teacher navigates on a daily basis. The authors show that roughly two thirds of the variance in teacher motivation lies between periods, that is, distinct situations throughout a workday; that the variance between days (controlling for periods) is negligible; and that roughly a third of the variance resides (stably) between teachers.
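
To make such a three-level partition concrete, the following Python sketch estimates variance shares for a balanced teacher > day > period design using classical nested-ANOVA (method-of-moments) estimators. It is a minimal illustration under stated assumptions, not Stark, Camburn and Kaler's analysis: the data are simulated, the design is assumed balanced (the same number of days per teacher and periods per day), the helper names are hypothetical, and the variance values are chosen only to roughly mimic the reported two-thirds/one-third split. The period level is the residual here, so it stands in for variation between situations within a day.

```python
import random
from statistics import fmean


def simulate(n_teachers, n_days, n_periods, var_teacher, var_day, var_period, seed=1):
    """Simulate balanced nested scores y[teacher][day][period] (hypothetical data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_teachers):
        teacher_effect = rng.gauss(0.0, var_teacher ** 0.5)  # stable between-teacher level
        days = []
        for _ in range(n_days):
            day_effect = rng.gauss(0.0, var_day ** 0.5)  # shift shared by all periods of a day
            days.append([teacher_effect + day_effect + rng.gauss(0.0, var_period ** 0.5)
                         for _ in range(n_periods)])  # period-level (situation) deviation
        data.append(days)
    return data


def variance_shares(data):
    """Nested-ANOVA (method-of-moments) variance shares for a balanced design."""
    T, D, P = len(data), len(data[0]), len(data[0][0])
    day_means = [[fmean(day) for day in teacher] for teacher in data]
    teacher_means = [fmean(means) for means in day_means]  # valid because the design is balanced
    grand_mean = fmean(teacher_means)

    # Mean squares for each level of the hierarchy
    ms_teacher = D * P * sum((m - grand_mean) ** 2 for m in teacher_means) / (T - 1)
    ms_day = P * sum((dm - tm) ** 2
                     for means, tm in zip(day_means, teacher_means)
                     for dm in means) / (T * (D - 1))
    ms_period = sum((y - dm) ** 2
                    for teacher, means in zip(data, day_means)
                    for day, dm in zip(teacher, means)
                    for y in day) / (T * D * (P - 1))

    # Solve the expected-mean-square equations for the variance components
    components = {
        "teacher": (ms_teacher - ms_day) / (D * P),
        "day": (ms_day - ms_period) / P,
        "period": ms_period,
    }
    total = sum(components.values())
    return {level: value / total for level, value in components.items()}


shares = variance_shares(simulate(1000, 5, 6, var_teacher=1.0, var_day=0.05, var_period=2.0))
for level, share in shares.items():
    print(f"{level:>7}: {share:.2f}")
```

The estimated shares should approximate the simulated proportions. Because the estimators are moment-based, a component whose true value is near zero (here, the day level) can come out slightly negative; a multilevel (mixed-effects) model, the more common choice in practice, typically handles unbalanced designs and constrains variance estimates to be non-negative.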

Wang, Thompson-Lee and Klassen (this volume) combine the emphasis on ‘situatedness’ with advances in classroom simulations as a tool for teacher training, a field steadily moving towards virtual reality as a standard tool (see Huang et al., 2023). Wang et al. demonstrate that, even in the reduced-complexity setting of an online simulation, student teachers' success in responding adequately to a set of 15 situations a teacher typically encounters on a daily basis has a consistent impact on their self-efficacy beliefs and on how well they see themselves aligned with the affordances of the job. Unintentionally, exposure to the scenarios tends to have a somewhat sobering effect: self-efficacy and career intentions trended down on average. However, one could argue that this reflects a more accurate self-assessment of the students' readiness to teach. They can take it either as a call to intensify their learning efforts or as a prompt to critically reappraise their decision to become a teacher. As long as the 15 scenarios authentically reflect the professional life of a teacher, the study implicitly captures the situational variability of the profession, and one is invited to speculate how this might affect a teacher's motivation in the long run.

Similar to Stark, Camburn and Kaler (this volume), Bross, Frenzel and Nett (this volume) consider the ‘day’ the key temporal unit of observation in a longitudinal study of teacher motivation or, in this case, teachers' emotion regulation. Emotion regulation is strongly related to teacher motivation, as successful regulation of negative emotions is an important predictor of sustained teacher motivation (Wang et al., 2023). The interesting twist in their study is the use of latent profile analysis, which allows them not only to identify coping patterns for two emotions in different situational settings but also to reveal the flexibility or consistency of teachers' emotion regulation across situations as a trait-like characteristic. Even if the authors do not discuss this explicitly, their approach introduces an interesting expansion of the SEVT model: while situations do matter for teachers' responses, only some teachers actually vary in how they respond to negative emotions, whereas the majority show very similar emotion regulation patterns across situations. This can be understood as a situation-by-person interaction: only 17.4% of the teacher sample used different strategy combinations across situations. The approach also reveals that the remaining three profiles consist of teachers who differ from one another in their coping patterns but not across situations. This opens the door for investigation beyond emotion regulation research, because it is conceivable that similar ‘meta patterns’, that is, stability of patterns across situations for some teachers but not for others, exist for other motivational constructs as well.

Moving to the papers that focus on the instructional process, we again see the need to resort to more complex statistical tools when ‘situatedness’ is of particular interest. Oschwald, Moeller, Kracke, Viljaranta and Dietrich (this volume) present what is probably the most fine-grained analysis of ‘situatedness’ in motivational research to date, analysing ‘micro-cycles’ of instructional quality and college students' motivation in 9-min intervals (each combining three 3-min ratings). The basic idea was to show that change or variation in instructional clarity (detail, variation, consistency) has an immediate, short-term lagged effect on student motivation. While the authors are very circumspect in considering methodological and conceptual shortcomings that might explain their null findings, I am more inclined to take the findings at face value: students' motivational dispositions, as conceptualized in SEVT, are more inert than the study design implies. If this is true, it is good news for future research, in the sense that such a high-resolution (and hence expensive) research design is unnecessary. Most likely, a low-clarity teaching style simply does not dampen college students' motivation immediately, and maybe not even from one day to the next. However, if a teacher consistently teaches with low clarity over days and weeks, students gradually become frustrated, start to question their own competence, and so on.
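
The argument that motivation is more inert than a 9-min design presupposes can be illustrated with a deliberately simple toy model. It assumes, hypothetically, that motivation moves toward recent instructional clarity as an exponentially weighted moving average with a small update rate `alpha`; the function name and all values are illustrative. This is not the model Oschwald et al. tested; it only shows why a single low-clarity interval may leave a barely detectable trace while sustained low clarity wears motivation down.

```python
def simulate_motivation(clarity, alpha=0.05):
    """Toy model: motivation is an exponentially weighted moving average of clarity.

    clarity: one hypothetical clarity value per 9-min interval (0 = low, 1 = high).
    alpha:   hypothetical update rate; small values make motivation 'inert'.
    """
    motivation = clarity[0]  # start in equilibrium with the first interval
    trajectory = []
    for c in clarity:
        motivation = (1 - alpha) * motivation + alpha * c
        trajectory.append(motivation)
    return trajectory


steady = [1.0] * 100
one_dip = steady[:50] + [0.0] + steady[51:]         # a single low-clarity interval
sustained = steady[:20] + [0.0] * 60 + steady[80:]  # 60 consecutive low-clarity intervals

print(min(simulate_motivation(one_dip)))    # drops only by alpha, then recovers
print(min(simulate_motivation(sustained)))  # drifts close to the low-clarity level
```

With `alpha = 0.05`, one low-clarity interval lowers motivation by only 5% of the clarity gap, whereas a long run of low-clarity intervals pulls it most of the way down. Against realistic measurement noise, the first effect would be hard to detect even if it exists.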

The idea of zooming out the time frame somewhat is corroborated by the Rubach and von Keyserlink paper (this volume), which used 5 weeks within the semester as the elapsed time for investigating longitudinal trends. When the course was held constant, the consistency of students' assessments of instructional quality outweighed occasion-specific variation. However, at a given time point, students rated different courses differently, suggesting that their assessments reflected substantial differences in how they perceived the different courses. Also important is their finding that roughly 30% of the variance reflects stable differences between students, which adds substantial noise to any statistical analysis that aims to identify causal impact over time. Accordingly, Rubach and von Keyserlink acknowledge that their study is limited in being single-source, that is, students rated both the instructional quality and their own interest and expectations.

That the consistency of instructional quality throughout the semester limits our ability to demonstrate the ‘situatedness’ of student motivation is also suggested by other research contexts, for example, research on the often-replicated ‘thin-slice effect’ (Ambady & Rosenthal, 1993): student evaluations at the end of the semester can be predicted extremely well from assessments of the first 10 min of the first lecture. While this is often taken as proof of the importance of first impressions, our own (experimental) research suggests that this high correlation is mainly due to the consistency of teacher behaviour throughout the semester (Samudra et al., 2016). The first impression is a good indicator of the teacher's behaviour and teaching quality for the rest of the semester. A final course evaluation may well be a more or less accurate average of the experience throughout the semester and therefore a valid measure of instructional quality. With the caveat that student assessment and student motivation are different constructs, this observation suggests that Oschwald et al. would find more robust effects if the time unit were not 9-min intervals but daily or weekly aggregates of instructional quality.
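
The consistency argument can be expressed as a small worked model. Assume, hypothetically, that each observed slice of teaching equals a stable teacher component (variance τ²) plus independent occasion noise (variance σ²), and that the end-of-semester evaluation behaves like the mean of k such slices. The correlation of the first slice with that mean is then sqrt((τ² + σ²/k)/(τ² + σ²)), which is high whenever teaching is consistent (σ² small relative to τ²). The Spearman–Brown formula, kρ/(1 + (k − 1)ρ), gives the reliability gained by averaging k intervals whose single-interval reliability is ρ. The function names and numbers below are illustrative assumptions, not values from Samudra et al. or Oschwald et al.

```python
from math import sqrt


def thin_slice_correlation(var_teacher, var_occasion, k):
    """Correlation between one slice and the mean of k slices (the slice included),
    assuming slice = stable teacher component + independent occasion noise."""
    var_mean = var_teacher + var_occasion / k  # variance of the k-slice average
    return sqrt(var_mean / (var_teacher + var_occasion))


def aggregate_reliability(single_reliability, k):
    """Spearman-Brown: reliability of the mean of k parallel measurements."""
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical values: high vs low consistency of teaching across 40 slices
print(thin_slice_correlation(1.0, 0.25, k=40))
print(thin_slice_correlation(1.0, 4.0, k=40))
# Hypothetical single-interval reliability of 0.2, aggregated over 5 or 20 intervals
print(aggregate_reliability(0.2, 5), aggregate_reliability(0.2, 20))
```

In this toy model the first slice is part of the average, which inflates the correlation slightly. The second function shows why a daily or weekly aggregate would be a more reliable measure than a single 9-min interval.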

For both the Oschwald et al. and the Rubach and von Keyserlink studies, the measurement of instructional quality becomes a critical issue if we want to avoid artefacts of common-source bias or overly short-cycled causal models. Göllner, Lazarides and Stark (this volume) venture into new territory by exploring the validity of large language models (LLMs) for assessing teaching quality, which could, in the future, eliminate the human factor in coding entirely. If a holistic semantic analysis were able to capture relevant aspects of teaching quality reliably, human coding through expert or student assessment would become obsolete. Quality could even be assessed in real time, while the teaching is still happening or shortly thereafter, opening the opportunity to use it as immediate feedback in teacher training. In a more rudimentary fashion, we used the same idea for specific teacher training purposes a decade ago. A voice-recording device (LENA) that distinguished teachers' and students' speaking turns identified the in-class discourse segments that teachers were learning to use more frequently in their mathematics classes. Teachers received feedback within 24 h, and for some (not all), it helped them improve their teaching (Wang et al., 2014).

Göllner et al.'s cutting-edge exploratory study shows that LLMs have potential in this regard, but we still have a way to go. The semantic representations are ‘sensitive enough’ to reflect variation between segments, lessons and teachers. They were also associated with human-coded quality assessments, but a ballpark 20% of shared variance (a correlation of about .45) is not even close to the human–human interrater reliability that can be reached after efficient coder training. However, they used a zero-shot GPT model, which means that no additional information was provided to guide the semantic analysis, and the PCA-based dimensionality reduction reflects the exploratory nature of the approach, with its inherent difficulty in interpreting the dimensions and its open questions of replicability. Nevertheless, the prompted transcript analysis is a first step towards a use of LLMs that is more closely aligned with theoretical concepts, and hence a promising step to the next level. After all, an LLM can best identify the strength of instructional dialogue when it can draw on human-identified examples of dialogue that represent the quality dimension in question (multi-shot GPT). There is no doubt that LLMs will in the near future take over many (if not all) coding tasks for text and video footage. But what the AI codes, and how, will always depend on theoretical considerations about student–teacher and student–student interactions and how they facilitate academic learning. The tool does not come with a guiding theory, as Göllner et al.'s contribution makes clear.
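The difference between a zero-shot analysis and the multi-shot approach can be sketched as prompt assembly. Everything here is hypothetical: the class and function names, the 1 to 4 rating scale, and the example excerpts are illustrations rather than Göllner et al.'s materials, and the sketch only builds prompt text without calling any particular LLM API.

```python
from dataclasses import dataclass


@dataclass
class CodedExample:
    """A transcript excerpt with a human-assigned rating (hypothetical format)."""

    excerpt: str
    rating: int  # assumed scale: 1 (low) to 4 (high)


def build_prompt(dimension, definition, transcript, examples=()):
    """Assemble a rating prompt: zero-shot without examples, multi-shot with them."""
    parts = [
        f"Rate the following classroom transcript on '{dimension}' (1 = low, 4 = high).",
        f"Definition: {definition}",
    ]
    for i, ex in enumerate(examples, start=1):
        # Human-coded exemplars tie the model's judgement to the coding scheme.
        parts.append(f"Example {i}:\n{ex.excerpt}\nRating: {ex.rating}")
    parts.append(f"Transcript:\n{transcript}\nRating:")
    return "\n\n".join(parts)


definition = "Teacher questions invite students to explain and build on their reasoning."
anchors = [
    CodedExample("T: What is 3 times 4? S: 12. T: Right, next one.", 1),
    CodedExample("T: How did you get 12? S: I added three fours. "
                 "T: Who can connect that to multiplication?", 4),
]
transcript = "T: Why does the pattern keep growing by 3? S: Because each row adds one more group."

zero_shot = build_prompt("instructional dialogue", definition, transcript)
multi_shot = build_prompt("instructional dialogue", definition, transcript, anchors)
print(multi_shot)
```

The design point is that the exemplars, and the definition they illustrate, come from a theoretically grounded coding scheme; the model itself supplies no such theory.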

In her reflections on the situative approach to research in educational psychology, Nolen (2024) points out that the situative view leads to an emphasis on understanding the processes that underlie change. This, in turn, prompts reflection on what kind of change is to be analysed and what kind of change is considered desirable. Academic learning in educational psychology is, for the most part, conceptualized as a cumulative process: relatively stable gains over an observed period that add to prior knowledge. Weeks or months seem appropriate as standard temporal units of analysis in curriculum-based schooling, unless the focus is on learning smaller units, such as the content of one particular mathematics lesson.

In contrast, the underlying idea in the Stark, Camburn and Kaler contribution on teacher motivation is that high teacher motivation is desirable and a potential goal for interventions. Alternatively, it could prompt teachers' self-directed action, either minimizing their exposure to demotivating situations or changing the quality of social interactions to avoid their demotivating impact. It is not a cumulative but rather a protective change model. At least implicitly, student teachers' self-efficacy belief in Wang, Thompson-Lee and Klassen is similarly a variable one would wish to be high and remain high, based on the normative assumption that high self-efficacy beliefs are a characteristic of a good teacher. But unlike academic learning, self-efficacy beliefs have a logical ceiling. Therefore, this is not a cumulative change model but an optimization model: the environment should lead teachers to, and keep them at, a ‘5 out of 5’ level of self-efficacy.

While those two papers have similar underlying change models, the theory of emotion regulation in Bross, Frenzel and Nett is based on a qualitatively different conceptualization of change: homeostasis. For a teacher, anger is arguably a dysfunctional state, and if the situational trigger cannot be avoided, it is desirable to regulate it quickly and effectively back down to an emotional set point. If the goal is to investigate this regulation process as such, the logical temporal unit of analysis is probably minutes. This, of course, is not the authors' intention, as their focus lies on teachers' coping patterns across the situations they encounter throughout the day. The underlying assumption is, in fact, cumulative in the sense that repeated exposure to anger-inducing situations, paired with a suboptimal coping pattern, will wear teachers down in the long run and reduce their professional motivation.
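The three change models can be contrasted as simple update rules. This is a deliberately minimal sketch: the gain, rate, ceiling, and set-point values are arbitrary assumptions not taken from any of the studies, and the sketch only illustrates the qualitative shapes of unbounded accumulation, approach to a ceiling, and return to a set point after a perturbation.

```python
def cumulative(level, gain=1.0):
    """Cumulative change (e.g., academic learning): each period adds to prior gains, no upper bound."""
    return level + gain


def optimization(level, ceiling=5.0, rate=0.3):
    """Optimization (e.g., self-efficacy): each period closes part of the gap to a logical ceiling."""
    return level + rate * (ceiling - level)


def homeostasis(level, set_point=0.0, rate=0.5):
    """Homeostasis (e.g., anger regulation): a deviation decays back toward an emotional set point."""
    return level + rate * (set_point - level)


def trajectory(update, start, steps):
    """Apply an update rule repeatedly and return all levels, including the start."""
    levels = [start]
    for _ in range(steps):
        levels.append(update(levels[-1]))
    return levels


learning = trajectory(cumulative, start=0.0, steps=5)    # keeps rising
efficacy = trajectory(optimization, start=2.0, steps=5)  # rises toward 5 without exceeding it
anger = trajectory(homeostasis, start=4.0, steps=5)      # spike after a trigger, then decay
for name, levels in [("learning", learning), ("efficacy", efficacy), ("anger", anger)]:
    print(name, [round(x, 2) for x in levels])
```

Formally, the optimization and homeostasis rules share the same exponential-approach form here; what distinguishes them in the text is the normative target (a ceiling to be reached and kept versus a set point to be restored) and the time scale on which the process unfolds.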

The intention of the Oschwald et al. study was to demonstrate that instructional clarity has an immediate positive impact on college students' learning motivation, again not as a cumulative model but with the normative goal of reaching and maintaining a high level of learning motivation. At least implicitly, the assumption is that a fairly consistent lack of clarity over a longer period of time, that is, not 9 min but several weeks of low-clarity instruction, will wear down a student's learning motivation. Even if the short-term lag effect could not be shown, the long-term effect might still exist, and likely does.

The measure used in the Rubach and von Keyserlink study is based on a Likert scale, which means that it has a maximum value, even though interest is, at least theoretically, unlimited and could therefore follow a cumulative model. If instructional quality is extremely high every week I am in class, my interest might grow continuously until the end of the term. I might reach the scale's ceiling, but that would be an artefact of the measurement scale.
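The ceiling artefact can be made concrete with a small sketch. It assumes, hypothetically, that latent interest grows by a constant 0.6 scale units per week and that respondents report the nearest point on a 5-point scale, clipped at the ends; neither assumption comes from the Rubach and von Keyserlink study. After a few weeks the observed rating is stuck at 5 while latent interest keeps rising, so any further growth is invisible in the data.

```python
def likert_response(latent, points=5):
    """Report the nearest scale point, clipped to 1..points (a hypothetical response model)."""
    return max(1, min(points, round(latent)))


# Hypothetical: latent interest rises by 0.6 scale units per week of high-quality instruction.
weeks = range(12)
latent = [2.0 + 0.6 * week for week in weeks]
observed = [likert_response(x) for x in latent]

for week, (true_level, rating) in enumerate(zip(latent, observed)):
    print(f"week {week:2d}: latent {true_level:4.1f} -> rating {rating}")
```

Note that Python's built-in `round` rounds exact halves to the nearest even number; the latent values here were chosen to avoid halves, so the rounding rule does not affect the result.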

Why are these considerations important? They identify the epistemological challenge of an overly situation-focused perspective. While it might be relevant in some research contexts to understand features of the situation rather than treat them as error variance (Nolen, 2024), we will still need to raise the insights gained from these analyses to a more general level for them to be of educational relevance. At least for the run-of-the-mill K-12 schooling context, it would be difficult to drop the traditional positivistic rationale when we consider the practical relevance of our research: once causal mechanisms are identified as tentative truths, they are of practical relevance only if they show a long-term impact on academic learning and psychosocial development across a fairly broad class of situational contexts. The more narrowly the research defines the context, the more limited the practical implications. For example, it might be of psychological interest to demonstrate that a student's academic self-concept dips after 20 instances of unclear instruction. But if the teacher was simply underprepared that day and otherwise presented the material clearly and accessibly throughout the semester, treating this dip as random ‘error’ is probably justified. When long-term development is the main focus of our research (here, motivation), minute-to-minute fluctuations in the clarity of instruction are unlikely to be important. The reason is that, in the back of our heads, we have a model of how motivation affects learning: a student who is more or less stably interested in the content of the class will be more likely to work happily on assignments and the like in the evenings and on weekends. As a general (non-situative) rule, research has shown that unclear instruction has a negative impact on self-concept and interest in the long run, and we assume that this holds for a broad array of situations, student characteristics, grade levels, etc.
This is why it is reasonable for teacher training to work with student teachers on instructional clarity as a skill set that, if done well, pays off across the variety of situations and contexts a teacher will experience. Even if we recognize that every situation is different and that mechanisms are ‘complex’, ‘situatedness’ cannot mean that educational psychology, as an applied science, loses sight of the long-term developmental goals that are the ultimate ‘dependent variables’ of educational processes. Rather, it means being more attentive to the conditions of the learning environment that are necessary for certain independent variables to have their assumed impact on learning success. Situatedness means acknowledging the contextual embeddedness of teaching, but this becomes a non-trivial paradigm only if the goal is to identify situational characteristics that allow generalizations. The Stark, Camburn and Kaler paper provides a strong example of this idea because their diary method allowed the teachers themselves to identify situations, the similarities among them over time, and how they felt in those contexts. While each meeting with other teachers might differ from the next, such meetings share, as a group, many features that contrast with other situations in the daily professional routine, for example, actual classroom instruction. Reflecting on the situations we navigate on a daily basis seems to be a good starting point for translating ‘situatedness’ into a research paradigm that does not lose sight of the core focus of our discipline: the process of learning in institutional settings.
