In multimedia learning environments, pedagogical agents have emerged as an innovative tool for enhancing digital instruction, yet how best to design them for learning effectiveness remains underexplored.
This study aimed to investigate how specific design elements of pedagogical agents, namely appearance and voice type, affect multimedia learning performance.
A 2 (appearance: formal vs. informal) × 2 (voice type: human vs. engine-generated) between-subjects design was employed, incorporating eye-tracking technology. A total of 115 participants completed a multimedia learning module on chemical synaptic transmission. Learning outcomes were assessed using retention and transfer tests. Learner perceptions were measured across five indicators: social perception, perceived difficulty, lecture engagement, situational interest, and cognitive load.
Pedagogical agents with a formal appearance positively influenced learning outcomes, increasing fixation duration and fixation count while reducing perceived difficulty and intrinsic load. Agents with human voices similarly enhanced learning outcomes, increasing fixation count, social perception, lecture engagement, and situational interest while reducing perceived difficulty, intrinsic load, and extraneous load. The combination of a human voice and formal appearance produced the greatest benefits in learning performance. Compared with the combination of informal appearance and engine-generated voice, the combination of formal appearance and human voice indirectly affected learning outcomes by reducing perceived difficulty, intrinsic load, and extraneous load. This combination also indirectly increased fixation count through enhanced social perception, lecture engagement, and situational interest.
These findings advance our understanding of the role of pedagogical agents in multimedia learning and offer insights for designing instructional tools that maximise engagement and learning outcomes.