The testing effect disappeared in two carefully controlled online experiments

Two Prolific-based studies found no clear advantage for retrieval practice over restudying, despite delayed tests, feedback, and attention checks. The result raises questions about the conditions under which online learning research is run.

Illustration: retrieval practice vs restudying in an online study setting — conceptual image, not from the cited study.

Short version

The benefit of retrieval practice (learning by actively recalling information instead of simply rereading it), known as the testing effect, is one of the best-known findings in learning science.

But two new online experiments produced a surprising result:

Retrieval practice did not outperform restudying.

The studies were conducted on Prolific, a widely used online platform for behavioral research. Importantly, the researchers did not run careless or minimal experiments. They included delayed testing, corrective feedback, attention checks, validated materials, and fair participant payment — all features normally expected to strengthen the testing effect.

Even with those safeguards, participants who practiced retrieval performed about the same as those who simply restudied the material.

The authors do not argue that the testing effect is false. Instead, they suggest the problem may lie in the realities of online participation itself: divided attention, multitasking, lower engagement, and weaker control over study conditions.


What the researchers studied

The research was conducted by Sigayret, Parmentier, and Silvestre.

Their goal was not to challenge decades of memory research directly, but to investigate whether the testing effect still appears reliably when experiments move from tightly supervised laboratories to crowdsourced online environments.

To test this, they conducted two experiments using participants recruited through Prolific.

In both studies, participants were randomly assigned to one of two conditions:

  • a retrieval-practice group, where people actively recalled information during learning;
  • or a restudy group, where people reviewed the same material again without retrieval demands.

The researchers intentionally designed the studies to maximize the chances of finding a testing effect.

For example, they included:

  • delayed post-tests instead of immediate testing only;
  • corrective feedback after practice;
  • both factual and application-style questions;
  • attention checks;
  • and participant prescreening procedures.

The authors also discussed several challenges specific to online research platforms, including short task sessions, inconsistent motivation, and the possibility that participants divide attention across multiple devices or activities while completing experiments.


What the experiments found

The central result was straightforward:

Retrieval practice did not significantly outperform restudying in either experiment.

This null result appeared even though the studies incorporated several design choices that previous literature suggests should strengthen retrieval-based learning effects.

The researchers do not interpret this as evidence that the testing effect itself is theoretically wrong. Instead, they argue that online environments may make it harder to observe effects that depend heavily on sustained cognitive engagement.

In a traditional laboratory setting, researchers can monitor distractions, enforce pacing, and maintain participant attention more closely. Online platforms offer much less control.

That matters because retrieval practice is not merely exposure to information — it depends on active mental effort. If participants skim tasks, multitask, or disengage mentally, the mechanism behind retrieval practice may weaken substantially.
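The dilution argument can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the authors' analysis: it assumes a hypothetical true effect size (d = 0.5), a hypothetical sample of 100 participants per group, and the simplification that disengaged participants in the retrieval group learn no more than restudiers. None of these numbers come from the paper. Under those assumptions, the expected group difference shrinks in proportion to the share of participants who actually engage, and the chance of detecting it falls with it.

```python
from statistics import NormalDist


def diluted_effect(true_d: float, engaged_fraction: float) -> float:
    """Expected standardized difference when only a fraction of the
    retrieval group truly engages (the rest are assumed to behave
    like the restudy group)."""
    if not 0.0 <= engaged_fraction <= 1.0:
        raise ValueError("engaged_fraction must be between 0 and 1")
    return true_d * engaged_fraction


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided comparison of two
    equally sized groups with standardized effect d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the trend.
    true_d, n = 0.5, 100
    for frac in (1.0, 0.75, 0.5, 0.25):
        d = diluted_effect(true_d, frac)
        print(f"engaged={frac:.2f}  expected d={d:.3f}  power={approx_power(d, n):.2f}")
```

The point of the sketch is the trend rather than the specific figures: a real effect that would be easy to detect with full engagement can become hard to detect when many participants are only partly attending.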


Why this matters

The study is important partly because retrieval practice has become extremely popular in educational technology, self-study systems, and online learning platforms.

Many modern learning tools assume that active recall will reliably outperform rereading under almost any condition.

This paper adds a more cautious perspective.

It suggests that:

The effectiveness of a learning strategy may depend not only on the method itself, but also on the quality of engagement during learning.

That distinction matters for both researchers and everyday learners.

A flashcard app does not automatically create deep retrieval practice simply because questions appear on a screen. If users rush through cards, guess carelessly, or split attention between multiple tabs and devices, the cognitive process may become much closer to passive exposure.

The findings also highlight a broader issue in modern psychology and educational science:

Online research platforms dramatically increase speed, scale, and participant diversity — but they may also introduce new forms of noise that weaken certain cognitive effects.


What this could mean in practice

For students and self-learners, the study offers an important reminder:

Active recall only works when recall is actually happening.

Using flashcards while distracted, tired, or multitasking may reduce many of the benefits that retrieval-based learning is supposed to provide.

The paper also suggests several practical lessons for educators and researchers:

  • attention and engagement may matter as much as the study technique itself;
  • online learning experiments should monitor participation quality carefully;
  • time-on-task and distraction levels may influence results more than expected;
  • educational apps may need stronger mechanisms to encourage focused recall instead of passive clicking.
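As one illustration of what monitoring participation quality might look like, the sketch below flags flashcard trials that were skipped or answered implausibly fast. It is a hypothetical example, not a method from the paper: the `Trial` record, the `rushed_fraction` function, and the 2-second threshold are all assumptions made for illustration, and a sensible threshold would depend on the material.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One flashcard attempt (hypothetical record layout)."""
    card_id: str
    response_seconds: float
    answered: bool


def rushed_fraction(trials: list[Trial], min_seconds: float = 2.0) -> float:
    """Share of trials that were skipped or answered faster than
    min_seconds, a rough proxy for passive clicking."""
    if not trials:
        return 0.0
    rushed = sum(
        1 for t in trials
        if not t.answered or t.response_seconds < min_seconds
    )
    return rushed / len(trials)


if __name__ == "__main__":
    # Made-up session data for demonstration only.
    session = [
        Trial("c1", 6.4, True),
        Trial("c2", 0.8, True),   # answered, but too fast
        Trial("c3", 1.1, False),  # skipped
        Trial("c4", 5.0, True),
    ]
    print(f"rushed share: {rushed_fraction(session):.2f}")
```

A simple timing rule like this cannot tell whether real recall happened, but a high rushed share is a cheap signal that a session may have drifted toward passive exposure.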

For researchers, the study raises interesting methodological questions about whether some classic laboratory effects become harder to detect in crowdsourced online environments.


Important limitations

The authors describe the work as preliminary rather than definitive.

Only two experiments were conducted, and this summary relies mainly on the publicly available abstract rather than the full article.

Because of that, many details — including full sample sizes, statistical analyses, and exact effect sizes — were not available in the source used here.

The studies also do not prove that retrieval practice fails online in general. Instead, they suggest that under some online conditions, the effect may become harder to detect reliably.

Future research could compare online and in-person groups directly, using identical materials and stronger controls over multitasking and engagement.


Final thoughts

The study does not overturn the broader evidence supporting retrieval practice.

Instead, it highlights something more subtle:

Evidence-based learning methods still depend on evidence-based implementation.

Retrieval practice may remain highly effective when learners genuinely engage with memory. But if online study becomes shallow, distracted, or automatic, even strong cognitive strategies may lose much of their advantage.

For anyone building learning systems — from classrooms to flashcard apps — the message is clear:

The quality of attention may matter as much as the algorithm itself.


This is a plain-language summary of: “Testing the testing effect on prolific: when retrieval practice fails to boost learning”.

Source: Frontiers in Psychology (2026).