Daily text-message quizzes did not improve residents’ exam scores in a large U.S. study

Nearly 300 pediatric residents received one practice question per day for a year — but most barely engaged, and exam scores did not meaningfully change.

Published: 28 October 2025

Short version

The benefit of retrieval practice (actively answering questions instead of only rereading notes) is one of the most reliable findings in learning science.

Earlier small studies suggested that daily text-message reminders might help medical residents practice more consistently and perform better on exams. But a larger national study in the United States found that sending one question per day by SMS was not enough on its own.

What the researchers did

Medical residents study under constant time pressure while trying to absorb huge amounts of information. Flashcards and exam-style question banks are popular because they turn studying into active recall instead of passive review.

Still, much of the research behind these tools comes from small or highly controlled studies. Nelson and colleagues wanted to see whether a very lightweight retrieval-practice system could work in real training conditions across multiple residency programs.

They enrolled pediatric residents in a year-long national study. Every morning, participants received a text message with a single multiple-choice question written in an exam style.

Researchers then compared standardized exam scores before and after the intervention, as well as scores between residents who received the texts and a control group who did not.
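To make that comparison concrete, here is a minimal sketch of one way to compare score changes between two groups. It is an illustration only, not the authors' actual analysis: the scores are invented, and the names `mean_change` and `difference_in_change` are hypothetical.

```python
from statistics import mean

# Hypothetical (pre, post) exam scores, invented purely for illustration.
# They are NOT data from the study.
text_group = [(480, 500), (510, 525), (450, 470), (530, 540)]
control_group = [(490, 508), (505, 522), (455, 476), (520, 532)]


def mean_change(pairs):
    """Average post-minus-pre score change across residents."""
    return mean(post - pre for pre, post in pairs)


def difference_in_change(treated, control):
    """How much more (or less) the treated group changed than the control group."""
    return mean_change(treated) - mean_change(control)


effect = difference_in_change(text_group, control_group)
print(f"Text group change:    {mean_change(text_group):+.2f}")
print(f"Control group change: {mean_change(control_group):+.2f}")
print(f"Difference in change: {effect:+.2f}")
```

Subtracting the control group's change matters because scores often rise over a year of training anyway. A difference near zero, as in this toy data, means the texts added nothing beyond that ordinary improvement.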

What they found

The study included 293 residents. At the beginning, exam performance looked similar between groups.

The biggest problem was engagement.

Most residents answered very few questions during the year, and many never answered any at all. The daily messages simply did not become part of most participants’ routines.

With so little participation, the intervention did not lead to measurable improvements in exam scores. Researchers also found no meaningful relationship between the number of questions answered and score changes.

The authors point out that earlier observational studies often showed that residents who completed more practice questions tended to score higher on exams. Smaller reminder-based pilots had also reported positive effects.

This larger study suggests that scaling a learning intervention is harder than it looks. Sending reminders is not the same as creating real participation.

What this may mean for learning

The study does not argue against retrieval practice itself. Instead, it highlights a common problem in educational technology: good learning methods still depend on human behavior.

A flashcard app, quiz system, or spaced-repetition platform only works if people regularly come back to it. When reminders feel easy to ignore or emotionally disconnected from real goals, participation can collapse — even if the underlying strategy is scientifically strong.

For developers of study tools, the findings are especially practical. Measuring “messages sent” is not enough. What matters is whether learners actually engage, answer, and build habits over time.
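As a rough sketch of that distinction, the snippet below reports engagement alongside delivery for a small cohort. Everything in it is an assumption made for illustration: the `Learner` record, the `engagement_summary` helper, and the numbers are invented and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class Learner:
    sent: int      # questions delivered to this learner
    answered: int  # questions the learner actually answered


def engagement_summary(learners):
    """Report delivery next to the engagement figures that actually matter."""
    total_sent = sum(l.sent for l in learners)
    total_answered = sum(l.answered for l in learners)
    never_answered = sum(1 for l in learners if l.answered == 0)
    return {
        "messages_sent": total_sent,
        "response_rate": total_answered / total_sent if total_sent else 0.0,
        "never_answered_share": never_answered / len(learners) if learners else 0.0,
    }


# Invented cohort: two learners never answer, one answers rarely, one often.
cohort = [Learner(365, 0), Learner(365, 0), Learner(365, 12), Learner(365, 300)]
summary = engagement_summary(cohort)
print(summary)
```

In this made-up cohort, 1,460 messages were delivered, yet half of the learners never answered a single question. A dashboard that showed only the delivery count would hide that entirely.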

The paper is also a reminder that small pilot studies can sometimes overestimate what happens in everyday use at larger scale.

Limitations

Because most residents barely participated, the study could not fully answer whether retrieval practice itself would have helped.

It also focused only on pediatric residents in the United States and on one type of exam environment, so results may not transfer directly to other specialties or countries.

Future research, the authors suggest, should focus less on simply delivering reminders and more on understanding what makes learners consistently return to practice.

Final thoughts

The gap between “available learning tool” and “real daily habit” is larger than many educational systems assume.

This study suggests that retrieval practice still needs motivation, structure, and meaningful engagement to work in real life. A scientifically sound strategy does not automatically survive contact with busy human schedules.