A reinforcement-learning knowledge tracer beat standard baselines on ASSISTments

A 2026 Sensors paper reports that DRAKT, which combines reinforcement learning with a forgetting curve, predicted student performance better than common baselines.

Published: 17 March 2026

In one sentence

Researchers introduced DRAKT, a knowledge-tracing model that mixes reinforcement learning with a forgetting-curve idea, and it predicted future student answers better than several standard baselines on ASSISTments datasets.

What the researchers did

Knowledge tracing tries to answer a practical question: after a student solves a sequence of problems, what is the chance they will answer the next one correctly? Good estimates can help tutoring systems choose better practice and spot when a learner may be forgetting.
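The prediction task can be made concrete with classic Bayesian Knowledge Tracing (BKT), one of the traditional approaches in this field. The sketch below is not DRAKT and is not taken from the paper. The parameter names and values (`p_init`, `p_learn`, `p_slip`, `p_guess`) are illustrative assumptions. It outputs the estimated probability of a correct answer before each attempt in a sequence of right and wrong responses.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class BKTParams:
    # Illustrative values only; real systems fit these per skill from data.
    p_init: float = 0.3   # prior probability the skill is already known
    p_learn: float = 0.2  # chance of learning the skill after one practice attempt
    p_slip: float = 0.1   # chance of a wrong answer despite knowing the skill
    p_guess: float = 0.2  # chance of a right answer without knowing the skill


def predict_correct(p_known: float, params: BKTParams) -> float:
    """P(next answer correct) given the current probability the skill is known."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Bayes-update mastery on one observed answer, then apply the learning step."""
    if correct:
        evidence_known = p_known * (1 - params.p_slip)
        evidence_unknown = (1 - p_known) * params.p_guess
    else:
        evidence_known = p_known * params.p_slip
        evidence_unknown = (1 - p_known) * (1 - params.p_guess)
    posterior = evidence_known / (evidence_known + evidence_unknown)
    return posterior + (1 - posterior) * params.p_learn


def trace(answers: Sequence[bool], params: Optional[BKTParams] = None) -> List[float]:
    """Return the predicted P(correct) made before each answer in the sequence."""
    params = params or BKTParams()
    p_known = params.p_init
    predictions = []
    for correct in answers:
        predictions.append(predict_correct(p_known, params))
        p_known = update(p_known, correct, params)
    return predictions
```

With these assumed parameters, `trace([True, True, False])` starts at 0.41 (0.3 * 0.9 + 0.7 * 0.2) and the prediction rises after each correct answer. Note that plain BKT ignores how much time passes between attempts, which is the gap a forgetting-curve component targets.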

In this paper, the authors proposed DRAKT, a deep reinforcement-learning-based adaptive knowledge tracing model. Their design combines two ideas. First, it tracks a student's changing knowledge state across exercises. Second, it explicitly models memory decay with a forgetting curve, so time gaps between attempts are not treated as irrelevant noise. The reinforcement-learning component is meant to help the system adaptively update its representation of a learner's state as new interactions arrive.
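This summary does not give the paper's exact forgetting-curve formulation. As a hedged illustration of the general idea, the sketch below uses the classic exponential form R(t) = exp(-t / S), where t is the time since last practice and S is a stability parameter. The helper names `retention` and `decayed_mastery`, and the choice to simply multiply a mastery estimate by retention, are hypothetical simplifications rather than DRAKT's method.

```python
import math


def retention(days_since_practice: float, stability_days: float) -> float:
    """Exponential forgetting curve R(t) = exp(-t / S).

    R is the estimated fraction of knowledge still retrievable after t days;
    a larger stability S means slower forgetting.
    """
    if days_since_practice < 0 or stability_days <= 0:
        raise ValueError("need t >= 0 and S > 0")
    return math.exp(-days_since_practice / stability_days)


def decayed_mastery(mastery: float, days_since_practice: float,
                    stability_days: float) -> float:
    """Hypothetical adjustment: discount a mastery estimate by predicted retention."""
    return mastery * retention(days_since_practice, stability_days)
```

For example, with an assumed S of 5 days and a starting mastery of 0.9, `decayed_mastery(0.9, 1, 5)` is about 0.74, while `decayed_mastery(0.9, 10, 5)` is about 0.12. Two students with identical answer histories can look very different once elapsed time is taken into account.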

The model was evaluated on two widely used public datasets from ASSISTments, a common benchmark in educational data mining. The authors compared DRAKT with several existing approaches, including both traditional and deep-learning knowledge-tracing baselines. They reported standard prediction metrics such as AUC (area under the ROC curve) and accuracy to judge how well each model estimated whether a student would get the next item right.
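For readers unfamiliar with the two metrics, the sketch below computes both from true outcomes (1 = correct, 0 = incorrect) and predicted probabilities. AUC uses its pairwise definition: the probability that a randomly chosen correct response receives a higher score than a randomly chosen incorrect one, with ties counting as half. The function names, the 0.5 threshold, and the sample numbers are illustrative assumptions, not results from the paper. Library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
from itertools import product
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of correct vs. incorrect responses."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # correct response ranked above incorrect one
        elif p == n:
            wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of responses whose thresholded prediction matches the label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)
```

On toy data `labels = [1, 0, 1, 1, 0]` with scores `[0.9, 0.3, 0.6, 0.4, 0.5]`, AUC is 5/6 (about 0.83) while accuracy at the 0.5 threshold is 0.6. AUC measures how well a model ranks correct above incorrect responses regardless of any cutoff, whereas accuracy depends on where the threshold is set.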

The paper's aim was not to test a classroom intervention directly. Instead, it focused on whether a better learner model could support future adaptive systems by making stronger predictions from historical practice data.

What they found

Across the ASSISTments datasets, DRAKT outperformed the baselines the authors tested. In plain language, the model was better at distinguishing between future correct and incorrect responses. That suggests the combination of adaptive updating and explicit forgetting information added useful signal beyond what the comparison systems captured.

The paper argues that this matters because students do not just accumulate knowledge in a straight line. They also forget, recover, and respond differently depending on when they last practiced related material. By incorporating a forgetting-curve component, DRAKT appears to handle those time-sensitive patterns more effectively than models that lean mostly on response sequences alone.

The reported gains were not framed as magical leaps. They were incremental but consistent enough across the benchmark datasets to support the authors' main claim: modeling forgetting and adaptation together can improve knowledge tracing quality.

For people building educational software, that is the main result. A system that estimates knowledge more accurately has a better chance of choosing the right next question, spacing reviews more intelligently, and avoiding both overpractice and premature difficulty jumps.

What this means for learners and educators

This study does not prove that students learn more just because a model's AUC is higher. But it does suggest a path toward tutoring tools that better respect how memory actually changes over time.

For educators, the interesting takeaway is conceptual: student performance is partly a timing problem, not just a content problem. If a platform knows that two students both answered a topic correctly, but one has not seen it for much longer, their true readiness may differ. Systems like DRAKT try to account for that.

For learners, the study indirectly supports the common intuition behind spaced review. If forgetting dynamics help predict performance, then review schedules should probably pay attention to when knowledge was last used, not just how many items were completed.
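One simple way to turn a forgetting curve into a review schedule is to solve for when predicted retention falls to a target level r. Under the exponential form R(t) = exp(-t / S), that happens at t = -S ln(r). The sketch below is a hypothetical scheduler, not something the paper proposes. The 0.8 target, the stability values, and the function names are assumptions, and a real system would also update S after each review.

```python
import math
from datetime import date, timedelta


def days_until_review(stability_days: float, target_retention: float = 0.8) -> float:
    """Days until exp(-t / S) falls to target_retention, i.e. t = -S * ln(target)."""
    if stability_days <= 0 or not 0 < target_retention < 1:
        raise ValueError("need S > 0 and 0 < target_retention < 1")
    return -stability_days * math.log(target_retention)


def next_review_date(last_practiced: date, stability_days: float,
                     target_retention: float = 0.8) -> date:
    """Schedule from when the skill was last used, not from item counts.

    Rounds down so the review lands before retention dips below the target.
    """
    gap = days_until_review(stability_days, target_retention)
    return last_practiced + timedelta(days=math.floor(gap))
```

With an assumed S of 20 days and a last practice on 17 March 2026, `next_review_date(date(2026, 3, 17), 20)` returns 21 March 2026 (about 4.46 days, rounded down). A skill with a lower stability would come due sooner.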

For product teams, the paper points toward adaptive practice engines that combine knowledge tracing, scheduling, and reinforcement learning rather than treating them as separate modules.

Limitations and what we don't know yet

The biggest limitation is that this is a modeling benchmark paper, not a direct learning-outcomes trial. Better prediction on historical datasets does not automatically mean better real-world teaching decisions or stronger long-term retention.

The evaluation also depends on the structure and biases of ASSISTments data. A model that performs well there may not generalize equally well to other subjects, age groups, or platforms with different item types and learner behavior.

The results summarized here do not establish whether DRAKT is easy to deploy, interpretable, or computationally efficient enough for production use. Those practical questions matter if schools or edtech teams want to adopt such systems at scale.

Finally, reinforcement learning in education often sounds more powerful than it is in practice. The next important step is not just better offline prediction, but prospective studies asking whether a DRAKT-style system helps students learn faster, retain more, or receive fairer and more useful recommendations over time.