
Evaluating the impact of a reward system on student engagement

Project scope

Team: 1 product manager, 2 designers, 1 user researcher (me)
Timeline: 2 weeks comparative usability testing + 4 weeks A/B testing

Company: Sofatutor - a digital learning platform for students

Methods: Usability testing, A/B test


The challenge

Sofatutor wanted to improve student engagement, since low engagement was a key driver of cancellations. With the release of the new reward system, the team aimed to motivate students to keep practicing in order to deepen and consolidate their knowledge.


Research goals
  • Pre-launch: Understand how well students comprehend and interact with the new medal system.

  • Post-launch: Define key metrics and evaluate the system’s impact on engagement.


Impact

The research influenced the final design decisions and the rollout strategy:

  • Usability testing helped identify comprehension issues that led to a simplification of the design.

  • The A/B test ensured we could confidently measure what worked. 


The final design that was implemented resulted in:

  • A 240% relative improvement in repeated practice of the exercise sets

  • A slight 11% relative improvement in session frequency 

Engaging users with rewards

One of Sofatutor’s key business challenges was increasing student engagement on the site:
 

  • Cancellation surveys revealed that one of the main reasons parents cancelled was insufficient usage by their children.

  • Engagement metrics (session frequency and duration) had remained stagnant in recent years.
     

To address this, Sofatutor introduced a medal-based reward system as a first step, where students could earn medals by repeatedly practicing exercises within a topic. This aimed to motivate students to practice more and deepen their knowledge through positive reinforcement.

The process

This was one of my first research projects at Sofatutor, and I joined at a stage where the medal system was already fleshed out, with high-fidelity designs ready to test. Given this, the research focused on two key phases, with a development phase in between:


  • Phase 1: Usability testing (2 weeks) to evaluate usability and comprehension of the new medal system before launch.

  • Development phase: Based on the findings from the usability tests, refinements were made before development.

  • Phase 2: A/B testing (4 weeks) to measure the impact of the medal system on engagement metrics.

1. Usability testing

I started with a research briefing to align on goals, questions, and key concerns. The feature was already designed, but not yet tested, so our main goal was to identify usability issues before launch.


Management was especially concerned about whether younger students would understand how medals are earned. 

 

There were also conflicting opinions on the final design, so we tested two variations to see which one worked better.


Sampling

To reduce the risk of misunderstanding, I focused on a balanced and targeted sample: 8 students total, across grades 1–5. Grades 1–2 were included to test edge cases, and grades 3–5 represented the core user base.


I also recruited both internal users familiar with the old system and external students who had never used Sofatutor before. This helped us evaluate both comprehension and first-time experience.

​

Analyzing

To make analysis more inclusive and efficient:

  • I translated all session notes into English so the team (who hadn’t attended and didn’t speak German) could access them

  • I hosted optional “clustering hours” where team members could drop in to help group insights


We used the clustering hours to share observations and spot early patterns together. Even though attendance was optional, most of the team joined, especially the designers. The clustering helped them start thinking about potential design changes right away, instead of having to wait for a full report.


For me, it was a reminder that making research collaborative doesn’t have to be formal — just accessible and lightweight.

What we learned and what changed:
  • We kept the medals as-is because the concept worked: students understood that higher medals reflected better performance, so the core mechanic was solid.
     

  • The research sparked internal (and still ongoing) discussions about simplifying and consolidating the reward system in the future, because having too many reward types (coins, medals, learning points) caused confusion.
     

  • We chose Version 1 as the base for refinement because it felt more motivating to students: they liked seeing all the stars they had collected.
     

  • A semi-transparent overlay was added to reduce distraction, because background visuals distracted younger students. Instead of recognizing the calls to action, they described the ships and buildings in the UI.

2. A/B test

Before launch, I collaborated with product and design to define expectations and translate them into testable hypotheses and success metrics.
 

To design a valid experiment, I:

  • Formulated clear, falsifiable hypotheses, e.g., “Being in the B variant increases repeat completion of the same exercises.”

  • Calculated the required sample size per variant (a sketch of this estimate follows the list)

  • Estimated the minimum experiment duration based on the average weekly unique visitors

  • Defined and implemented custom tracking events to measure specific behaviors
    ​
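As an illustration of that setup, the sketch below shows how a sample-size and duration estimate of this kind can be done with Python and statsmodels, assuming a two-proportion comparison; the baseline rate, minimum detectable effect, and weekly traffic are hypothetical placeholders rather than the project's actual numbers.

```python
# Hypothetical sketch of the experiment sizing, assuming a two-proportion
# comparison on a binary metric such as "repeated the same exercise set
# within 14 days". All numbers are illustrative placeholders.
import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.05        # assumed conversion rate in the control variant
target_rate = 0.07          # smallest rate we want to be able to detect in variant B
alpha, power = 0.05, 0.80   # conventional significance level and statistical power

# Cohen's h effect size for the two proportions
effect_size = proportion_effectsize(target_rate, baseline_rate)

# Required sample size per variant for a two-sided test
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power, ratio=1.0
)

# Minimum duration, based on average weekly unique visitors split 50/50
weekly_unique_visitors = 2_000
weeks_needed = math.ceil(2 * n_per_variant / weekly_unique_visitors)

print(f"~{math.ceil(n_per_variant)} students per variant, "
      f"run for at least {weeks_needed} week(s)")
```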

Once the experiment launched, I built a dashboard in Amplitude to monitor the predefined metrics. When we reached the required sample size, I ran a detailed analysis and had it reviewed by our data analyst, who gave me a thumbs-up (which already felt like a win, because hey, I can build dashboards now!).

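For context, the core of such an analysis can be sketched as a simple two-proportion z-test on aggregated counts exported from the analytics tool; the counts below are made-up placeholders, not the experiment's real data.

```python
# Hypothetical sketch of the significance check on the primary metric
# (students who repeated the same exercise set within 14 days).
# The counts are placeholders, not the experiment's real numbers.
from statsmodels.stats.proportion import proportions_ztest

converted = [265, 78]       # [medals variant, control] who repeated the set
exposed = [5200, 5150]      # students assigned to each variant

z_stat, p_value = proportions_ztest(count=converted, nobs=exposed)
relative_lift = (converted[0] / exposed[0]) / (converted[1] / exposed[1]) - 1

print(f"relative lift: {relative_lift:.0%}, p-value: {p_value:.4f}")
```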
We learned...

...that the new medal system significantly improved engagement, leading to higher repeat practice rates and a modest increase in session frequency.
 

For example, users in the medals version showed a relative increase of 240% in completing the same exercise set within 14 days compared to the original version, with nearly 90% of the students doing so in the same session.

The outcome

Following the success of this experiment, the new medal system has been fully rolled out to all users.

However, one thing had been bugging me since the start of this research project: not having a full understanding of why students weren’t using the platform consistently. The top cancellation reason of "insufficient usage" was too vague to act on, and I wanted to uncover the real barriers to engagement.

The medal system’s success proved the value of research-driven insights and helped me make the case for digging deeper. While it’s still not a top priority, I have been given some capacity to explore engagement and motivation more thoroughly.

Reflections and learnings

It doesn't end with a successful experiment
Even though the A/B test was successful in improving our primary metrics, some secondary ones, like completion rate, remained the same compared to the original version. I wanted to run a follow-up ideation session to build on the experiment, but the engineering capacity had already been allocated for the rest of the term. Next time, I’ll align earlier with the product manager to plan for a potential follow-up sprint for any experiment we run.
​
Patience and repetition are powerful tools

While the medal system was one step toward improving engagement, a deeper question remained unanswered: why were so many users not using the product in the first place, and what does "insufficient usage" even mean? Although I had flagged this before, it wasn’t seen as a priority.

 

Rather than pushing hard, I focused on keeping it visible by: 

  • Continuously gathering supporting evidence to strengthen my case

  • Bringing it up strategically in conversations and meetings to keep it in stakeholders' minds

  • Letting the idea sit, so that over time, they could start to recognize its importance


After the success of the medal system, I finally secured the capacity (even if limited) to explore it further. It was a reminder that advocacy in research is often about persistence, not instant wins.
