On Math Learning on WhatsApp

Designing Math Learning on WhatsApp: Why a Single-Image MCQ Is the Sanest Option

When you try to teach mathematics over WhatsApp, the problem is not a lack of features. It is a mismatch between the nature of mathematics and the structure of the medium.

Mathematical notation is inherently two-dimensional. Even at a basic level, meaning is carried by layout:

  • a fraction like this depends on vertical separation

    $$\frac{2x^2 + 3x - 5}{x - 1}$$

  • this root depends on visual enclosure

    $$\sqrt{2x - 8y^3}$$

  • and even something simple like

    $$\frac{1}{2}$$

    communicates structure more clearly than 1/2

When expressions are flattened into plain text, the burden shifts from perception to interpretation. The student must reconstruct structure mentally — an unnecessary cognitive load, and precisely where many learners struggle.


The obvious fix: render expressions as images

If plain text distorts mathematical structure, the natural solution is to render expressions as images. Send an image of the expression, ask a question in text, and let the student reply.

This helps. But it surfaces a second problem immediately.
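The naive pattern sends an image, then asks the question in a separate text message. Below is a rough sketch of those two outbound messages. It assumes the WhatsApp Business Cloud API's message objects: `messaging_product`, `type`, an `image` object with a hosted `link`, and a `text` object with a `body`. The phone number and image URL are hypothetical placeholders. Check the field names against Meta's current documentation before relying on them.

```python
import json


def image_message(to: str, image_url: str) -> dict:
    """Outbound image message (shape assumed from the WhatsApp Cloud API)."""
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "image",
        "image": {"link": image_url},  # publicly reachable URL of the rendered PNG
    }


def text_message(to: str, body: str) -> dict:
    """Outbound plain-text message carrying the question prompt."""
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "text",
        "text": {"body": body},
    }


# Naive pattern: one rendered expression, then the question as separate text.
# The number and URL are hypothetical placeholders.
outbox = [
    image_message("15550001111", "https://example.com/expr.png"),
    text_message("15550001111", "Simplify the expression above. Reply with your answer."),
]
print(json.dumps(outbox, indent=2))
```

Each dictionary is sent as its own request. Every additional rendered expression therefore becomes one more message in the student's thread.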


The image-per-expression problem

Consider a multi-step question like this:

  • first evaluate $A = 2^2 + 3 \cdot 4$
  • then compute $B = A - 5$
  • finally compute $C = \frac{B}{1 + 1}$ and report its value

Rendered faithfully, each line might arrive as a separate image. The student is now navigating a thread — scrolling back, reconstructing context, checking which value they computed in which step. The interaction pattern itself becomes an obstacle, independent of the mathematics.

And the problem compounds across a session. Some questions are single-step, others multi-step, some require recalling earlier values, others do not. That inconsistency introduces overhead that has nothing to do with the subject.
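To make the inconsistency concrete, here is a toy count of outbound messages per question under each delivery strategy. It rests on two hypothetical assumptions. Per-expression rendering sends one image per displayed line plus one text prompt. The single-image design always sends exactly one message. The question mix is invented for illustration.

```python
from statistics import pstdev

# Hypothetical session: number of displayed lines (expressions) per question.
lines_per_question = [1, 3, 1, 2, 4]


def per_expression_messages(n_lines: int) -> int:
    """One image per rendered line, plus one text message for the prompt."""
    return n_lines + 1


def single_image_messages(_n_lines: int) -> int:
    """Whole problem and options in one image; line count is ignored."""
    return 1


per_expr = [per_expression_messages(n) for n in lines_per_question]
single = [single_image_messages(n) for n in lines_per_question]

print("per-expression:", per_expr, "total", sum(per_expr), "spread", round(pstdev(per_expr), 2))
print("single-image:  ", single, "total", sum(single), "spread", round(pstdev(single), 2))
```

The spread is the interesting column. Under per-expression rendering the interaction changes shape from question to question. Under the single-image design it never does.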


The text fallback is no better

One response is to keep everything in text, including answer options. Consider this simplification question:

$$\sqrt{4x - 16y^3}$$

Flattened answer options:

  • sqrt(2x - 8y^3)
  • 2sqrt(x - 4y^3)
  • sqrt(2)(x - 4y^3)
  • sqrt(2x) - sqrt(8y^3)

The issue is not correctness but readability. The student must now parse subtle structural differences through linear text — and those distinctions are precisely what the question is testing. The very thing you want them to see becomes harder to see.


Three requirements, all in tension

At this point, three things are clearly pulling against each other:

  1. Expressions must be visually accurate for students to understand them reliably.
  2. The interaction pattern must remain consistent across questions.
  3. The number of messages must stay low enough to avoid fatigue.

Improving one tends to degrade another. More images for accuracy means more messages. Fewer messages means collapsing steps, which means either text fallbacks or cramped layouts.


What holds up: one image, one reply

The structure that survives all three constraints is simple:

  • each question is delivered as a single image
  • the image contains the full problem and all answer options, properly rendered
  • the student responds with A, B, C, or D
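Here is a minimal sketch of this question-and-reply cycle. It assumes each question is stored as LaTeX and rendered to a single image by compiling a `standalone` LaTeX document, for instance with `pdflatex` followed by `pdftoppm -png`. The `Question` class and its fields are hypothetical. The reply parser accepts a bare letter with optional whitespace, parentheses, or a trailing period. It rejects anything else so that the student can be re-prompted.

```python
import re
from dataclasses import dataclass

# Accepts "B", " b ", "(b)", "B)", "b." and nothing longer.
_REPLY = re.compile(r"^\s*\(?\s*([A-Da-d])\s*[).]?\s*$")


@dataclass(frozen=True)
class Question:
    stem: str                # LaTeX problem statement (text with inline $...$)
    options: tuple[str, ...]  # four math-mode LaTeX strings, shown as A) to D)
    key: str                 # correct letter, "A" to "D"

    def to_standalone_tex(self) -> str:
        """Return one LaTeX document containing the full problem and all options."""
        lines = [
            r"\documentclass[border=8pt,varwidth]{standalone}",
            r"\usepackage{amsmath}",
            r"\begin{document}",
            self.stem,
            "",
            r"\medskip",
            r"\begin{tabular}{ll}",
        ]
        for letter, opt in zip("ABCD", self.options):
            lines.append(letter + r") & $" + opt + r"$ \\")
        lines += [r"\end{tabular}", r"\end{document}"]
        return "\n".join(lines)

    def grade(self, reply: str):
        """True/False for a valid letter, or None if the reply is not A to D."""
        m = _REPLY.match(reply)
        if m is None:
            return None  # unparseable: re-prompt rather than mark it wrong
        return m.group(1).upper() == self.key


q = Question(
    stem=r"Simplify $\sqrt{4x - 16y^3}$",
    options=(r"\sqrt{2x - 8y^3}", r"2\sqrt{x - 4y^3}",
             r"\sqrt{2}(x - 4y^3)", r"\sqrt{2x} - \sqrt{8y^3}"),
    key="B",
)
print(q.grade(" b) "), q.grade("A"), q.grade("2sqrt(x-4y^3)"))
```

Returning `None` for an unparseable reply keeps the one-letter contract unambiguous. Anything that is not A to D triggers a re-prompt instead of a wrong mark.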

Example 1 — Expression simplification:

<Image content>

Simplify $\sqrt{4x - 16y^3}$

Options:

A) $\sqrt{2x - 8y^3}$

B) $2\sqrt{x - 4y^3}$

C) $\sqrt{2}(x - 4y^3)$

D) $\sqrt{2x} - \sqrt{8y^3}$
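For reference, the keyed answer follows from factoring 4 out of the radicand (for $x - 4y^3 \ge 0$):

$$\sqrt{4x - 16y^3} = \sqrt{4\,(x - 4y^3)} = \sqrt{4}\,\sqrt{x - 4y^3} = 2\sqrt{x - 4y^3}$$

So B is correct. The other three options are near-misses that differ from it only in structure.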


Example 2 — Function evaluation:

<Image content>

$$f(x) = \frac{2x^2 + 3x - 5}{x - 1}, \quad \text{find } f(3)$$

Options:

A) 11

B) 10

C) 9

D) 8
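The keyed answer is A. Substituting directly:

$$f(3) = \frac{2(3)^2 + 3(3) - 5}{3 - 1} = \frac{18 + 9 - 5}{2} = \frac{22}{2} = 11$$

Alternatively, the numerator factors as $(2x + 5)(x - 1)$, so $f(x) = 2x + 5$ for $x \neq 1$ and $f(3) = 11$.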


Example 3 — Multi-step reasoning embedded in one frame:

<Image content>

\begin{align*}
\text{Let } A &= 2^2 + 3 \cdot 4 \\
B &= A - 5 \\
C &= \frac{B}{1 + 1}
\end{align*}

Find $C$

Options:

A) 4

B) 5

C) $\frac{11}{2}$

D) 7
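Working the chain through gives the value the keyed option has to match:

$$A = 4 + 12 = 16, \qquad B = 16 - 5 = 11, \qquad C = \frac{11}{2}$$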


In each case, mathematical structure is preserved visually, and the interaction is identical regardless of the question’s complexity.
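Because every question collapses to a single keyed letter, the key itself can be checked mechanically before a question goes out. Below is a minimal sketch for numeric questions. It assumes the correct value can be computed exactly with Python's `fractions.Fraction`. The helper names `matching_options` and `key_is_valid` are hypothetical.

```python
from fractions import Fraction


def f(x: Fraction) -> Fraction:
    """Example 2: f(x) = (2x^2 + 3x - 5) / (x - 1)."""
    return (2 * x**2 + 3 * x - 5) / (x - 1)


def chain() -> Fraction:
    """Example 3: A = 2^2 + 3*4, B = A - 5, C = B / (1 + 1)."""
    a = Fraction(2) ** 2 + 3 * 4
    b = a - 5
    return b / (1 + 1)


def matching_options(value: Fraction, options: dict[str, str]) -> list[str]:
    """Return every letter whose option (e.g. "11" or "11/2") equals value exactly."""
    return [letter for letter, opt in options.items() if Fraction(opt) == value]


def key_is_valid(value: Fraction, options: dict[str, str], key: str) -> bool:
    """A key is safe to send only if exactly one option matches, and it is the key."""
    return matching_options(value, options) == [key]


ex2 = {"A": "11", "B": "10", "C": "9", "D": "8"}
print(matching_options(f(Fraction(3)), ex2))
print(chain())
```

Exact fractions avoid float comparisons. As a result, an option such as 11/2 either equals the computed value or it does not.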


So, do we simply let go of the complexity?

An important shift happens here. Instead of distributing complexity across the interaction — multiple images, multiple exchanges, branching replies — it is concentrated inside the problem.

A single well-constructed image can still ask questions that require multi-step reasoning, identification of incorrect steps, comparison of similar expressions, or conceptual understanding of identities. The depth lives in how the question is designed, not in how the conversation is structured.
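Questions built on comparing similar expressions carry a specific risk: a distractor that happens to be equivalent to the answer. One cheap guard, sketched here for Example 1, is to evaluate every option at a few sample points and check that only the keyed option agrees with the original expression. This is numeric spot-checking, not a proof of equivalence. The sample points are arbitrary choices with $y \ge 0$ and $x \ge 4y^3$, so no radicand is negative.

```python
import math


def target(x: float, y: float) -> float:
    """Example 1: sqrt(4x - 16y^3)."""
    return math.sqrt(4 * x - 16 * y**3)


# The four answer options of Example 1 as numeric functions.
options = {
    "A": lambda x, y: math.sqrt(2 * x - 8 * y**3),
    "B": lambda x, y: 2 * math.sqrt(x - 4 * y**3),
    "C": lambda x, y: math.sqrt(2) * (x - 4 * y**3),
    "D": lambda x, y: math.sqrt(2 * x) - math.sqrt(8 * y**3),
}

# Arbitrary sample points with y >= 0 and x >= 4y^3.
samples = [(5, 1), (8, 1), (10, 0.5), (3, 0)]


def agrees_everywhere(fn) -> bool:
    """True if fn matches target (within float tolerance) at every sample point."""
    return all(math.isclose(fn(x, y), target(x, y)) for x, y in samples)


equivalent = [letter for letter, fn in options.items() if agrees_everywhere(fn)]
print(equivalent)
```

If this list ever has more than one letter, or the wrong one, the question needs rewriting before it is rendered.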


What this approach gives up

It does not support:

  • free-form algebraic input
  • step-by-step submission
  • or adaptive branching within a single problem

But attempting to include them within WhatsApp tends to introduce more friction than value, especially for learners still developing fluency. A student who is unsure of the notation, the platform, and the subject simultaneously will struggle with all three.


Final observation

If you were building a dedicated learning application, the solution would look very different: structured input fields, equation editors, step tracking, and so on.

But when the goal is to reach learners through WhatsApp, a tool they already use without friction, the design space narrows considerably. Within that space, a system that is visually faithful, interactionally consistent, and operationally scalable converges on something simple: render the full mathematical object clearly once, and ask for one unambiguous decision in response. It is not the most expressive model. It is, however, the one that remains stable under the constraints identified above.

In the end, math learning on WhatsApp is not about simulating a full learning management system. It is about finding the simplest possible interaction that still honors the mathematics well enough to be useful.