Developmental Science

The Research Behind the App

What the science actually says — and why it changed how we built this.

Your toddler understands more than she can show you. Long before she can name a color, point to a letter, or answer a question about her day, she's already built the categories, noticed the patterns, and started sorting the world — she just doesn't have the words or the motor skill to prove it yet. That gap between what a child knows and what she can perform is the starting point for this app, and for the research below. It also explains why the screen question is more interesting than "good" or "bad": young children learn less from passive, one-way video than from live interaction, but that gap narrows sharply under specific, well-documented conditions. Here is the research that shaped how we think about both.

Discovery 1

What she already knows

Comprehension arrives first. The ability to say it, point to it, or perform it on request comes later — sometimes by months.

Long before your child can tell you what she knows, she's already showing you, if you know where to look. Researchers have found that infants between six and nine months already recognize the meanings of many common nouns, such as words for foods and body parts, and that six-month-olds understand some body-part words long before they can say any words themselves. Comprehension isn't a smaller or less important version of production. It usually comes first, and it stays ahead of what a child can perform for months at a stretch.

Researchers measure this comprehension gap through methods like eye-tracking and preferential looking, which can detect understanding before a child has the motor or verbal ability to demonstrate it in more traditional ways.

Evidence

Bergelson & Swingley (2012) — PNAS
Tincoff & Jusczyk (2012) — Infancy

Discovery 2

It's not the object. It's what it asks of her.

What predicts whether something teaches a child isn't what it's made of — a screen, a book, a toy — it's whether it asks her mind to do something: compare, guess, retrieve a word, notice what's the same and what's different.

It's tempting to rank learning tools by category — books are good, screens are risky, "educational" toys are safer than "electronic" ones. But looking closely at why different objects succeed or fail reveals a more useful pattern: an object that hands a child a label without asking her to do anything with it teaches less than one that asks her to notice, compare, or retrieve something herself — regardless of what the object is made of. A flashcard showing one red circle, with nothing to compare it to, is a weak way to teach "red." A toy with so many buttons, sounds, and modes that a child can't tell what caused what is a weak way to teach anything, no matter how many features it has. A parent narrating a picture book, and a well-designed screen experience that asks a child to find, compare, or respond — despite looking nothing alike — can be asking her mind to do the same kind of work.

The evidence for this is more specific than "screens are risky." One study of parent-infant dyads found that electronic toys were associated with less parent talk and fewer conversational turns than traditional toys or books. That is a difference in the language environment around the toy, not a direct measure of what the child learned. Another found that toddlers learned new words from a touchscreen more reliably when the screen responded specifically to their own touch than when it didn't. And in at least one study, toddlers learned more from passively watching a screen than from an app that required them to tap to make a selection: a reminder that "interactive" isn't automatically better if tapping isn't the part of the task that matters.

"The best question to ask about any toy, book, or app isn't 'what is this made of?' It's 'what is my child's mind doing right now?'" This reframes "screen time" debates around task demand and contingency rather than medium category — consistent with a growing developmental-media literature that treats content, context, and interactivity as more predictive than device type alone.

Webb et al. (2024) — JAMA Network Open

Not all screen use is equal — and some of it actively competes with learning

A 2024 study of 62 toddlers found that greater home media use, and commercial tablet game play in particular, was associated with reduced responsiveness to joint-attention prompts: children were measurably less likely to follow a parent's point or gaze. The reported correlation was moderate in size (ρ = −0.47, p < .001), and the strongest associations were with commercial gaming apps.

Joint attention — the shared focus between parent and child on the same object or event — is one of the strongest predictors of vocabulary development. An app that competes with it is working against the parent, not with them.

It's one study, and it's correlational; it doesn't prove that commercial games cause the drop in responsiveness. But it points toward a simple design choice: this app has no engagement algorithms, no streaks, no notifications, and no rewards designed to maximize time on screen. The goal is to deliver a clear learning signal and then get out of the way. Every minute your child spends in this app should make the next minute with you more productive, not less.

Additional evidence

Sosa (2016) — JAMA Pediatrics
Kirkorian, Choi & Pempek (2016) — Child Development
Ackermann, Lo, Mani & Mayor (2020) — PLOS ONE

Discovery 3

Screens, contingency, and calibration

Screens are one place this shows up especially clearly — because for a long time, researchers found that young children learned far less from a screen than from the same information delivered live. That gap is real. It's also more specific, and more fixable, than "screens are bad" suggests.

Troseth & DeLoache (1998) — Child Development

Toddlers don't automatically learn from screens

This is one of the studies that started it all. Researchers showed 24-month-olds a video of someone hiding a toy in the next room, then asked them to go find it. Most of the toddlers couldn't do it, even though they had just watched exactly where it went. But when children watched the identical scene through a window, they found the toy almost every time.

Same information. Completely different result.

What this tells us: toddlers don't automatically treat what they see on a screen as real and reliable. The connection between flat surfaces and three-dimensional reality has to be built. Developmental psychologist Judy DeLoache called this dual representation — the child must understand a symbol as two things simultaneously: an object in itself, and a representation of something else. A screen is both a glowing rectangle and a window onto a real place. Very young children struggle with this dual nature, and a passive television gave them no particular reason to work through it. That's what the calibration stages of this app are designed to do.

Strouse & Samson (2021) — Child Development

The gap is real — and measurable

A 2021 meta-analysis across 122 effect sizes found an average video deficit of roughly half a standard deviation: the typical gap between learning from a live interaction and learning from an equivalent screen presentation. That's not a small effect. It held across word learning, imitation, and object retrieval, though it decreased with age and was largest for object-retrieval tasks. It's the baseline this app is designed to work against.

Troseth, Saylor & Archer (2006) — Child Development

But the right kind of screen interaction can sharply reduce that gap

Researchers gave two-year-olds just five minutes of live, back-and-forth video interaction with an adult who responded specifically to them: the adult used their name, reacted to what they did, and held a real conversation. Afterward, those children could use information from the person on screen to find a hidden toy, while children who had watched a prerecorded version of the interaction generally could not.

The key wasn't just seeing a screen. It was experiencing a screen that genuinely responded to them.

This finding points to contingency as the mechanism. Across developmental research, infants preferentially attend to and learn from people and environments that respond to their actions. This isn't a screen-specific phenomenon — it's a fundamental feature of how early learning works. The difference between live interaction, video chat, and passive video may reflect, above all else, how much contingency each provides. This is why the early stages of this app use the front-facing camera rather than videos or images. A screen that responds to your child's specific movements is fundamentally different from a screen that plays content at them — though at this age, the research suggests that response usually still works best with a caregiver present, not as a stand-alone substitute for one.

Troseth (2003) — Developmental Psychology

Contingent self-video can improve screen transfer

Researchers gave toddlers two weeks of daily exposure to live closed-circuit footage of themselves — footage that responded when they moved, that showed them their own face in real time. Those children later showed meaningfully better transfer from screen to a real-world retrieval task than controls who hadn't had that experience. (The study tested object retrieval specifically, not word learning.)

Two weeks. Not years. The deficit isn't fixed.

The evidence suggests it's a preparation problem, and that it's at least partly solvable. Most of the foundational video-deficit studies were conducted before smartphones, front-facing cameras, and ubiquitous video chat. Those findings remain important, but they were generated using technologies that provided far less contingency than the screens children routinely encounter today. The contingent self-video that took dedicated equipment in Troseth's 2003 study now needs only an app and a selfie camera.

Miyazaki & Hiraki (2006) — Child Development

Timing matters more than you'd think

Researchers introduced a delay between a toddler's movement and their reflection on screen. Even lags of one to two seconds affected children's ability to recognize the reflection as themselves — though the effect varied by age and task. Children who had been walked through progressively longer delays, however, built tolerance for the gap.

This finding shaped one of the most important technical decisions in building this app: the delay sequence in calibration is incremental because the research suggests incremental works. We also engineered the live mirror response to stay under 300 milliseconds, well below the one-to-two-second lags the research found to matter, because delays can disrupt the feedback loop the child needs to learn from.
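The two ideas in that paragraph, an incremental delay sequence and a fixed latency budget for the live mirror, can be sketched in a few lines of Python. This is a hypothetical illustration, not the app's implementation: `DelaySchedule`, `within_live_budget`, the 250 ms step, the 2,000 ms ceiling, and the rule that a disengaged session steps the delay back down are all assumptions chosen for the example. Only the 300 ms budget comes from the text.

```python
from dataclasses import dataclass

# Live-mirror latency budget, taken from the "under 300 milliseconds" figure.
LIVE_LATENCY_BUDGET_MS = 300


@dataclass
class DelaySchedule:
    """Hypothetical incremental delay schedule; all defaults are assumptions."""
    start_ms: int = 0      # begin with the live mirror: no added delay
    step_ms: int = 250     # assumed increment per engaged session
    max_ms: int = 2000     # assumed ceiling, near the 1-2 s lags studied
    current_ms: int = 0

    def __post_init__(self) -> None:
        self.current_ms = self.start_ms

    def record_session(self, child_engaged: bool) -> int:
        """Lengthen the delay by one step after an engaged session;
        otherwise step back toward the live mirror. Returns the new delay."""
        if child_engaged:
            self.current_ms = min(self.current_ms + self.step_ms, self.max_ms)
        else:
            self.current_ms = max(self.current_ms - self.step_ms, self.start_ms)
        return self.current_ms


def within_live_budget(measured_latency_ms: float) -> bool:
    """True if a measured camera-to-screen latency is under the live budget."""
    return measured_latency_ms < LIVE_LATENCY_BUDGET_MS
```

Starting from the live mirror, three engaged sessions give delays of 250, 500, and 750 ms; a disengaged session then drops back to 500 ms. Stepping back one increment rather than resetting to zero is one way to keep progress gradual in both directions.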

Bahrick & Watson (1985) — Developmental Psychology

Infants show early sensitivity to contingency

Early evidence suggests infants can detect the difference between a video that responds to their movements and one that doesn't, with clearer discrimination by around five months. They aren't recognizing themselves yet — but they already notice when something moves because they moved.

That sensitivity is what the app's first stage taps into. Your child doesn't need to understand what a screen is for the early stages to work. They just need to notice that something responds when they do — and the evidence suggests they develop that sensitivity early.

Rochat & Morgan (1995) — Developmental Psychology

Self-awareness starts with sensing your own movement

Infants can tell a live video feed that follows their own movements from one that doesn't, and by three to five months they are also sensitive to the spatial direction of their own leg movements shown on screen. This sensitivity is one of the earliest forms of self-awareness researchers have been able to detect. Mirror self-recognition emerges reliably between 18 and 24 months (Lewis & Brooks-Gunn, 1979, Social Cognition and the Acquisition of Self, Plenum Press), well after this early contingency detection, and that ordering is one reason researchers treat contingency detection as a possible precursor. No study has directly tested whether early contingency detection causes later mirror self-recognition, or whether experience with one accelerates the other. It's a plausible connection, though, and one reason we treat contingent, responsive screen experiences as a meaningful category of their own, not just "screen time."

DeLoache (2000) — Child Development

The path from live video to photos has to be gradual

A photograph of a ball is not the same as a ball, and for a young toddler that distinction is hard to bridge. DeLoache's dual representation framework explains why: the child must hold both ideas at once ("this is a flat image AND it represents something real"), and that mental move takes time to develop. Moving too fast from video to still images may skip a step children need.

This is why the app moves gradually from live camera through recorded video to still images. Each step removes one more layer of contingency and familiarity. The sequence is designed to follow the path the research suggests children's understanding takes.

Discovery 4

Why variety beats repetition

Learning "red" isn't really about red. It's about knowing red isn't blue, isn't yellow, isn't green. A single red ball teaches a label. Red next to other colors teaches a category.

A child doesn't learn what something is just by seeing it once. She learns it by encountering it more than once, in different forms, with everything irrelevant changing and the thing that matters staying the same. Two lines of research show how this works: categories become stable when they're contrasted with each other, and a new word carries over to unfamiliar things when it's taught with several varied examples.

Sandhofer & Smith (1999) — Developmental Psychology

Learning a category means learning what it isn't, not just what it is

Researchers at Indiana University found that children don't learn color words one at a time — they learn them as a system of mappings. A child who only ever hears "red" doesn't really know what red means, because they have nothing to contrast it against. It's only when they encounter red next to blue, or yellow next to green, that the boundaries between categories become clear.

This principle guided our design across every domain. Presenting items in isolation teaches labels. Presenting items in contrast builds category boundaries. When introducing a new concept — whether a color, a shape, an animal, or a letter — the app pairs it with a clearly different contrast, so the child's brain can see the boundary, not just the label. The direct evidence is strongest for colors; we apply the same logic as a design principle across other domains.
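To make the contrast principle concrete, here is a minimal Python sketch of one way to pair a target item with a contrasting one. Everything in it is an assumption for illustration: the `CATEGORIES` catalogue and the `contrast_pair` helper are hypothetical, not the app's code, and the sketch assumes each category lists the same objects in the same order, so the object stays fixed while only the color changes.

```python
import random

# Hypothetical catalogue for illustration only. Every category lists the
# same objects in the same order, so index i names the same object in each.
CATEGORIES: dict[str, list[str]] = {
    "red": ["red ball", "red cup", "red sock"],
    "blue": ["blue ball", "blue cup", "blue sock"],
    "yellow": ["yellow ball", "yellow cup", "yellow sock"],
}


def contrast_pair(target: str,
                  categories: dict[str, list[str]],
                  rng: random.Random) -> tuple[str, str]:
    """Return one exemplar of `target` and the matching exemplar from a
    different category, so the target is never shown in isolation."""
    others = [name for name in categories if name != target]
    if not others:
        raise ValueError("need at least one contrasting category")
    contrast = rng.choice(others)
    # Same index in both lists: e.g. "red ball" vs "blue ball". The object
    # is held constant and only the dimension being taught differs.
    i = rng.randrange(len(categories[target]))
    return categories[target][i], categories[contrast][i]
```

For example, `contrast_pair("red", CATEGORIES, random.Random(1))` returns a red object together with the same object in blue or yellow. Holding the object constant is one way to make the contrast clearly different on the dimension being taught and identical on everything else.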

Knowing what something is requires knowing what it isn't.

Twomey, Ranson & Horst (2014) — Infant and Child Development

Multiple examples build categories that transfer

A single example teaches a label. Multiple examples, varying in every way except the one that matters, teach something that transfers to things the child has never seen before. In one study, toddlers around two and a half years old who were taught a new word using several different-looking examples of the same category retained the word at a later test; toddlers taught with a single repeated example did not.

Discovery 5

Sometimes, comparison is the lesson

Some things aren't learned by seeing more examples — they're learned by directly comparing two things side by side and noticing what's the same and what's different.

Not every stumble means a skill is missing. Sometimes a child already has everything she needs to solve a problem, and what's actually missing is the specific chance to compare: to hold two things up next to each other and ask "how are these alike?" Studies of category learning have found that when children are asked to find "another one like this" with no further hint, they often reach for something the object is used with or found near, not something that shares its deeper category. Give the same task a small nudge, such as calling the object by a new name, and children as young as two or three respond quite differently, choosing something from the same category. What changes isn't what the child knows; it's what the task invites her to compare.

Evidence

Markman & Hutchinson (1984) — Cognitive Psychology

Discovery 6

Performance is not always understanding

A child who can recite the alphabet looks like she's learning to read. A child who can recognize one letter out of context is quietly doing something much harder — and it matters more.

Some of the most impressive-looking performances — reciting a memorized sequence, singing a song start to finish — turn out to be a weaker signal of understanding than something much quieter: recognizing one single piece of it, out of order, without the rest of the sequence to lean on. Letters are a clear example, and reading aloud is another.

Dehaene-Lambertz, Monzalvo & Dehaene (2018) — PLOS Biology

Letters aren't special — they're shapes the brain learns to recognize

The brain region that eventually specializes in reading doesn't start out specialized for letters at all — it's a general-purpose visual region that gets repurposed as a child learns to read (the study followed a small group of six-year-olds through their first year of reading instruction). That finding doesn't by itself tell you how to teach letters — but it's consistent with our design choice to introduce letters as visual objects, distinct shapes to recognize, before connecting them to sounds or names.

Massaro (2015) — Journal of Literacy Research

What actually makes reading aloud so powerful — and it's not the pictures

UC Santa Cruz psychologist Dominic Massaro analyzed the vocabulary in picture books and compared it to what parents say to their children in everyday conversation. Picture books contain two to three times as many rare and unusual words as normal parent-child conversation — and even more than most adult-to-adult conversation.

What drives the benefit of reading aloud appears to be the language the parent uses while reading — the words on the page, spoken aloud, that a parent would rarely use otherwise. The pictures are the occasion. The parent's voice does much of the work.

This finding shapes how the app thinks about parent involvement. The parent's voice is the most powerful learning tool in the room. The app is designed to amplify that voice — to give it more to work with, at the right moment, in the right domain — not to replace it.

Want to go deeper?

This page covers the individual findings. The book goes further — into how these pieces fit together, what we still don't know, and where the evidence runs out and a design choice begins. Read the Substack for the ongoing version of that thinking, or start with the book itself.
