The honest answer: it depends on what you mean by "accurate."
Most plant ID apps measure accuracy one way: did it name the plant correctly? That's a useful number. But it's not the number that matters when you're standing in your garden wondering why your tomato looks wrong.
What matters is: did the app give you the right answer for what to do next?
That's a much harder problem. And it's the one we're focused on.
We launched Everyone Can Garden in Austin, Texas — one of the most botanically interesting and ecologically complex gardening environments in the United States. Zone 8b. Hot summers. Unpredictable late frosts. Native plants that look nothing like anything in a standard botanical database.
The early beta taught us something important: the app got a lot of things wrong. Not catastrophically wrong — but wrong in ways that mattered to experienced gardeners. It misidentified native insects as pests. It gave advice calibrated for a generic U.S. garden rather than the specific conditions of Central Texas. It returned high-confidence answers on genuinely ambiguous photos.
That feedback put us on the right path. We rebuilt the diagnosis engine from the ground up — adding multiple specialized CV providers, a geo-validation layer, zone-aware advice, and a new design principle that guides everything we build now:
Say when you know it. Say when you don't.
A confident wrong answer is worse than an honest "I'm not sure." We'd rather tell you we're uncertain and ask one targeted question than confidently send you to treat a native insect that belongs in your garden.
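For the curious, here's a minimal sketch of the shape the rebuilt engine takes: fan a photo out to several providers, pool their candidates, and let a geo-validation layer veto species that don't plausibly occur where the photo was taken. The provider functions and the species list below are toy stand-ins, not our actual integrations:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    species: str
    confidence: float  # 0.0-1.0, as reported by the provider

# Toy stand-ins for real CV providers; names and outputs are illustrative.
def provider_a(image: bytes) -> list[Candidate]:
    return [Candidate("Aphis gossypii", 0.71), Candidate("Halticus bractatus", 0.22)]

def provider_b(image: bytes) -> list[Candidate]:
    return [Candidate("Aphis gossypii", 0.64), Candidate("Coccus hesperidum", 0.30)]

# Toy occurrence list standing in for a real regional species database.
CENTRAL_TEXAS_SPECIES = {"Aphis gossypii", "Halticus bractatus"}

def occurs_in_region(species: str) -> bool:
    """Geo-validation: drop candidates that don't plausibly occur locally."""
    return species in CENTRAL_TEXAS_SPECIES

def identify(image: bytes) -> list[Candidate]:
    # Pool candidates across providers; average confidence where they agree.
    pooled: dict[str, list[float]] = {}
    for provider in (provider_a, provider_b):
        for cand in provider(image):
            pooled.setdefault(cand.species, []).append(cand.confidence)
    merged = [
        Candidate(species, sum(scores) / len(scores))
        for species, scores in pooled.items()
        if occurs_in_region(species)  # the geo-validation layer
    ]
    return sorted(merged, key=lambda c: c.confidence, reverse=True)
```

The point of running geo-validation as a filter after pooling is that even a provider's highest-confidence match can be vetoed by location, which is what stops a generic-database lookalike from overriding a Central Texas native.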
The leading plant identification apps publish impressive accuracy numbers. Here's what independent research actually shows:
| App / API | What's measured | Accuracy |
|---|---|---|
| PlantNet | Species identification only | ~90% |
| Plant.id | Species identification (top-1) | ~85% |
| PictureThis | Disease flag accuracy | ~75% |
| PlantNet | Disease diagnosis (useful result) | ~30% |
Notice the gap between species ID (~85–90%) and disease diagnosis (~30–75%). Naming the plant is the easy part. Knowing what's wrong with it — and what to do about it — is where most apps fall short.
ECG doesn't just identify the plant. Every diagnosis attempts to answer four questions:

1. What plant is this?
2. What's wrong with it?
3. How severe is it?
4. What should you do about it?
Our accuracy benchmark scores the full chain — not just question 1.
Every test case in our benchmark database has an expert-verified answer sourced from university extension publications (Texas A&M AgriLife, NC State Extension, Clemson, University of Florida IFAS). We score each ECG diagnosis against the known answer on a defined rubric:
| Score | Criteria |
|---|---|
| 5/5 | Correct ID, correct treatment recommendation, correct severity, zone-aware context |
| 4/5 | Correct ID, minor gap in advice or missing regional context |
| 3/5 | Correct plant but wrong pest/disease, or correct pest with wrong treatment |
| 2/5 | Wrong ID, plausible reasoning shown, no harmful advice given |
| 1/5 | Wrong ID, no useful information returned |
| 0/5 | Confident wrong answer that would cause harm if followed |
Scoring is done against the expert-verified answer, not user feedback. We don't use thumbs up/down ratings as our accuracy benchmark — voluntary user feedback skews toward problem reporters and is not a reliable signal for diagnostic accuracy.
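The rubric translates almost directly into code. Here's an illustrative scoring function; the field names are simplified for the sketch and aren't our internal schema:

```python
from dataclasses import dataclass

@dataclass
class GradedDiagnosis:
    correct_plant: bool
    correct_problem: bool       # right pest/disease
    correct_treatment: bool
    correct_severity: bool
    zone_aware: bool
    plausible_reasoning: bool   # for misses: was the reasoning shown and sensible?
    harmful_if_followed: bool

def rubric_score(d: GradedDiagnosis) -> int:
    """Score one diagnosis against the expert-verified answer (0-5)."""
    if d.correct_plant and d.correct_problem:
        if d.correct_treatment and d.correct_severity and d.zone_aware:
            return 5
        if d.correct_treatment:
            return 4  # minor gap in advice or missing regional context
        return 3      # correct pest, wrong treatment
    if d.correct_plant:
        return 3      # correct plant, wrong pest/disease
    if d.harmful_if_followed:
        return 0      # confident wrong answer that would cause harm
    if d.plausible_reasoning:
        return 2      # wrong ID, plausible reasoning, no harmful advice
    return 1          # wrong ID, nothing useful returned
```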
We run monthly stress tests using real-world photos — the kinds of photos real gardeners actually take, not images staged under controlled lab conditions. Each photo has a known expert-verified answer. We score the full diagnostic chain.
| Category | Score |
|---|---|
| Multi-organism diagnosis (aphids + beneficial predators) | 4/4 — 100% |
| Plant disease identification | 2/2 — 100% |
| Invasive plant recognition | 1/2 — 50% |
| Insect identification | 3/5 — 60% |
| Overall | 43/55 — 78% |
We publish these results because we think transparency builds more trust than marketing claims.
Our 78% benchmark covers the hardest version of the problem: the full diagnostic chain, using real-world photos taken by real gardeners, across multiple photo angles and lighting conditions.
The three cases that pulled our score down were all the same plant — a yucca with Yucca Plant Bug, a native Texas insect that's genuinely difficult to distinguish from scale insects in photographs. Three different photos of the same plant returned three different diagnoses. We identified this failure mode and shipped a fix the same day.
Excluding those three cases, our score on the remaining test set is 87%.
"Say when you know it. Say when you don't."
When ECG isn't confident in a diagnosis, it tells you. It returns a list of possible candidates with distinguishing features, and asks you one targeted question to help narrow it down.
We'd rather say "this looks like it could be flea beetles or false chinch bugs — are the bodies shiny and hard, or soft and matte?" than confidently tell you it's aphids when it isn't.
This is a deliberate design choice. Confident wrong answers cost gardeners time, money, and plants. Honest uncertainty costs nothing.
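In code terms, the idea is that uncertainty is part of the response shape, not an error state. A minimal sketch, with an illustrative threshold and a hard-coded example question (the flea beetle vs. false chinch bug one from above) standing in for generated questions:

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.80  # illustrative cutoff, not a published ECG value

@dataclass
class DiagnosisResponse:
    confident: bool
    diagnosis: str | None = None             # set only when confident
    candidates: list[str] = field(default_factory=list)
    follow_up_question: str | None = None    # exactly one targeted question

def build_response(ranked: list[tuple[str, float]]) -> DiagnosisResponse:
    top_label, top_score = ranked[0]
    if top_score >= CONFIDENCE_THRESHOLD:
        return DiagnosisResponse(confident=True, diagnosis=top_label)
    # Below threshold: surface the plausible candidates and ask one
    # distinguishing question instead of guessing.
    return DiagnosisResponse(
        confident=False,
        candidates=[label for label, _ in ranked[:2]],
        follow_up_question="Are the bodies shiny and hard, or soft and matte?",
    )
```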
Zone-aware advice. A plant diagnosis that's accurate for Austin, Texas, may be wrong for Denver or Miami. ECG incorporates your USDA hardiness zone, current weather conditions, and regional pest timing into every diagnosis (a sketch of this follows the list below).
Ecological literacy. Not everything on a plant is a problem. Yucca Plant Bugs are native Texas insects — their presence is a sign of a healthy garden ecosystem, not a pest problem. ECG distinguishes between insects that need treatment and insects that belong there.
The full chain. Species ID is the beginning, not the end. ECG identifies the plant, diagnoses the problem, calibrates severity, and gives you a specific 3-step action plan with product names and where to buy them.
Honest uncertainty. When a photo is ambiguous, we say so. When multiple candidates are plausible, we list them. When we need more information, we ask one targeted question. We don't guess confidently.
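As promised under "Zone-aware advice," here's a sketch of how zone context can feed a diagnosis. The pest calendar, frost cutoff, and field names below are illustrative placeholders, not our production data:

```python
from dataclasses import dataclass

@dataclass
class GardenContext:
    usda_zone: str             # e.g. "8b" for Austin
    month: int                 # 1-12
    recent_lows_f: list[float]

# Illustrative pest calendar; a real one would be built from extension data.
PEST_ACTIVITY = {
    ("8b", 3): ["aphids", "flea beetles"],
    ("8b", 6): ["spider mites", "leaf-footed bugs"],
}

def regional_notes(ctx: GardenContext, diagnosis: str) -> str:
    notes = []
    if diagnosis in PEST_ACTIVITY.get((ctx.usda_zone, ctx.month), []):
        notes.append(f"{diagnosis} is at peak activity in zone {ctx.usda_zone} this month.")
    if min(ctx.recent_lows_f, default=60.0) < 36.0:
        notes.append("A late frost is possible; protect tender transplants.")
    return " ".join(notes) or "No zone-specific cautions for this diagnosis."
```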
Every time a user submits a correction, that signal feeds back into our prompt engineering and testing process. Every month we run a new benchmark against a growing test case database sourced from expert extension publications.
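One way to picture that loop: each verified correction becomes a permanent regression case in the benchmark database, so a mistake we've fixed can never silently come back. A sketch, with a hypothetical file layout:

```python
import json
from pathlib import Path

BENCHMARK_DB = Path("benchmark_cases.jsonl")  # hypothetical location

def add_regression_case(photo_id: str, wrong_answer: str,
                        verified_answer: str, source: str) -> None:
    """Turn a user correction into a permanent benchmark case.

    `source` should cite the extension publication (e.g. Texas A&M AgriLife)
    that verifies the answer, matching how the benchmark DB is built.
    """
    case = {
        "photo_id": photo_id,
        "previous_answer": wrong_answer,
        "expert_verified_answer": verified_answer,
        "source": source,
    }
    with BENCHMARK_DB.open("a", encoding="utf-8") as f:
        f.write(json.dumps(case) + "\n")
```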
We're not at 95% yet. We're working toward it — in public, with real numbers.
The Austin beta got us to 78%. The fixes we shipped from those learnings got us to 87% on all cases except the genuinely hard edge cases. The goal is 95%+ on the full diagnostic chain — not just naming the plant, but getting the whole answer right.
That's the mission. Everyone can garden.
Benchmark last updated: March 2026. Results reflect current model performance on the described test set. Accuracy varies by plant category, photo quality, and geographic region. Sources: Plant.id accuracy from Kindwise published benchmarks; PlantNet disease diagnosis from Plant Doctor News independent testing (2025); PictureThis from independent comparison testing (2025).
Now in open beta on iOS and Android. See how it does on your garden.