Most AI vision vendors lead with a number. "99.9% accuracy." It sounds rigorous. Without context, it is not.
Accuracy of what? Detecting which defect type? Measured at what confidence threshold? Under what lighting conditions, on what substrate, at what line speed? The number alone tells you almost nothing. Understanding what sits behind it is the difference between a system that works on your line and one that works in a vendor's lab.
The metrics that actually matter in machine vision accuracy
Before interpreting any benchmark claim, manufacturers need to know what each term means.
Detection rate is the share of real defects the system finds. A 99% detection rate means one in a hundred defects passes through undetected. Whether that is acceptable depends entirely on your defect severity and downstream risk.
Precision measures how often a positive call is correct. High precision means that when the system rejects a part, the part is usually genuinely defective.
Recall is the complementary concern: the share of actual defects that were caught, the same quantity as detection rate above. Precision and recall trade off against each other. Raising the confidence threshold improves precision but reduces recall. Lowering it does the opposite.
False positive rate is the proportion of good parts incorrectly rejected. In manufacturing, this translates directly to yield loss and operator workload.
False negative rate is the proportion of defective parts that pass inspection. This is the number most vendors prefer not to emphasise.
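To keep the four rates straight, here is a minimal sketch in Python. The confusion-matrix counts are invented purely for illustration; only the formulas matter.

```python
# Illustrative only: invented counts for one inspection shift.
tp = 95      # defective parts correctly rejected
fn = 5       # defective parts that passed inspection (escapes)
fp = 40      # good parts incorrectly rejected
tn = 9860    # good parts correctly passed

precision = tp / (tp + fp)            # how often a reject call is correct
recall = tp / (tp + fn)               # share of real defects caught (detection rate)
false_positive_rate = fp / (fp + tn)  # share of good parts rejected
false_negative_rate = fn / (fn + tp)  # share of defects that escaped

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"FPR={false_positive_rate:.4f} FNR={false_negative_rate:.3f}")
```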
False reject rate (FRR) is the metric manufacturers feel most directly in their operations. Every falsely rejected part costs material, operator time, and throughput. High FRR erodes trust in the system and pushes operators to override or bypass it. The impact compounds quickly. A 2% FRR on a 500-unit-per-hour line is ten parts per hour that someone has to handle, re-inspect, or write off.
OEE impact follows from all of the above. False rejects reduce availability and performance. Escaped defects create quality losses. A vision system that improves detection but drives FRR above 1% may deliver a negative net effect on Overall Equipment Effectiveness.
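A back-of-envelope sketch of the arithmetic, using the 2% FRR and 500-unit-per-hour figures above. The OEE factor values are illustrative assumptions, not measurements from any line.

```python
# Back-of-envelope: false rejects per hour at a given FRR and line rate.
line_rate_per_hour = 500
frr = 0.02                            # 2% false reject rate
false_rejects_per_hour = line_rate_per_hour * frr
print(false_rejects_per_hour)         # 10 parts/hour to re-handle, re-inspect, or scrap

# OEE = Availability x Performance x Quality. Factor values are illustrative only.
availability, performance, quality = 0.90, 0.95, 0.98
print(f"OEE = {availability * performance * quality:.3f}")  # ~0.838
```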
Why "99.9% accuracy" is a marketing number
The term "accuracy" in machine vision is often used to describe overall classification correctness across a test set. That test set is the critical variable.
If a vendor tests on a balanced dataset with equal good and defective samples, the number looks strong. Real production lines are heavily imbalanced. Defects may represent 0.1% of throughput. In that environment, a model that calls everything "good" would achieve 99.9% accuracy while detecting zero defects.
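A few lines of Python make the failure mode concrete. The prevalence figure mirrors the 0.1% example above; everything else is invented.

```python
# 100,000 parts at 0.1% defect prevalence, scored by a degenerate "model"
# that labels every part good.
total_parts = 100_000
defective = 100                      # 0.1% prevalence
good = total_parts - defective

correct = good                       # every good part is labelled correctly
accuracy = correct / total_parts     # 0.999 -> "99.9% accuracy"
defects_detected = 0                 # recall is zero

print(f"accuracy={accuracy:.3%}, defects detected={defects_detected}")
```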
Lighting conditions, surface variation, part-to-part tolerance, and product model diversity all affect real-world performance. A benchmark run in controlled lab conditions on a narrow sample does not transfer automatically to a line running 8,000+ product variants under variable ambient lighting.
How to interpret an AI vision vendor's benchmark claim
When a vendor presents accuracy figures, these are the questions to ask.
| Question | Why it matters |
|---|---|
| What dataset was the benchmark tested on? | Lab datasets inflate performance versus production-representative data |
| How many defect classes were included? | Single-defect benchmarks miss multi-class performance gaps |
| What was the defect prevalence in the test set? | Imbalanced real-world data changes accuracy figures significantly |
| What confidence threshold was used? | Threshold selection shifts precision and recall in opposite directions |
| What was the false reject rate on good parts? | This is the metric you will feel every shift |
| What were the imaging conditions? | Lighting, resolution, and line speed all affect transferability |
| Was the benchmark run on production-representative parts or curated samples? | Curated samples are rarely representative of real variation |
| How does performance change as new product variants are introduced? | Static models degrade; ask about the update process |
| Is the benchmark independently validated or internally generated? | Internal benchmarks are not auditable |
| What is the retraining protocol when FRR rises in production? | Drift is inevitable; the response plan matters |
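Two of the questions above, the confidence threshold and the defect prevalence, are easy to demonstrate with synthetic data. The sketch below sweeps a decision threshold over invented confidence scores; none of the numbers come from a real model, it only shows the direction in which precision and recall move.

```python
# Illustrative sweep: how the decision threshold pushes precision and recall
# in opposite directions. Scores are synthetic, not from any real model.
import random

random.seed(0)
# Synthetic confidence scores: defects tend to score higher than good parts.
defect_scores = [random.betavariate(5, 2) for _ in range(200)]
good_scores = [random.betavariate(2, 5) for _ in range(9800)]

for threshold in (0.3, 0.5, 0.7):
    tp = sum(s >= threshold for s in defect_scores)   # defects flagged
    fn = len(defect_scores) - tp                      # defects missed
    fp = sum(s >= threshold for s in good_scores)     # good parts flagged
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold drives precision up and recall down; lowering it does the reverse. A single quoted accuracy figure hides where on that curve the vendor chose to sit.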
How HyperQ AI Vision establishes its benchmark
HyperQ AI Vision (/solutions/hyperq-ai-vision) holds a false reject rate benchmark of under 0.5% on production-representative datasets. That figure is not a lab result. It is measured against the kind of variation found on real manufacturing lines, with the part diversity and imaging conditions that production actually involves.
The benchmark is maintained through an active learning cycle. When the system encounters edge cases or marginal calls in production, those samples are flagged, reviewed, and used to retrain the model. This is what keeps performance stable as products change, tooling wears, and lighting shifts over time. Without this cycle, any model drifts. Detection rate falls. FRR climbs. The benchmark becomes historical rather than current.
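In outline, that cycle looks something like the sketch below. The function and queue names are hypothetical placeholders for illustration, not HyperQ AI Vision's actual interfaces.

```python
# Schematic active-learning cycle (illustrative only, not HyperQ's API).
LOW_CONFIDENCE = 0.6   # hypothetical band below which a call counts as "marginal"

def inspection_cycle(model, part_images, review_queue):
    """Run inspection, flagging marginal calls for human review."""
    for image in part_images:
        label, confidence = model.predict(image)
        if confidence < LOW_CONFIDENCE:
            review_queue.append(image)        # edge case: send to review
        yield label

def retraining_cycle(model, review_queue, annotate, retrain):
    """Fold reviewed edge cases back into the training set."""
    labelled = [(img, annotate(img)) for img in review_queue]  # human-verified labels
    return retrain(model, labelled)
```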
Training data volume also shapes initial performance. A model trained on 1,000 images behaves differently from one trained on 10,000 images of the same defect class. HyperQ AI Vision's approach requires sufficient defect representation to build robust decision boundaries, not minimum viable samples to hit a launch timeline.
A 60-80% reduction in false positives compared to legacy rule-based systems is achievable. That reduction directly returns yield and reduces the inspection burden on operators who currently spend time re-checking good parts the system incorrectly flagged.
The right question is not "how accurate?"
The right question is: accurate at what, under what conditions, and how does that hold up six months into production?
A vendor who can answer that specifically, with production-representative data and a documented retraining protocol, is worth a serious conversation. A vendor who leads with a single percentage and resists the follow-up questions is telling you something.
You can review how false reject rate is measured and reduced in practice (/blog/false-reject-rate-in-ai-vision-what-it-is-how-to-measure-it-and-how-to-reduce-it), what the AI vision quality inspection (/pillars/ai-vision-quality-inspection) framework looks like end to end, and how model drift affects accuracy over time (/blog/what-manufacturers-need-to-know-about-ai-model-drift) if your current vendor has not addressed it.
If you want to put your vendor's benchmark claims to the test, or see what HyperQ AI Vision's numbers look like on your specific product types, talk to the team at https://apac.hypernology.net/contact. Bring their data sheet and we will go through it with you.
