Every AI vision vendor in your RFP claims 99.9% accuracy. Not one of them is lying. And not one of them is telling you the full story.
That number appears in datasheets, slide decks, and procurement justifications across the industry. It sounds rigorous. It has a decimal point. And it is, in almost every case, completely devoid of useful information.
The number is real. The context is missing.
99.9% is a detection rate. It tells you what percentage of defects the system caught -- on a specific dataset, under specific conditions, for a specific defect type. Change any of those three variables and the number changes with them.
A system trained on surface scratches on polished aluminium will not perform the same way on surface scratches on brushed stainless steel. A model validated on a controlled test set of 10,000 curated images behaves differently when it sees your actual production line at 3am in November, when your lighting is degraded and your product has minor batch-to-batch variation. 99.9% on their data and 99.9% on your data are not the same measurement.
Most vendors do not volunteer this distinction. That is worth noting before you sign anything.
The metric that actually costs you money
Detection rate gets quoted because it sounds good. The metric that actually affects your business case is the false reject rate (FRR) -- how often the system flags good product as defective.
A system with a 99.9% detection rate and a 2% false reject rate will stop your line, trigger re-inspection, create rework queues, and frustrate your operators every single shift. The detection rate looks excellent in a report. The FRR shows up in your OEE.
This is not an edge case. It is one of the most common failure modes in AI vision deployments. The vendor optimises for detection. The buyer pays for rejects.
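To put that 2% in concrete terms, here is a back-of-envelope sketch. The throughput and re-inspection figures are illustrative assumptions, not data from any particular line -- substitute your own numbers.

```python
# Back-of-envelope cost of a 2% false reject rate.
# All inputs are illustrative assumptions -- swap in your own line data.

units_per_shift = 10_000          # assumed line throughput per shift
false_reject_rate = 0.02          # 2% of good product flagged as defective
reinspect_minutes_per_unit = 0.5  # assumed manual re-inspection time

false_rejects = units_per_shift * false_reject_rate
reinspect_hours = false_rejects * reinspect_minutes_per_unit / 60

print(f"{false_rejects:.0f} good units pulled per shift")
print(f"{reinspect_hours:.1f} operator-hours of re-inspection per shift")
# -> 200 good units pulled per shift
# -> 1.7 operator-hours of re-inspection per shift
```

Every one of those 200 units was saleable product, and every one of those operator-hours is pure overhead the detection rate never shows you.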
Four metrics that matter in the real world
Before any system goes into production, get clear on all four of these (a worked sketch follows the list):
- Detection rate. What percentage of actual defects does the system flag? Specify defect type and severity threshold.
- False positive rate. Of all the alerts the system generates, what proportion are false alarms rather than genuine defects? A high false positive rate means low operator trust.
- False reject rate (FRR). What percentage of good product gets pulled for re-inspection? This is the one that hits your throughput.
- OEE impact. Net effect on overall equipment effectiveness once false rejects, downtime from alerts, and re-inspection time are factored in. This is the number your production manager cares about.
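Here is that worked sketch: a minimal illustration of how the first three metrics fall out of the same set of confusion counts. The counts are hypothetical, chosen to reproduce the 99.9% / 2% example above.

```python
# Computing the metrics from confusion-matrix counts.
# The counts below are hypothetical; in practice they come from running
# the system on production-representative data with ground-truth labels.

true_positives  = 999      # real defects the system flagged
false_negatives = 1        # real defects it missed
false_positives = 200      # good units it flagged (false rejects)
true_negatives  = 9_800    # good units it correctly passed

detection_rate = true_positives / (true_positives + false_negatives)
# Share of all alerts that are false alarms rather than genuine defects:
false_alert_share = false_positives / (true_positives + false_positives)
false_reject_rate = false_positives / (false_positives + true_negatives)

print(f"Detection rate:    {detection_rate:.1%}")                 # 99.9%
print(f"False alarms:      {false_alert_share:.1%} of all alerts") # 16.7%
print(f"False reject rate: {false_reject_rate:.1%}")               # 2.0%

# OEE impact cannot be derived from these counts alone -- it also needs
# your line's cycle time, re-inspection time, and alert-handling downtime.
```

Note what the sketch makes visible: the same system can report a 99.9% detection rate while one in six of its alerts is a false alarm.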
If a vendor cannot give you all four of these -- separately, from production-representative data -- the 99.9% figure tells you very little.
What production-representative actually means
"Production-representative" is doing a lot of work in that sentence above, so it deserves clarification.
Test datasets built by AI vision vendors tend to be clean. They are curated to demonstrate the system's capabilities. Your production environment is not curated. It has variable lighting, line speed fluctuations, worn tooling, substrate variation, and a dozen other factors that introduce noise. A model validated on vendor test data has not been validated on your product.
The right question to ask is not "what is your detection rate?" It is "what is your detection rate on data that looks like what I actually make?"
HyperQ AI Vision targets an FRR below 0.5% on production-representative datasets. That figure was chosen because it corresponds to a real-world threshold -- the point at which false rejects stop being a meaningful source of line disruption. It is measured on images that reflect actual production conditions, not curated test environments.
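For scale, run the earlier back-of-envelope arithmetic again: on the same illustrative 10,000-unit shift, a 0.5% FRR is roughly 50 pulled units per shift, against 200 at 2%.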
Accuracy degrades. Good systems adapt.
Even a well-validated model will drift. Product designs change. Materials change. Equipment wears. A static model that performed at 99.9% on day one will not stay there without intervention.
This is where active learning matters. HyperQ AI Vision uses active learning to continuously refine the model on new production data. Edge cases that the system flags but cannot confidently classify get surfaced for review. That feedback tightens the model over time rather than letting it drift. It also means the system improves on your specific product -- not on a generic test set.
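To make the idea concrete, here is a minimal sketch of the triage logic behind an active-learning loop. The thresholds and names are assumptions for illustration, not a description of HyperQ's internals.

```python
# Sketch of an active-learning triage loop: confident predictions flow
# through automatically, low-confidence edge cases are queued for human
# review, and the reviewed labels feed the next retraining cycle.
# Thresholds and names are hypothetical.

ACCEPT = 0.95   # assumed confidence cut-offs for automatic decisions
REJECT = 0.05

review_queue = []   # edge cases awaiting operator review
training_pool = []  # (image, label) pairs for the next retraining run

def triage(image, p_defect):
    """Route one inspection result by model confidence."""
    if p_defect >= ACCEPT:
        return "reject"   # confident defect: pull the unit
    if p_defect <= REJECT:
        return "pass"     # confident good: let it through
    review_queue.append(image)  # uncertain: ask a human, don't guess
    return "review"

def absorb_reviews(labelled_reviews):
    """Reviewed edge cases become training data for the next cycle,
    so the model tightens on exactly the cases it found hardest."""
    training_pool.extend(labelled_reviews)
    # retrain(model, training_pool)  # hypothetical retraining step
```

The design point is the middle band: instead of forcing a pass/reject call on a borderline image, the system turns its own uncertainty into labelled training data.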
Three questions to ask any AI vision vendor
You do not need a long checklist. You need three direct questions before the procurement decision is made.
- What dataset was this accuracy measured on? If the answer is a vendor-controlled test set with no connection to your product, the number is a demonstration, not a guarantee.
- What is your false reject rate on production-representative data? If they do not have this number, they have not measured the thing that will actually affect your operations.
- How does the model maintain accuracy over time? If the answer is a scheduled manual retraining cycle, ask how long that cycle is and who is responsible for triggering it.
The answers will tell you more than any accuracy figure in a datasheet.
The honest position
99.9% detection rate is achievable. It is also measurable, reproducible, and genuinely useful -- when it is specified correctly. The problem is not the metric. The problem is deploying it without the context that makes it meaningful.
Hypernology publishes its performance figures against production-representative data and is direct about what those figures do and do not cover. If you want to understand how HyperQ AI Vision would perform against your specific product, defect types, and production conditions, talk to the team at https://apac.hypernology.net/contact -- the conversation starts with your data, not a datasheet.
