Edge AI vs cloud AI for manufacturing quality inspection: how APAC factories should decide
Roughly sixty milliseconds. That is the inspection window for a one-inch part on a conveyor moving at 80 feet per minute, a typical line speed across automotive parts and electronics assembly in APAC. Two hundred to five hundred milliseconds is the end-to-end round trip, network plus inference, to a cloud inference endpoint over typical industrial connectivity in Penang, Hanoi, or Jakarta. The part has moved past the reject mechanism before the cloud finishes deciding whether to flag it. That is one of three independent reasons cloud inference cannot run a production inspection line.
The other two are connectivity reliability and data sovereignty. Industrial zones in APAC are not 99.99 percent uptime environments, and a line that depends on a cloud round-trip stops the moment connectivity drops. Vietnam's Cybersecurity Law, Indonesia's Personal Data Protection Law of 2022 (fully enforceable since October 2024), and the equivalent statutes in adjacent jurisdictions restrict, and for some data categories prohibit, sending production data across borders to a cloud inference endpoint hosted outside the country. Each of these three constraints disqualifies cloud independently. A facility that solves latency with a CDN-edge inference endpoint still fails when the WAN drops. A facility that solves reliability with redundant connectivity still fails the data-sovereignty requirement. Taken together, cloud has no path to running production inspection on most APAC lines.
The architecture decision is rarely framed this way in vendor sales conversations. The default frame is that edge is faster, cloud is cheaper, and the buyer chooses based on requirements. That frame produces the wrong answer because it treats three independent disqualifiers as a single performance tier. This post is the APAC-specific version of the architecture decision, with the data-sovereignty layer added to the foundational technical analysis on edge inference for manufacturing AI we published earlier.
Most factories inherit the architecture from the vendor
The procurement reality on most AI inspection deployments in APAC is that the buyer does not choose edge or cloud. The buyer chooses a vendor. The vendor has an architecture preference baked into their product, and the buyer inherits the consequences.
Vendors prefer cloud inference for three reasons that are about the vendor's economics rather than the customer's. Cloud is cheaper to operate per inference because the GPU capacity is amortised across customers. Cloud makes vendor-side model updates trivial because the model lives in one place. Cloud lets the vendor monitor model performance across the customer base, which is valuable to the vendor's product roadmap. The customer pays for these advantages in latency, in dependency risk, and in data exposure. The trade is real, and the customer is rarely shown the trade-off explicitly during procurement.
The consequence is a population of deployed inspection lines whose architecture matches the vendor's economics rather than the customer's operational requirements. The customer's inspection latency runs into the InspectWindow timer on the PLC. The customer's line stops when the WAN drops. The customer's images cross a border that the regulator has now made non-crossable. None of these are vendor problems. All of them are customer problems.
The honest evaluation question is not "which architecture is faster" but "whose architecture is this, and whose economics is it built around." When the answer is "the vendor's," the customer should expect the consequences to land on the customer.
The three constraints that operate independently
The first constraint is latency, which is a physics problem. A one-inch part on a 65-foot-per-minute line has a roughly 77-millisecond inspection window. A typical APAC automotive parts line at 80 feet per minute cuts that to 62.5 milliseconds. Cloud inference adds the round-trip from the plant to the inference endpoint and back. The constraint is the speed of light and the routing distance, neither of which negotiates with bandwidth.
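The window arithmetic is easy to check by hand: at 65 feet per minute a conveyor moves 13 inches per second, so a one-inch part clears a fixed point in 1/13 of a second; at 80 feet per minute it moves 16 inches per second, giving 62.5 milliseconds. A minimal sketch, assuming the window is exactly one part length of travel and ignoring trigger and actuator delays; the function names are illustrative:

```python
# Inspection-window arithmetic: the time a part takes to travel its own
# length past a fixed point. Assumes window = one part length of travel.

INCHES_PER_FOOT = 12
MS_PER_MINUTE = 60_000


def inspection_window_ms(part_length_in: float, line_speed_fpm: float) -> float:
    """Milliseconds for a part of part_length_in inches to travel its own length."""
    inches_per_ms = line_speed_fpm * INCHES_PER_FOOT / MS_PER_MINUTE
    return part_length_in / inches_per_ms


def fits_window(latency_ms: float, window_ms: float) -> bool:
    """True if a result with this end-to-end latency lands inside the window."""
    return latency_ms <= window_ms


if __name__ == "__main__":
    for speed in (65, 80):
        w = inspection_window_ms(1.0, speed)
        print(f"{speed} ft/min -> {w:.1f} ms window")
        # The cloud round-trip range quoted in the text: 200 to 500 ms.
        for rtt in (200, 500):
            print(f"  cloud at {rtt} ms fits: {fits_window(rtt, w)}")
```

Neither cloud figure fits at either speed; the gap is a factor of roughly three to eight, not a tuning margin.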
The mechanism that converts latency into scrap is the InspectWindow timer in the PLC. The standard practitioner pattern wires a vision system into PLC control by treating every part as a reject until the inspection result confirms accept, within a fixed timer window. The PLC starts the timer at the rising edge of the camera trigger. If the result arrives within the window, the part is accepted or rejected based on the result. If the result arrives after the window, the part is rejected because the line has to make a decision before the part reaches the rejector. Cloud inference at 200 milliseconds is not "a little slower than edge." It is structurally outside the InspectWindow on every line where the window is set against typical conveyor speeds. The cloud system does not just miss defects. It rejects good parts because the result arrives too late to influence the gate.
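The default-reject rule can be modelled in a few lines. This is a Python model of the PLC logic, not ladder or structured text, and `gate_decision`, `InspectionResult`, and the millisecond timestamps are hypothetical names chosen for illustration; a real PLC implements the same rule in its own timer logic:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


@dataclass
class InspectionResult:
    passed: bool        # the vision system's accept/reject call
    arrival_ms: float   # arrival time relative to the trigger's rising edge


def gate_decision(result: Optional[InspectionResult], window_ms: float) -> Verdict:
    """Default-reject gate: a part is accepted only if an 'accept' result
    arrives inside the timer window started at the trigger's rising edge.
    A missing result (None) or a late one falls through to reject."""
    if result is None or result.arrival_ms > window_ms:
        return Verdict.REJECT
    return Verdict.ACCEPT if result.passed else Verdict.REJECT
```

With a 62.5 ms window, `gate_decision(InspectionResult(True, 12.0), 62.5)` accepts the part, while `gate_decision(InspectionResult(True, 200.0), 62.5)` rejects it even though the model called it good. That second case is the good-part scrap the paragraph describes.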
The second constraint is connectivity reliability. An APAC industrial zone in Vietnam, Indonesia, the Philippines, or rural Malaysia does not run on a 99.99 percent uptime network. A practitioner running a plant on a cloud-dependent MES described the failure mode plainly on a public industrial controls forum: transaction times went from one second to three minutes when the connectivity dropped, and the plant ground to a halt. The same mechanism applies to inference. A line that depends on a cloud round-trip turns the WAN into a single point of failure. The first internet outage stops production. The first packet-loss spike pushes inference latency outside the InspectWindow, and the line starts rejecting good parts. A ten-minute outage at one thousand parts per minute is ten thousand uninspected parts that either auto-reject as scrap or pass without inspection. That failure mode is worse than downtime, because it shows up as silent loss in the line's quality numbers rather than as an alert anyone is paying attention to.
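The outage arithmetic can be modelled as a timeline. A minimal sketch, assuming evenly spaced parts, a hard outage with no partial connectivity, and a default-reject gate, so every part triggered during the outage gets no result in time and is scrapped; `parts_without_result` is a hypothetical helper:

```python
def parts_without_result(parts_per_minute: int, run_minutes: int,
                         outage_start_min: float, outage_end_min: float,
                         inference_on_wan: bool) -> int:
    """Count parts triggered while the WAN is down. On a cloud path none of
    them get a result inside the InspectWindow, so a default-reject gate
    scraps them. Edge-resident inference does not use the WAN, so it is 0."""
    if not inference_on_wan:
        return 0
    missed = 0
    for i in range(parts_per_minute * run_minutes):
        t_min = i / parts_per_minute  # trigger time of part i, in minutes
        if outage_start_min <= t_min < outage_end_min:
            missed += 1
    return missed


if __name__ == "__main__":
    # The text's example: a ten-minute outage at one thousand parts per minute.
    print(parts_without_result(1000, 60, 20, 30, inference_on_wan=True))   # 10000
    print(parts_without_result(1000, 60, 20, 30, inference_on_wan=False))  # 0
```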
The third constraint is data sovereignty, which is a compliance problem the regulator has codified. Vietnam's Cybersecurity Law requires personal data and other regulated categories to be stored within the country and prohibits transfer to certain offshore endpoints without explicit approval. Indonesia's Personal Data Protection Law of 2022, fully enforceable since October 2024, restricts cross-border transfers of personal data and carries material penalties for non-compliance. Malaysia's PDPA and the broader APAC direction of travel are aligned. Production images from a manufacturing line are not always personal data, but they are usually customer data under the OEM contract, and they often contain visible workforce identification that does qualify. Cloud inference hosted outside the country is at minimum a contractual exposure and at maximum a legal violation. This is not a vendor preference: offshore cloud inference is ruled out by the regulator or the contract before the technical evaluation begins.
The deployment evidence
The Auto Parts customer (Client A) operates 8,000 product variants across six lines at 11,520 units per day per line on edge-resident inference. No cloud round-trip in the inspection path. The data sovereignty position is on-premise by default. The latency is sub-millisecond at the local PLC interface. The connectivity reliability of the inspection layer is independent of the WAN — the line continues running through internet outages without any change in inspection behaviour.
The Semiconductor Parts customer (Client B) is the cleanest example of the deployment-speed argument. The customer is a Korean plant of a Japanese semiconductor parts vendor. The hardware-locked vision incumbents had walked away: the irregular defect signatures and the small product geometry were beyond what their systems could resolve, and the alternative AI vendors in the bid had proposed an expensive 3D vision rebuild as the answer. HyperQ AI Vision delivered the inspection on a 2D vision setup at roughly a third of the proposed 3D capital cost, with on-site setup completed in two days. The architecture was edge-resident from day one because the IP-protection requirement common to semiconductor customer contracts excluded cloud inference before any technical evaluation.
These deployments did not choose edge over cloud as a performance preference. The architecture was the only one compatible with the line speed, the contract terms, and the operational reliability the customers required.
What the cloud-fits use cases actually are
The cloud-versus-edge framing risks dismissing cloud entirely, which is the wrong conclusion. Cloud has legitimate manufacturing use cases. Fleet-wide analytics across multiple plants. Cross-site benchmarking. Long-term model improvement. Centralised dashboards for executive oversight. Customer-facing reports and audit-trail archival. Each of these is a workload where the latency, reliability, and sovereignty constraints relax — the work is asynchronous, the failure mode of a delayed report is acceptable, and the data sovereignty position can be managed at the analytics level by aggregating before the data leaves the country.
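Aggregating before data leaves the country can be as simple as reducing per-part records to counts on the plant side. A minimal sketch, assuming a hypothetical per-part record that carries an image path and an operator badge; only line IDs, class labels, and counts come out. Whether counts at this granularity are acceptable to export is a contract and compliance question the sketch does not answer:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PartRecord:
    line_id: str
    defect_class: str     # e.g. "ok", "scratch", "void"
    image_path: str       # stays on-premise; never exported
    operator_badge: str   # may identify a worker; never exported


def aggregate_for_export(records: list[PartRecord]) -> dict[str, dict[str, int]]:
    """Reduce per-part records to per-line defect-class counts.
    Images and operator identifiers are dropped here, on the plant side,
    before anything is handed to an offshore analytics sync."""
    per_line: dict[str, Counter] = {}
    for r in records:
        per_line.setdefault(r.line_id, Counter())[r.defect_class] += 1
    return {line: dict(counts) for line, counts in per_line.items()}
```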
The honest answer is that the inspection-and-reject loop runs on edge and the analytics layer can run on cloud. The two are different workloads. Cloud sync of aggregated, anonymised, or already-localised data is a downstream activity that does not affect the line's inspection behaviour. Cloud inference on the production-decision path is the disqualified architecture.
This split is what we covered in detail in the foundational edge inference post for manufacturing AI. The APAC overlay is the data-sovereignty layer, which makes the split sharper than in jurisdictions with looser localisation requirements.
What the buyer-side filter looks like
The buyer-side discipline that produces a defensible architecture decision is the same one we covered in the buyer's guide for evaluating AI vision systems for manufacturing operations, with three additions specific to the cloud-versus-edge decision.
Ask the vendor where the inference runs, in physical terms. Country, data centre, network path. If the answer is vague, the architecture is cloud and the data-sovereignty exposure has not been thought through. If the answer is the customer's edge device, ask which device, what the sustained-state thermal envelope is, and what the inference latency looks like at the ninety-ninth percentile under sustained eight-hour continuous load.
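Against the InspectWindow, the number that matters is the tail of the latency distribution, not the mean. A minimal sketch of the check, assuming one logged end-to-end latency per inspection across the eight-hour run; nearest-rank is one common percentile convention among several, and the helper names are hypothetical:

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def window_miss_rate(samples: list[float], window_ms: float) -> float:
    """Fraction of inspections whose latency exceeds the window."""
    if not samples:
        raise ValueError("no samples")
    return sum(1 for s in samples if s > window_ms) / len(samples)
```

If the p99 sits above the window, at least one inspection in a hundred falls to the default-reject gate; at a thousand parts per minute that is ten or more parts a minute rejected regardless of whether they were good.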
Ask the vendor what happens when the connectivity drops. The right answer is "nothing, the inspection layer continues running on edge." The wrong answer is "the system has retry logic that handles transient outages." Retry logic is the sign of a cloud-first architecture with offline mode bolted on, which fails the moment the outage runs longer than the retry window. We covered this distinction in the foundational edge post.
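The architectural difference is where the network sits relative to the decision. A minimal sketch of the edge-first shape, assuming a hypothetical `local_model` callable that runs on the line's own hardware and a hypothetical `upload` function for the analytics sync; neither is a specific vendor's API:

```python
from collections import deque
from typing import Callable, Deque


class EdgeInspectionNode:
    """The accept/reject call uses only the local model. Analytics records
    are buffered for later upload and never block the decision."""

    def __init__(self, local_model: Callable[[bytes], bool], buffer_size: int = 10_000):
        self.local_model = local_model
        # Bounded buffer: when full, the oldest analytics records are dropped,
        # so a long outage cannot exhaust memory or stall inspection.
        self.outbox: Deque[dict] = deque(maxlen=buffer_size)

    def inspect(self, image: bytes) -> bool:
        passed = self.local_model(image)  # no network call on this path
        self.outbox.append({"passed": passed})
        return passed

    def flush(self, wan_up: bool, upload: Callable[[dict], None]) -> int:
        """Drain the outbox when the WAN is up; do nothing while it is down."""
        sent = 0
        while wan_up and self.outbox:
            upload(self.outbox.popleft())
            sent += 1
        return sent
```

Dropping the oldest records on overflow is one design choice; a site that needs every record can spill to local disk instead. Either way, the outage length changes what reaches the analytics layer, never what happens at the rejector.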
Ask the vendor what audit trail the customer holds, where it is stored, and who has access to it. The audit trail is the customer's compliance position. If it lives in the vendor's cloud and the customer reads it through a vendor dashboard, the customer's data sovereignty is the vendor's problem, which means it is not solved.
The cost of an architecture that fails one of the three constraints is hidden during procurement and visible during operation, which is exactly the cost-line pattern we covered in the post on the maintenance and retraining costs that determine whether AI vision holds its accuracy in production. The hidden cost on the cloud-architecture side is the ten thousand parts the line rejects during the next outage, the regulatory exposure on the next data-sovereignty audit, and the inspector who walks in during a thirty-minute connectivity gap and finds the system not running.
What you can verify before any commitment
Send a representative sample set: a few hundred labelled images per defect class, captured under the actual lighting and camera conditions the line will run. State the line speed, the InspectWindow timer setting, the connectivity profile of the site, and the data-sovereignty position the customer contracts require. Within two weeks, we run the inference on the edge hardware that would deploy at your line speed and return four artefacts. Inference latency at the ninety-ninth percentile under sustained eight-hour load. Confusion matrix per defect class on your data. A connectivity-drop simulation showing how the inspection behaviour changes when the WAN goes offline. A written assessment of the data-sovereignty position the deployment achieves against the specific jurisdictions the customer operates in.
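The per-class confusion matrix is straightforward to reproduce on your side from the returned predictions, which is worth doing before accepting any accuracy figure. A minimal sketch, assuming one true label and one predicted label per image, with "ok" as a hypothetical name for the no-defect class:

```python
from collections import defaultdict


def confusion_matrix(y_true: list[str], y_pred: list[str]) -> dict[str, dict[str, int]]:
    """matrix[true_class][predicted_class] -> count."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists differ in length")
    matrix = defaultdict(lambda: defaultdict(int))
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return {t: dict(row) for t, row in matrix.items()}


def per_class_recall(matrix: dict[str, dict[str, int]]) -> dict[str, float]:
    """Share of each true class that was predicted as that class."""
    return {cls: row.get(cls, 0) / sum(row.values()) for cls, row in matrix.items()}
```

Reading the rows per defect class matters because a single overall accuracy number can hide a rare class, such as a void that is always called "ok", behind a large volume of correctly passed good parts.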
Deployment timeline is four to eight weeks from contract signing to live operation, with two days on-site for installation and PLC integration. Hardware footprint runs 30 to 50 percent lower than hardware-locked vision ecosystems, and the inference architecture is air-gappable by default — cloud sync is optional at the analytics layer, never required in the inspection-and-reject loop.
Edge AI or cloud AI is not really the question once the three constraints are made explicit. The question is which decisions in your operations require real-time certainty, and which can wait for batch analysis. Inspection-and-reject requires real-time certainty in APAC manufacturing for reasons of physics, network reliability, and law. Once that boundary is drawn, the architecture follows.
