Machine Vision Maturity
AI vision finds its footing on the factory floor
Three firms at the forefront of automotive AI vision inspection reveal why most pilots fail, why optics and lighting still govern outcomes, and how quality systems may soon prevent defects rather than merely catch them.
Few promises in modern manufacturing have been made as loudly, or kept as inconsistently, as that of AI-powered visual quality inspection. The technology has been deployed, piloted, praised, and quietly abandoned across automotive plants worldwide. The gap between the demonstration room and the production floor remains, for most, stubbornly wide.
Three companies with serious credentials in the field offer a frank assessment of where AI vision genuinely works, where it does not, and what it will take for the technology to fulfil its potential. They are 36ZERO Vision, a Munich-based spin-out from the BMW Group; phil-vision, a machine vision specialist whose leadership contributed to the GenICam standard; and aku.automation, a systems integrator with more than four decades of experience in automotive inspection. Their perspectives span the full stack, from optics and lighting through to causal AI and agentic quality systems, and together they constitute an unusually complete picture of what industrial AI vision deployment actually requires.
The problem that started it all
36ZERO Vision did not begin in a research laboratory. It began inside an automotive production environment, at a BMW Group hackathon, working on a challenge that would define the company's entire subsequent direction. The problem was surface-level defect detection on components, where existing rule-based systems were either missing defects or generating excessive false positives, "to the point where operators had stopped trusting them," says Zeeshan Karamat, co-founder and CTO of 36ZERO Vision.
What distinguished the early approach was not greater algorithmic sophistication but a deliberate decision to use consumer-grade mobile devices for image capture directly on the production floor. This was not, Karamat is quick to clarify, a statement about image quality being unimportant.
It was a recognition that, with the right AI architecture, "industrial-grade accuracy could be achieved without dependence on high-end, specialised camera hardware, provided the model is designed to handle the variability inherent in real production environments."
That recognition led to two to three years of foundational research and the collection of more than 22 million images from mobile devices under unconstrained conditions - changing lighting, vibrations, and inconsistent positioning - from which the company developed its multi-stage foundation model, designed from the outset to be robust to environmental variability and fully hardware-agnostic.
Even a small percentage of false positives at production volume is sufficient to erode operator trust, and once that trust is lost, it is extremely difficult to recover
Training on a handful of images
The claim that a production-ready inspection model can be trained on as few as five to twenty annotated images is, understandably, met with scepticism by an industry accustomed to requiring thousands - if not hundreds of thousands - of labelled samples. The architecture behind it, however, follows a coherent logic.
Because the foundation model has already acquired robust visual representations through pre-training on 22 million highly variable images, the small number of samples serves to specialise it for a specific inspection task rather than to teach it visual understanding from scratch. Generative AI is used to simulate additional defective examples and synthetic variations across lighting conditions, orientations, and surface characteristics. "The result", says Karamat, is that "a productive model can be prepared within a matter of hours."
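36ZERO Vision has not published its training pipeline, but the few-shot pattern Karamat describes - a pre-trained backbone specialised on a handful of labelled samples that are expanded with synthetic variations - can be sketched roughly as follows. All names here are illustrative, not the company's actual API, and the "augmentation" is a crude stand-in for the generative simulation described above:

```python
import random

def augment(image, n_variants, seed=0):
    """Expand one labelled sample into many synthetic variants by
    jittering brightness and flipping orientation - a toy stand-in
    for the generative simulation of lighting and pose variation."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        gain = rng.uniform(0.7, 1.3)               # simulated lighting change
        img = [[min(255, int(px * gain)) for px in row] for row in image]
        if rng.random() < 0.5:                     # simulated orientation change
            img = [row[::-1] for row in img]       # horizontal flip
        variants.append(img)
    return variants

def build_training_set(samples, variants_per_sample=50):
    """From a handful of annotated images, produce a set large enough
    to specialise an already pre-trained model for one inspection task."""
    dataset = []
    for image, label in samples:
        dataset.append((image, label))
        dataset.extend((v, label) for v in augment(image, variants_per_sample))
    return dataset

# Five annotated samples become a few hundred training examples.
seed_samples = [([[10 * i + j for j in range(4)] for i in range(4)], "scratch")
                for _ in range(5)]
dataset = build_training_set(seed_samples)
print(len(dataset))  # 5 originals + 5 * 50 variants = 255
```

The point of the sketch is the ratio, not the method: the heavy lifting of visual understanding is already done in pre-training, so the annotated samples only steer the model toward one task.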
The limits of this efficiency are real and honestly acknowledged. For highly subtle defects, or cases where the boundary between acceptable and unacceptable is ambiguous even to human experts, additional data and iteration are required. "The careful and transparent definition of quality criteria remains a human responsibility," says Karamat, "and one that directly determines the effectiveness of any AI-based system."
The same point is made, with equal force, by Boris Gierszewski, CEO of aku.automation, drawing on his own firm's deployment experience in the field.
The false positive trap
No failure mode has done more damage to AI vision's credibility in manufacturing than the false positive, or what 36ZERO Vision calls "the pseudo-defect". The dynamic is well understood. A system generates a high volume of spurious alerts; operators begin to override or ignore them; and the business case collapses, not through technical failure but through the erosion of trust.
"The pseudo-defect problem is, in our experience, the single most common reason AI vision deployments fail to gain acceptance on the production floor," says Karamat. "Even a small percentage of false positives at production volume is sufficient to erode operator trust, and once that trust is lost, it is extremely difficult to recover."
The root cause, in Karamat's analysis, is architectural. Conventional systems that rely on unsupervised anomaly detection learn what normal looks like and flag deviations. In theory, this is reasonable. In a real production environment, however, normality is not static. Lighting conditions shift. Surface reflections vary. Material batches change. These natural variations are misinterpreted as potential defects, producing a sustained volume of pseudo-defects that operators learn, rationally, to disregard.
36ZERO Vision's approach uses a self-supervised, multi-stage architecture that learns to recognise specific error patterns rather than deviations from an idealised baseline. It is complemented by a human-in-the-loop mechanism, the Worker UI, through which operators confirm or relabel findings, creating a continuous improvement cycle.
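The feedback half of that design is simple enough to sketch. The class below is illustrative only, not 36ZERO Vision's implementation: the model flags findings, an operator confirms or relabels each one, and the disagreements - precisely the samples the model got wrong - are queued for retraining:

```python
from collections import Counter

class FeedbackLoop:
    """Minimal sketch of a human-in-the-loop review cycle.
    Names are hypothetical, not the Worker UI's actual API."""

    def __init__(self):
        self.retrain_queue = []
        self.stats = Counter()

    def review(self, finding, operator_label):
        if operator_label == finding["predicted"]:
            self.stats["confirmed"] += 1
        else:
            self.stats["relabelled"] += 1
            # Disagreements are exactly the samples worth learning from.
            self.retrain_queue.append({**finding, "label": operator_label})

loop = FeedbackLoop()
loop.review({"image_id": 1, "predicted": "scratch"}, "scratch")
loop.review({"image_id": 2, "predicted": "dent"}, "ok")       # a pseudo-defect
loop.review({"image_id": 3, "predicted": "scratch"}, "scratch")
print(loop.stats["confirmed"], len(loop.retrain_queue))  # 2 1
```

Every overridden pseudo-defect becomes a correction rather than a silent loss of trust, which is what turns operator scepticism into model improvement.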
It is a design choice with its roots in the same production-floor realities that Gregor Philipiak, CEO and Chief Pixel Officer of phil-vision, identifies at the hardware-software interface: that environmental variability, properly managed from the physical layer upwards, is "the core challenge" against which any inspection system must be judged.
Successful deployments require a holistic system approach, where image acquisition, hardware, and AI are designed together rather than treated as separate layers
Where the laws of physics outrank the algorithm
The software industry's habit of treating image acquisition as a solved problem is, in Philipiak's view, a persistent and consequential mistake. "Much of the current discussion around AI vision focuses on software, models, and cloud-based training," he says. "What is often overlooked is that AI fundamentally depends on high-quality image data, and generating that data is not an AI problem."
The reasoning is direct. "It is primarily a matter of physics and classical engineering. Optics, lighting, sensor selection, synchronisation with PLCs and robot controllers, and overall system timing all determine the quality and consistency of the images." The consequences of ignoring this layer are unforgiving. Poor lighting, motion blur, reflections, or insufficient resolution degrade performance regardless of how capable the model is.
"Many AI software vendors implicitly assume that suitable images are already available," says Philipiak. "In reality, this is often the hardest part." The implication for deployment is drawn plainly: "Successful deployments require a holistic system approach, where image acquisition, hardware, and AI are designed together rather than treated as separate layers." It is a point that would find agreement across most serious practitioners in the field, even if it remains underweighted in the public conversation about AI vision.
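One practical consequence of Philipiak's argument is that frames should be quality-gated before they ever reach a model. The sketch below is a deliberately crude illustration of the idea, using a mean-squared-gradient sharpness measure on a toy image; a production system would use a proper focus metric (such as variance of Laplacian) on real frames:

```python
def sharpness(image):
    """Mean squared horizontal gradient - a crude focus measure.
    Blurred frames have weak gradients; sharp edges have strong ones."""
    total, count = 0, 0
    for row in image:
        for a, b in zip(row, row[1:]):
            total += (b - a) ** 2
            count += 1
    return total / count

def acquisition_gate(image, min_sharpness=25.0):
    """Reject frames too blurred to inspect, before inference runs.
    The threshold is illustrative and would be tuned per station."""
    return sharpness(image) >= min_sharpness

sharp = [[0, 50, 0, 50, 0, 50]] * 4     # high-contrast edges
blurred = [[20, 22, 24, 26, 28, 30]] * 4  # gentle gradient, as from motion blur
print(acquisition_gate(sharp), acquisition_gate(blurred))  # True False
```

However sophisticated the downstream model, a gate like this encodes Philipiak's point: image quality is a physical-layer property that no amount of training can retroactively supply.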
The standard that holds it all together
Phil-vision's contribution to the GenICam standard - the protocol that enables cameras from different manufacturers to communicate with vision software in a consistent, predictable way - places the company at the foundations of modern machine vision interoperability. Its importance to AI-powered automotive inspection is difficult to overstate, and Philipiak is well placed to explain it.
Without such a standard, the machine vision ecosystem would be far more fragmented. Each camera would arrive with proprietary interfaces, integration would be slower and more complex, and scaling solutions across multiple systems or sites would be significantly harder. "For AI-powered automotive inspection, this would mean a lot of effort being spent on integration rather than innovation," says Philipiak. "Instead of focusing on improving algorithms and inspection quality, companies would have to deal with compatibility issues and custom interfaces. In practice, this would slow down adoption and limit the scalability of AI in industrial environments."
In many industrial applications it is critical to understand why a system made a specific decision. AI models can behave like black boxes, while rule-based systems provide clear cause-and-effect relationships
GenICam also addresses a concern central to automotive procurement strategy. Standardised interfaces provide supply chain security: users are not locked into a single vendor and can switch or combine components from different manufacturers without redesigning the entire system. "GenICam strikes an effective balance between interoperability and differentiation," Philipiak explains.
"It defines a common framework that ensures compatibility, while still leaving enough room for manufacturers to innovate and differentiate through performance, features, or specialisation." In an industry defined by long product lifecycles and rigorous cost management, that flexibility has material value, and it is the kind of value that is easiest to appreciate only after it has been lost.
When AI earns its place, and when it does not
Rupert Stelz, COO of phil-vision, brings an operational deployment perspective to the question of where AI genuinely outperforms classical rule-based vision, and where the older approach remains superior. The advantages of AI are clearest in complex or irregular surface structures, high variability in parts or materials, defect patterns that are difficult to define explicitly, and tasks based on visual similarity rather than precise geometry, "including some that have not been feasible before," says Stelz.
Classical rule-based vision retains its advantages where inspection criteria are clear and stable, precise measurements are required, and logic can be described deterministically. "In such cases, classical systems are often faster to implement, easier to validate, and more transparent," Stelz notes.
The question of explainability carries practical weight. "In many industrial applications it is critical to understand why a system made a specific decision. AI models can behave like black boxes, while rule-based systems provide clear cause-and-effect relationships."
The practical response, in most sophisticated deployments, is a hybrid approach that draws on the strengths of both. "In practice, these facts often lead us to hybrid approaches, combining classical machine vision and AI to leverage the strengths of both," says Stelz. Gierszewski holds precisely the same position.
"Our particular strength lies in combining rule-based algorithms with artificial intelligence," he says. "Although this hybrid approach is difficult for many to fully grasp, it delivers the most robust and reliable process results, as proven by our long-standing experience in automotive series production."
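The hybrid architecture both firms describe can be sketched in a few lines. The structure below is an illustration, not either company's product: a deterministic, validated tolerance check runs first (classical vision's strength), and only parts that pass it are judged by an AI surface-defect score. The tolerance values and threshold are invented for the example:

```python
def rule_check(part):
    """Deterministic gauge: hard, auditable tolerances -
    the kind of check classical vision validates easily."""
    lo, hi = 9.8, 10.2   # illustrative: nominal 10.0 mm, ±0.2 mm tolerance
    return lo <= part["diameter_mm"] <= hi

def hybrid_inspect(part, ai_score, ai_threshold=0.5):
    """A rule failure is always a reject, with a clear reason;
    the AI score then covers defects rules cannot express."""
    if not rule_check(part):
        return "reject: out of tolerance"
    if ai_score >= ai_threshold:
        return "reject: surface defect"
    return "pass"

print(hybrid_inspect({"diameter_mm": 10.5}, ai_score=0.1))  # reject: out of tolerance
print(hybrid_inspect({"diameter_mm": 10.0}, ai_score=0.9))  # reject: surface defect
print(hybrid_inspect({"diameter_mm": 10.0}, ai_score=0.1))  # pass
```

The ordering is the point: every reject from the rule layer is fully explainable, and the black-box component is confined to the judgements that genuinely need it.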
The brownfield challenge
The technical specification for an AI vision system, however comprehensive, rarely captures the physical reality of integrating into a production line that has been running for decades. Automakers of every stripe are well versed in this structural obstacle. "The key challenges in brownfield installations lie in the physical boundary conditions," says Gierszewski. "Limited access to the inspection object, existing mechanical components, and restricted installation space directly affect optics, lighting, and image acquisition."
Aku.automation has operated as both application developer and system integrator since 2007, drawing on a wide supplier network that enables custom solutions in situations where standard components will not serve. For 36ZERO Vision, the hidden integration challenges that customers most consistently underestimate centre on data infrastructure - specifically "reliably transporting images from camera to processing unit, synchronised with the production cycle," says Karamat - and the organisational change management that brownfield deployments require far beyond the technical integration itself.
Phil Vision brings deep expertise in the physical layer ... addressing what they rightly identify as the gap many software-focused AI vendors underestimate
36ZERO Vision integrates with existing camera infrastructure through industry-standard protocols, including GenICam, while its trained models run on an edge client operating on-premise without cloud dependency during production. For more complex integrations, the company's partner network - including phil-vision for optics, lighting design, sensor selection, and PLC synchronisation, and aku.automation for broader systems integration - addresses the physical layer that software-focused AI vendors most often underestimate. "Phil Vision brings deep expertise in the physical layer," Karamat acknowledges, "addressing what they rightly identify as the gap many software-focused AI vendors underestimate."
Humanoid robots and the expanding inspection boundary
Camera-guided robotics and inspection technology are converging across automotive production lines. The same robot that handles a part is increasingly capable of inspecting it. Gierszewski sees this convergence as still early in its trajectory, with the most significant developments ahead rather than behind. "We are convinced that this field will evolve substantially and undergo fundamental change in the coming years," he says.
The development that aku.automation believes will drive the next step is the introduction of humanoid robots into industrial manufacturing. "With the future introduction of humanoid robots into industrial manufacturing processes, machine vision will become a key enabling technology," Gierszewski explains. "Camera-based systems will not only be relevant for traditional inline inspection tasks, but increasingly also for inspections outside fixed quality stations, especially in areas where quality is still assessed visually by humans today."
The implication is a gradual but structural expansion of inspection coverage beyond established quality gates, into areas of the production process that have historically been outside the reach of automated systems. "Camera-controlled robotics and inspection technology are thus converging functionally and expanding the possibilities of quality assurance beyond previous line and process boundaries," says Gierszewski.
The human factor
As AMS has repeatedly reported, automation does not eliminate human judgement. It repositions it. The transition from a quality engineer who makes the calls to one who reviews AI-flagged results requires careful management of both technical foundations and organisational culture. Gierszewski's experience has produced a clear view of the prerequisite for making this transition succeed.
"The key factor for the successful implementation of AI-based inspection systems is the careful definition of quality parameters already in the early project phase," he says. "Only when thresholds and the criteria for good and bad parts are clearly, transparently, and systematically defined by humans will the system's subsequent decisions be transparent and plausible to all stakeholders."
The consequence of getting this right extends well beyond the launch phase. "If the foundations are established diligently, future conflicts of interest between humans and the system can be avoided," says Gierszewski. When the foundations are not established, when quality parameters are loosely defined or inadequately communicated across shifts and roles, disagreements between operators and the system become not merely likely but structurally unresolvable.
36ZERO Vision addresses the same challenge through its Worker UI, keeping operators meaningfully involved in a continuous feedback loop that improves the model over time. The objective, as Karamat frames it, is not to remove human expertise from the quality process but to keep it engaged in a way that adds to, rather than duplicates, what the AI is already doing.
The 77% that never make it
Around 77% of AI vision implementations in manufacturing never advance beyond the pilot phase. The number is widely cited, rarely examined, and worth examining carefully. Karamat's diagnosis is direct. "The high failure rate of AI vision pilots is, in our assessment, not primarily a technology problem. It is a complexity and ownership problem."
Pilots typically succeed because they operate under controlled conditions with dedicated specialist attention. The step to production, across shifts, plants, varying operators, and legacy infrastructure, introduces a different order of challenge. The relevant questions shift from whether the model works to "who retrains it when a new variant is introduced," "how does it integrate with the existing PLC environment," and "who is accountable when the system and an operator disagree," says Karamat.
Technically, 36ZERO Vision addresses the most significant barrier - integration complexity - through a two-step architecture in which model training and validation take place in the cloud while prediction runs on the edge. This removes the requirement for dedicated hardware investment in model development.
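The cloud-train, edge-predict split can be illustrated with a minimal sketch. Nothing here is 36ZERO Vision's implementation; the names are hypothetical, and a JSON file stands in for a validated model artefact published by the cloud side. What matters is that prediction has no network dependency at runtime:

```python
import json, os, tempfile

class EdgeClient:
    """Sketch of a two-step architecture: train and validate elsewhere,
    then predict locally on-premise with no cloud dependency."""

    def __init__(self, model_path):
        self.model_path = model_path
        self.model = None

    def sync(self):
        """Pull the latest validated model artefact when connectivity allows."""
        with open(self.model_path) as f:
            self.model = json.load(f)

    def predict(self, defect_score):
        """Runs entirely on the edge; works even if the network is down."""
        return "defect" if defect_score > self.model["threshold"] else "ok"

# The 'cloud' side publishes a validated model artefact as a file.
path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(path, "w") as f:
    json.dump({"version": 3, "threshold": 0.7}, f)

edge = EdgeClient(path)
edge.sync()                                   # one-time fetch, not per-part
print(edge.predict(0.9), edge.predict(0.2))   # defect ok
```

Separating the sync step from the predict path is the design choice that removes both the dedicated training hardware on site and the cloud round-trip in the production cycle.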
Organisationally, two factors are consistently underestimated. First, a clear ownership structure for the AI system within the plant, with someone accountable for its performance, maintenance, and evolution. Second, realistic expectations about what production readiness entails. "It does not mean flawless performance from day one," says Karamat. "It means the system improves over time and the organisation has established processes to support that improvement."
[Long-term success] depends on building systems that can be adapted in a controlled and sustainable way over a multi-year lifecycle if necessary
Stelz makes a related point about the long-term maintenance burden. Automotive production lines operate for many years, but the conditions within them change continuously. Camera calibration drifts. Lighting degrades. New materials and variants are introduced. "It is essential to treat maintenance not as an exception, but as a core part of the system lifecycle," he says, noting that this ongoing burden is "often underestimated in the total cost of ownership, both for AI and traditional machine vision systems." Long-term success, in Stelz's assessment, "depends on building systems that can be adapted in a controlled and sustainable way over a multi-year lifecycle if necessary."
The EV imperative
Battery manufacturing has altered the stakes of a missed defect in a way that no other development in recent automotive history has matched. A quality failure in a conventional powertrain component typically generates a warranty claim. In a battery cell or module, it can escalate to a safety event. For AI vision providers, this changes the consequences of error in a categorical rather than merely a quantitative way.
36ZERO Vision's platform supports a wide range of imaging technologies, including line scan, 3D, thermal, and standard area scan cameras - flexibility that is particularly relevant in EV production, where manufacturing lines are comparatively new. "Defect patterns are still being characterised, quality criteria are evolving, and the ability to train on minimal data and iterate rapidly means the inspection system can evolve alongside the production process," says Karamat.
The convergence of stricter safety requirements, emerging manufacturing processes, and the need for flexible and scalable inspection is, in Karamat's assessment, driving demand across the EV value chain and across adjacent industries facing similar structural challenges.
Beyond detection
The most consequential question in AI vision is not whether systems can detect defects. Competently built systems can. The question is whether they can understand why defects occur, trace them to upstream process variables, and initiate corrections before the next component leaves the station.
The term "agentic," which features in Karamat's own description of what 36ZERO Vision is building, points directly at this ambition. "Today, most AI vision in manufacturing is reactive," he says. "A defect is identified, flagged, and a human investigates. The defect has already occurred, and the underlying cause remains unaddressed until manual analysis is completed."
Imagine a quality system that detects a rising defect rate on a Monday morning shift, traces it to a temperature drift in an upstream welding process, and recommends a parameter adjustment before the quality engineer has finished their first coffee. That is the system we are building.
The system under development connects inspection results to upstream process data - including sensor readings, machine parameters, and worker input - and reasons about causation rather than merely appearance.
Karamat describes the objective with unusual clarity. "Imagine a quality system that detects a rising defect rate on a Monday morning shift, traces it to a temperature drift in an upstream welding process, and recommends a parameter adjustment before the quality engineer has finished their first coffee. That is the system we are building." Initial benchmark results are, by the company's account, strong compared to both frontier models and causal reasoning approaches. The system is currently in pilot stage with customers. Full production deployment is planned within the coming year.
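The first step of the scenario Karamat describes - noticing that a rising defect rate tracks an upstream process signal - can be sketched with plain correlation. This is a toy stand-in only: the causal reasoning the company describes would go well beyond ranking Pearson coefficients, and all names and data here are invented for illustration:

```python
def correlate(xs, ys):
    """Pearson correlation, pure stdlib; 0.0 for constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def trace_defect_cause(defect_rate, process_signals, threshold=0.8):
    """Rank upstream process variables by how strongly they track
    the defect rate, and return the plausible suspects."""
    suspects = {name: correlate(defect_rate, series)
                for name, series in process_signals.items()}
    return [name for name, r in
            sorted(suspects.items(), key=lambda kv: -abs(kv[1]))
            if abs(r) >= threshold]

# Monday-morning shift: defects rise alongside a weld temperature drift.
defects = [0.01, 0.01, 0.02, 0.04, 0.06, 0.08]
signals = {
    "weld_temp_drift": [0.1, 0.1, 0.3, 0.6, 0.9, 1.2],
    "line_speed":      [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}
print(trace_defect_cause(defects, signals))  # ['weld_temp_drift']
```

Even this naive version shows why connecting inspection results to sensor readings and machine parameters matters: without the upstream signals, the drift would surface only as an unexplained climb in rejects.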
Much of the current discussion around AI vision focuses on software, models, and cloud-based training. What is often overlooked is that AI fundamentally depends on high-quality image data, and generating that data is not an AI problem. It is primarily a matter of physics and classical engineering. Optics, lighting, sensor selection, synchronisation with PLCs and robot controllers, and overall system timing all determine the quality and consistency of the images
"We believe this will fundamentally change how manufacturing thinks about quality," says Karamat, "from catching defects to preventing them. The objective is not to replace human expertise, but to give every quality engineer the analytical depth that today only the best-instrumented factories in the world can offer."
That ambition is not, on the evidence presented here, merely promotional. It reflects a coherent trajectory from a company that began with a production-floor problem, stayed close enough to real manufacturing to understand what deployment actually requires, and built its technology around the constraints that defeat most of its competitors before the first shift is done.