Executives may be eager to scale AI across manufacturing, but without a robust data infrastructure, pilot projects will fail to take flight. Understanding how to treat data as infrastructure, rather than an afterthought, is now critical to achieving fundamental transformation.
The AI hype curve is merciless. One moment, an industrial proof-of-concept attracts board-level excitement; the next, it is abandoned due to a lack of tangible returns. This repeated failure to scale often has little to do with the sophistication of the AI models themselves. Instead, the problem lies somewhere deeper and messier: in the fragmented, poorly governed, and context-starved data environments typical of legacy manufacturing ecosystems.
For Mark Van de Wiel, Field CTO at Fivetran, the answer begins with a radical reframing: treat data as infrastructure. Not a by-product of systems, a source of insight, or a compliance headache, but something core to everything from production schedules to predictive maintenance. Much like roads or electricity, data must be maintained, standardised, and monitored for quality.
“There has been too much emphasis on AI models and too little on the raw materials that fuel them,” Van de Wiel says. “If you want to build a machine learning project, if you want to be successful at an AI implementation, data is part of that infrastructure. Without it, it is not going to succeed.”
From fragmented data to full context
It is a truth many have learned the hard way. Organisations trial ambitious AI initiatives only to find they cannot demonstrate return on investment quickly enough. With rising costs and shifting priorities, these projects are often quietly shelved. Others fall at an earlier hurdle, relying on incomplete, outdated, or incorrect data sets.
In manufacturing, these issues are magnified by scale and complexity. Between industrial sensors, ERP systems, production lines, and supply chains, a single manufacturer might generate dozens of distinct data streams. However, stitching these together into a coherent picture, especially one that supports predictive or generative models, remains a technical and cultural challenge.
“The ability to contextualise is vital,” Van de Wiel says. “You need that holistic view to build meaningful data products. And that means consolidated data sets, consistent definitions, and clarity around ownership. You cannot build anything sustainable if your teams are pulling different numbers from different systems.”
The need for a shared vocabulary is often overlooked, but it is critical. Something as seemingly simple as defining what constitutes a customer can create confusion if different departments, systems, or applications treat the term differently. Consistency in definitions is not just an operational concern; it is fundamental to delivering reliable AI outputs across an enterprise.
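To see why this matters in practice, consider a minimal sketch; the system names, fields, and counting rules below are hypothetical, but the failure mode (two teams reporting different "customer" counts for the same business) is the one described here:

```python
# A minimal sketch of why shared definitions matter. The system names,
# fields, and counting rules below are hypothetical.

crm_records = [
    {"id": 1, "name": "Acme", "status": "prospect"},
    {"id": 2, "name": "Borg Industries", "status": "active"},
    {"id": 3, "name": "Cyberdyne", "status": "churned"},
]
billing_records = [
    {"account_id": 2, "name": "Borg Industries", "invoiced": True},
]

# CRM's implicit definition: every record is a "customer".
crm_count = len(crm_records)            # 3
# Billing's implicit definition: only invoiced accounts count.
billing_count = len(billing_records)    # 1

# One governed, shared definition ends the disagreement, e.g.
# "a customer is an active account invoiced at least once".
invoiced = {r["account_id"] for r in billing_records}
customers = [r for r in crm_records
             if r["status"] == "active" and r["id"] in invoiced]
print(len(customers))  # 1 -- the single figure every team reports
```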
The goal is not simply to store more data, but to create the conditions for alignment. That includes technical tooling, of course, but it also demands organisational maturity. Governance, quality assurance, lineage, access controls, encryption, and bias mitigation must all be built into the data stack from day one.
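What "built into the data stack from day one" might look like at the pipeline level is sketched below. The schema, thresholds, and helper names are hypothetical; the pattern is simply to validate on ingest, quarantine failures, and stamp every batch with lineage metadata rather than retrofitting those controls later:

```python
# A minimal sketch of a quality gate enforced at ingest. The schema,
# bounds, and function names are hypothetical illustrations.

from datetime import datetime, timezone

REQUIRED_FIELDS = {"machine_id", "timestamp", "temperature_c"}
MAX_PLAUSIBLE_TEMP_C = 200.0  # domain-specific sanity bound

def is_valid(record: dict) -> bool:
    """Reject records with missing fields or implausible readings."""
    if not REQUIRED_FIELDS.issubset(record):
        return False
    return 0.0 <= record["temperature_c"] <= MAX_PLAUSIBLE_TEMP_C

def load_batch(records: list[dict], source: str) -> dict:
    """Split a batch into clean and quarantined rows, with lineage."""
    clean = [r for r in records if is_valid(r)]
    return {
        "rows": clean,
        "quarantine": [r for r in records if not is_valid(r)],
        "lineage": {
            "source": source,
            "loaded_at": datetime.now(timezone.utc).isoformat(),
            "received": len(records),
            "rejected": len(records) - len(clean),
        },
    }

batch = load_batch(
    [{"machine_id": "M1", "timestamp": "2024-05-01T08:00Z", "temperature_c": 61.5},
     {"machine_id": "M2", "timestamp": "2024-05-01T08:00Z", "temperature_c": 999.0}],
    source="plant-7/press-line",
)
print(batch["lineage"]["rejected"])  # 1 -- bad reading never reaches analytics
```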
Agentic AI needs context, not just access
One of the more intriguing recent developments is the emergence of agentic AI. Touted by cloud giants and increasingly trialled in enterprise environments, agentic systems promise to act autonomously, navigating digital ecosystems, querying systems, and generating actions without human intervention.
But Van de Wiel remains cautious about the manufacturing implications. “Agentic AI is exciting, but it will struggle without context,” he continues. “It is the next iteration of what used to be called data federation or data virtualisation. Without a consolidated, well-governed foundation, it risks falling into the same traps: lack of reliability, inconsistent answers, and brittle performance at scale.”
While agentic AI may thrive in structured domains such as finance or HR, the physical and distributed nature of industrial data creates a distinct set of demands. Knowing when to produce a part, how to optimise uptime, or how to forecast availability all require deeper integration with supply chains, operations, and asset performance data. Autonomous agents cannot function without this structured substrate beneath them.
The modern industrial data stack
So what should manufacturers prioritise when building this substrate? According to Van de Wiel, the backbone starts with reference data from ERP systems, such as customer hierarchies and supply relationships. But it must quickly extend to include IoT sensor streams and real-time production telemetry. From there, consolidation becomes essential.
“Streaming platforms are becoming increasingly important,” Van de Wiel says. “They allow you to pipe in data from all kinds of sources, whether from machines, control systems, or even external vendors, and bring it into your analytical and AI environments in near real time. But what matters most is that these streams feed into a shared, open, and well-understood storage format.”
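As a rough illustration of that pattern, the sketch below uses the open-source kafka-python and pyarrow libraries; the topic name, broker address, and lake path are hypothetical, and micro-batching into Parquet stands in for whichever open storage format a given stack standardises on:

```python
# A minimal sketch: consume machine telemetry from a stream and land it
# in an open columnar format. Topic, broker, and paths are hypothetical.

import json
import os

import pyarrow as pa
import pyarrow.parquet as pq
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "machine-telemetry",                # hypothetical topic
    bootstrap_servers=["broker:9092"],  # hypothetical broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

os.makedirs("lake/telemetry", exist_ok=True)
buffer, BATCH_SIZE = [], 1000

for message in consumer:
    buffer.append(message.value)
    if len(buffer) >= BATCH_SIZE:
        # Micro-batch into Parquet: an open format every downstream
        # engine (BI, ML training, ad hoc SQL) can read directly.
        pq.write_table(pa.Table.from_pylist(buffer),
                       f"lake/telemetry/batch-{message.offset}.parquet")
        buffer.clear()
```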
Data lakes have evolved significantly to meet this need. Once a vague term for centralised data dumps, they are now underpinned by robust technologies such as Apache Iceberg and Delta Lake, enabling structured, scalable, and cost-efficient data access. Combined with unified catalogues and open table formats, they provide manufacturers with the flexibility to run multiple workloads on a single set of clean, governed data.
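To make that concrete, the sketch below uses the open-source deltalake Python package, one implementation of the table formats named above; the table path and columns are hypothetical. The point is that an append-heavy ingestion job and an analytical read can share one governed, transactional table:

```python
# A minimal sketch of "many workloads, one governed table" using the
# deltalake package (pip install deltalake). Path and columns are
# hypothetical.

import pyarrow as pa
from deltalake import DeltaTable, write_deltalake

# Ingestion workload: append a batch of production telemetry.
batch = pa.table({
    "machine_id": ["M1", "M2"],
    "temperature_c": [61.5, 58.2],
})
write_deltalake("lake/telemetry_delta", batch, mode="append")

# Analytical workload: read the same table, with auditable history.
table = DeltaTable("lake/telemetry_delta")
print(table.version())           # transaction log supports lineage and audit
print(table.to_pandas().head())  # clean, consistent rows for BI or ML
```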
These advances are gradually closing the gap between traditional data warehouses and data lakes. Where warehouses excelled at structured analytics and lakes at unstructured data capture, new formats offer a convergence. In time, Van de Wiel believes the distinction may fade entirely.
Culture, governance, and the need for ROI
None of this, however, guarantees success. The hardest part is often not the architecture but the alignment. Legacy culture, particularly in environments where IT, OT, and data science operate in silos, can stall even the most elegant solutions. “Executives need to lead from the top,” Van de Wiel says. “Pick a small but meaningful problem where data can make a measurable difference. Deliver that outcome, show the ROI, and use it as a case study to build trust internally. That is how you start to shift the culture.”
This cultural shift must also address the question of ownership. Data must be seen not as a technical asset but as a business-critical resource. Governance mechanisms, ranging from data lineage and master data management to regulatory compliance and algorithmic transparency, should be viewed as strategic enablers, rather than merely box-ticking exercises.
Legacy systems remain a barrier, but Van de Wiel warns against using them as an excuse. Smaller, isolated use cases may prove tactical wins, but without a broader architectural vision, they risk becoming dead ends. “You still need to think big,” Van de Wiel says. “Design for integration and scale, even if your first project is narrow. That way, you avoid building fragmented tools that cannot grow.”
A digital backbone worth investing in
When asked what infrastructure investment he would prioritise, Van de Wiel is unequivocal. “Data lakes. That is the foundation. It is what allows everything else (streaming, AI, real-time decision-making, and generative applications) to function properly.”
Many manufacturers remain in the early stages of this journey. While some have adopted modern pipelines and data-centric strategies, others continue to struggle with ageing systems, siloed teams, and limited pilot projects. But according to Van de Wiel, the shift is accelerating. “In the last year alone, we have seen huge momentum around open table formats and unified infrastructure,” he concludes. “That is where the frontier is now. The opportunity is to use this moment, when technologies are maturing and organisations are reassessing their digital strategies, to build something robust and scalable.”
Ultimately, the winners in manufacturing AI will not be those with the fanciest models or the slickest dashboards. They will be the ones who took data seriously enough to build the roads before driving the car. The future of industrial transformation depends not on abstract intelligence, but on the quiet infrastructure that makes it real.