
# From Oil to Intelligence: Why Compute, Talent and Data Quality Are the Real Bottlenecks of the Algorithmic Age

The metaphor of data as the new oil has shaped a decade of corporate strategy, venture capital theses and political rhetoric. It is a tidy phrase, and like most tidy phrases in questions of power, it has the disadvantage of being wrong in the places where precision matters. In the following reflection, drawn from the analytical architecture of my book Algorithmus. Wer die KI kontrolliert kontrolliert die Zukunft, I want to examine why the oil analogy collapses under scrutiny, and why the real bottlenecks of the algorithmic age lie elsewhere: in compute, in talent, and in the quality of domain data. The consequences of this reframing are not academic. They determine where European industry, and in particular the data-rich Mittelstand, can still build structural advantage against the hyperscalers of the Pacific economies.

## The Failure of an Analogy

Oil became a metaphor for data because both appeared, at first glance, to share the qualities that define a strategic resource: scarcity, extractability, refinability, and the capacity to confer production power on whoever controlled the reserves. The comparison entered boardrooms around 2010 and has since been repeated so often that it has acquired the weight of an axiom. Axioms, however, deserve examination precisely in the moments when they feel most self-evident.

Data is not scarce. In 2023 the world produced roughly 2.5 quintillion bytes of data per day, a figure that grows by more than twenty-three percent annually. The overwhelming majority of that mass is, in any strategic sense, worthless: redundant noise, unstructured fragments, obsolete records, technical metadata without semantic content. What is scarce is not the raw material but the capacity to distil decision intelligence from it.
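
To make the growth figure concrete, the following is a minimal sketch of the compound-growth arithmetic behind it. It takes the two numbers quoted above as given and assumes, purely for illustration, that the growth rate stays constant from year to year; the function names are hypothetical and not drawn from any source.

```python
import math

# Figures quoted in the text; treated here as inputs, not verified measurements.
DAILY_BYTES_2023 = 2.5e18   # roughly 2.5 quintillion bytes per day
ANNUAL_GROWTH = 0.23        # "more than twenty-three percent", used as a lower bound


def doubling_time_years(growth_rate: float) -> float:
    """Years needed for a quantity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + growth_rate)


def projected_daily_bytes(base: float, growth_rate: float, years: int) -> float:
    """Daily volume after `years` of constant compound growth (an assumption)."""
    return base * (1 + growth_rate) ** years


if __name__ == "__main__":
    print(f"Doubling time: {doubling_time_years(ANNUAL_GROWTH):.2f} years")
    for years in (5, 10):
        volume = projected_daily_bytes(DAILY_BYTES_2023, ANNUAL_GROWTH, years)
        print(f"After {years} years: {volume:.2e} bytes per day")
```

Under that constant-rate assumption the daily volume doubles in a little over three years. The arithmetic concerns volume only; it says nothing about how much of that volume can be refined into decisions.
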
The oil analogy obscures this distinction and, in obscuring it, leads entire industries to invest in accumulation when they should be investing in refinement.

The empirical record of the past fifteen years confirms this reading. Google possessed more data than any other company on earth in 2010, and yet Microsoft, Amazon, Meta and dozens of start-ups went on to build successful digital markets. Netflix held more viewing data than every traditional studio combined, and the studios retained relevant positions in the market for filmed entertainment. Bloomberg sat on more financial data than any hedge fund, and the hedge funds continued to generate alpha. In each case, the decisive variable was not the volume of data held but the capacity to model, to interpret and to integrate findings into decisions.

## Compute as the Acute Bottleneck

Of the three genuine scarcities of the algorithmic age, compute is the most acute. Training GPT-4 cost, according to estimates by researchers at Stanford University, between sixty-three and one hundred million dollars in pure compute time on NVIDIA A100 chips. That figure corresponds to roughly twenty-one thousand training hours on chips that individually cost between ten and fifteen thousand dollars, housed in data centres whose monthly operating costs begin at half a million dollars and rise from there. The next generation of frontier models, according to projections by the research institute Epoch AI, will require training investments that exceed one billion dollars for a single run.

A billion dollars for one training run is not an incremental barrier. It is a structural filter that restricts competition at the foundation-model layer to a handful of globally capitalised actors.
It also explains why NVIDIA's quarterly data centre revenue rose from 4.3 billion dollars in the third quarter of 2022 to 18.4 billion dollars in the third quarter of 2023, and why the company's market capitalisation surpassed three trillion dollars by the end of 2024. Compute is the resource on which the entire edifice rests, and compute is concentrated in a geography and a supply chain whose fragility I have described at length elsewhere in the book.

For any enterprise operating below hyperscaler scale, the strategic conclusion is not to attempt symmetrical competition at the foundation layer. That contest has been decided by capital. The conclusion is rather to understand where compute scarcity creates dependency, and to architect one's own position with that dependency in mind.

## Talent as the Least Scalable Resource

The second bottleneck is talent, and it is the one that resists scaling most stubbornly. The number of people worldwide capable of developing, training and improving frontier AI models is estimated by industry analysts at a few thousand, possibly fewer than five thousand individuals. These researchers are the object of a global competition in which OpenAI, Google DeepMind, Anthropic, Meta AI, Microsoft Research, Baidu, Huawei and a growing number of national research institutions are bidding for the same pool, with compensation packages for experienced researchers routinely reaching between five hundred thousand and two million dollars per year.

The MacroPolo institute has shown that roughly seventy percent of the most-cited AI researchers work in the United States, although forty-nine percent of them were born outside it. America, in other words, does not primarily produce AI talent. It imports it, and converts imported talent into national capability.
This pattern describes a second-order dependency that every European strategist should sit with for a long moment: even if Europe possessed the compute, it would still face a talent gradient that decades of educational policy have not closed.

Talent, unlike compute, cannot be bought with capital alone. It follows research ecosystems, intellectual density, legal predictability and a certain cultural permission to take technical risks. The quiet erosion of European research positions over two decades is not a question of salaries but of the gravitational field in which talent chooses to locate itself.

## Data Quality as the Underestimated Edge

The third bottleneck, and the one most consistently underestimated in strategic discussions, is data quality. The difference between a useful and a transformative AI system often lies not in model architecture but in the character of the training data. Synthetically generated data, increasingly deployed to compensate for domain scarcity, cannot fully substitute for the irregularities, the contextuality and the specific noise of data drawn from reality. A model trained on synthetic patient records behaves differently from one trained on real ones, because real data carries patterns that no simulation reproduces in full.

It is precisely here that the strategic opportunity for the European Mittelstand becomes visible. A medium-sized pharmaceutical company that has accumulated thirty years of clinical trial data possesses a proprietary asset that no Silicon Valley laboratory can purchase or synthesise. A mid-market mechanical engineering firm with forty years of sensor data from installed machines worldwide holds a domain corpus that exceeds any general industrial dataset. A logistics operator with twenty years of route optimisation data for a specific geography can train models that outperform any general navigation algorithm within that territory.

Siemens Xcelerator illustrates the logic in operation.
Decades of machine operating data from hundreds of thousands of installed plants are used to train AI models for predictive maintenance, process optimisation and fault diagnosis, in a configuration that no general industrial model can replicate. The result is a competitive advantage that greater compute capacity and broader general training data cannot easily overtake, because the underlying substrate is not available on the open market.

## The European Refinery

The strategic path for data-rich actors without hyperscaler resources is, in its structure, almost classical. Proprietary domain data of genuine quality is to be treated as the raw material, and specialised algorithmic competence as the refinery. Neither element alone produces structural advantage. Data without the capacity to refine it remains a dormant asset, visible on a balance sheet perhaps but absent from operational reality. Algorithmic capacity without proprietary data reduces the enterprise to a reseller of general models, competing on margins that the platform owners define.

I have argued throughout Algorithmus. Wer die KI kontrolliert kontrolliert die Zukunft that Europe's best opportunity in the algorithmic age does not lie in reproducing the foundation-model economy, which has already been capitalised beyond the reach of any realistic European effort. It lies in a deliberate strategy of vertical depth: specific industries, specific domains, specific decision problems, addressed with proprietary data and engineered algorithmic competence. That strategy is neither defensive nor modest. It is a precise answer to where the real bottlenecks lie, and where they do not.

What this requires, institutionally, is the willingness to treat data governance as a first-order strategic question rather than a compliance obligation.
The Mittelstand companies that will hold meaningful positions in 2030 are those that begin, now, to catalogue, structure and refine the data they have produced over decades without knowing its future value. The ones that delegate this work to IT departments as a back-office task will discover, as Kodak discovered and Nokia discovered, that the moment for transformation always lies earlier than it seems.

The phrase "data is the new oil" has served its rhetorical purpose and should now be retired. It misdescribes the resource, misnames the scarcity and, most consequentially, misdirects investment. The real bottlenecks of the algorithmic age are compute, which is concentrated in a fragile geography and accessible only at scale; talent, which flows toward dense research ecosystems regardless of national ambition; and data quality, which can only be accumulated through time and domain presence. Of these three, only the third is a scarcity in which Europe, and in particular the European Mittelstand, holds genuine positional advantage. This is the ground on which a serious European answer to the algorithmic age can still be built.

The question I leave with the reader, as I leave it with the reader of Algorithmus. Wer die KI kontrolliert kontrolliert die Zukunft, is not whether that ground exists. It does. The question is whether those who stand on it will recognise it as such before the moment for refinement has passed.

Dr. Raphael Nagel (LL.M.)
