AI Ethics and LLMs Reality Check: Was Timnit Gebru Right?

Nikolaj Kolbasko

June 12, 2026

9 min

An AI robot gazes into a crystal ball reflecting symbols of compute power, energy, and society. An illustration exploring Timnit Gebru's critique of large language models.

The debate around AI ethics didn't start with ChatGPT. Back in late 2020, a research paper by Timnit Gebru and her co-authors sparked intense discussions within the AI community. The former co-lead of Google's Ethical AI team warned about the societal, economic, and environmental consequences of increasingly large language models.

Her four core criticisms:

  • AI research often prioritizes scaling and benchmarks over societal impact.
  • Large models amplify existing biases found in training data.
  • Energy and resource consumption is growing exponentially.
  • Compute power, data, and influence are becoming concentrated among a few tech giants.

Five and a half years later, Large Language Models shape our daily work: companies are investing billions in AI infrastructure, the EU AI Act is establishing the first regulatory guardrails, and the most capable models come almost exclusively from a handful of global tech corporations.

As a computer scientist in the EU who works with LLMs daily and conducts training on AI, AI Ethics, and the EU AI Act, I revisited Timnit Gebru's main concerns and compared them with current developments in practice and research.

My guiding question: How accurate were those warnings, really?

AI Research Between Innovation & Responsibility

Back in late 2020, Timnit Gebru and her co-authors criticized AI research for increasingly focusing on scaling, benchmarks, and model sizes rather than societal impact.

Even today, benchmark results should be viewed critically. They provide important indicators of a model's capabilities, but they don't necessarily reflect real-world use cases. Benchmark data easily leaks into training sets, and the temptation to "cheat" is hard to resist. After all, what do you tell investors when hundreds of millions of dollars in LLM development costs go down the drain?
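One way to make the contamination concern concrete is a simple n-gram overlap check between a benchmark item and a training document. The following is only a minimal sketch under my own assumptions: the function names `ngrams` and `overlap_ratio`, the whitespace tokenization, and the window size of 5 are illustrative choices, not a method taken from the paper or from any model provider.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams of a text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    # For texts shorter than n words the range is empty, so the set is empty.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_item: str, corpus_doc: str, n: int = 5) -> float:
    """Share of the benchmark item's n-grams that also occur in the document.

    A value near 1.0 hints that the item may have leaked into the corpus.
    The window size n=5 is an arbitrary, hypothetical default.
    """
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(corpus_doc, n)) / len(item_grams)


question = "what is the capital of france and why"
leaked_doc = "quiz: what is the capital of france and why is it paris"
clean_doc = "a recipe for sourdough bread with rye flour and a long rest"

print(overlap_ratio(question, leaked_doc))
print(overlap_ratio(question, clean_doc))
```

A check like this only catches verbatim leaks. Paraphrased or translated benchmark items slip through, which is part of why contamination is so hard to rule out.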

During the pandemic, the German company CureVac showed how closely technological expectations and company valuations are linked. Although CureVac was considered a pioneer in mRNA technology, the market reacted sensitively to a lack of immediate success.

Societal impacts, however, don't show up in benchmark rankings or quarterly reports. This is exactly where Timnit Gebru's criticism hits home.

A look at social media platforms has shown for years that economic incentives don't necessarily align with societal benefit. Attention generates reach; reach generates revenue. The question of whether technological development is keeping pace with societal responsibility is more relevant today than ever.

Bias in LLMs: Still a Problem?

Large Language Models can inherit and sometimes even amplify existing biases from their training data. This "algorithmic bias" remains one of the central challenges of modern AI systems.

A major reason for this is the sheer volume of training data. Driven by the appetite for ever more data, massive, terabyte-scale datasets flow into LLM training. Complete manual oversight is neither economically nor organizationally feasible.

At the same time, models have evolved significantly in recent years. In tests with current models (ChatGPT 5.2, Gemini 3, DeepSeek-V3), I could no longer reproduce classic gender bias examples. On the contrary: the models actively pointed out that certain statements or gender roles cannot be statistically assigned with certainty.

This doesn't mean bias has disappeared. Rather, the discussion has shifted from obvious distortions to more subtle cultural, linguistic, and political influences.

Consumption of Energy & Resources

The energy and resource consumption of large AI systems is one of the criticisms that has become particularly visible in recent years.

I found it especially striking when Microsoft announced it would no longer plan new AI workloads for certain Azure regions in Europe. The background: energy bottlenecks and limited capacity.

At that point, it became clear that AI energy consumption is no longer a theoretical discussion. It has concrete impacts on infrastructure decisions.

When Google suddenly develops an interest in nuclear energy and Sam Altman invests in companies like Oklo (a developer of modular nuclear reactors), it's safe to say that AI's energy demands are now about much more than just a few extra server racks.

The rising water consumption of data centers is also increasingly coming into focus. The EU now requires operators to disclose these metrics. And let's not even start on the battle for rare earth elements needed for AI chips.

Big Tech and the Power of Data

In my view, this is where Timnit Gebru and her colleagues were closest to today's reality.

My own AI journey started back in 2018 with Word2Vec and BERT. Even then, it was clear that serious experiments with neural networks required GPUs and TPUs. Thanks to platforms like Google Colab, students and researchers could still gain practical experience on a manageable budget.

With GPT-3, the situation changed fundamentally.

Today, the most capable models are developed by companies whose market valuations sometimes exceed the economic output of entire nations.

Backed by massive cash reserves, a vast treasure trove of data, and steadily growing business models, these major AI providers can afford experiments that would simply be unaffordable for most other companies.

While OpenAI, Anthropic, or Alibaba invest billions, many European providers are fighting for funding, talent, and infrastructure. Even prominent projects like Aleph Alpha have since adjusted their original LLM strategy.

Why I Don't Train My Own LLM

A colleague recently asked me why I don't just train my own LLM. After all, it would be enough if it could just help me with Python.

My answer was surprisingly unspectacular:

  • Data acquisition is complex.
  • Good training data for code is even more complex.
  • The model's viability is hard to estimate in advance.
  • Highly capable models already exist.
  • Training a usable model would be expensive.

So why reinvent the wheel?
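To put the cost point in rough numbers, here is a back-of-the-envelope sketch. It uses the common rule of thumb that training a dense transformer takes about 6 × parameters × tokens floating-point operations. All concrete values below (model size, token count, GPU throughput, utilization, hourly price) are hypothetical assumptions chosen for illustration, not measurements or quotes.

```python
def training_cost_usd(
    params: float,
    tokens: float,
    peak_flops_per_gpu: float,
    utilization: float,
    usd_per_gpu_hour: float,
) -> float:
    """Estimate the training cost from the ~6 * N * D FLOPs rule of thumb."""
    total_flops = 6 * params * tokens
    # Effective throughput is peak throughput scaled by achieved utilization.
    gpu_seconds = total_flops / (peak_flops_per_gpu * utilization)
    gpu_hours = gpu_seconds / 3600
    return gpu_hours * usd_per_gpu_hour


# Hypothetical scenario: a 7B-parameter model trained on 1T tokens,
# on GPUs with 3e14 FLOP/s peak at 40% utilization, rented at $2 per hour.
cost = training_cost_usd(
    params=7e9,
    tokens=1e12,
    peak_flops_per_gpu=3e14,
    utilization=0.4,
    usd_per_gpu_hour=2.0,
)
print(f"Estimated compute cost: ${cost:,.0f}")
```

Under these assumptions the estimate comes to roughly 97,000 GPU-hours for a single training run. That figure excludes data acquisition, failed runs, evaluation, and staff, so the real bill would be higher.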

Even researchers rarely train their own models from scratch anymore. Instead, they experiment with existing LLMs and build research projects on top of them.

One of my professors now describes researching with LLMs as his "bread-and-butter business." That sums up the situation remarkably well.

The Infrastructure Behind the AI Boom

For some time now, we've also been seeing a massive increase in prices for RAM, SSDs, and compute capacity. Large cloud and data center operators are constantly driving up demand, causing many manufacturers to shift their focus more heavily toward the B2B market.

The result: Anyone wanting to experiment with AI privately will sooner or later end up either in the cloud or facing hardware prices that bring little joy.

Conclusion: Was Timnit Gebru Right?

I bought my first NVIDIA stock back in 2020. Not because of a stock tip, but because both Google and our university data center relied on NVIDIA hardware for AI workloads. However, I certainly didn't anticipate that this infrastructure dominance would become one of the main drivers of the AI boom just a few years later.

When I first read the paper in late 2020, I thought much of it was exaggerated. The AI world was different back then. GPT-3 was new, ChatGPT didn't exist yet, and hardly anyone outside of research and the tech industry was talking about Large Language Models.

Five and a half years later, I have to admit:
Even if not every single one of her predictions played out exactly as stated, her warnings, especially regarding energy consumption, infrastructure, market concentration, and AI governance, feel remarkably relevant today.

The most exciting questions around AI are no longer just about algorithms or model sizes. They are about energy supply, regulation, competition, and societal responsibility.

That is exactly why I now have a much better understanding of why the EU AI Act was created.

No matter how you look at it: AI remains one of the most exciting and controversial topics of our time. And the longer I deal with it, the more respect I have for people who ask uncomfortable questions before everyone else realizes they were important.

Source:


Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell (2021): On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21). DOI: 10.1145/3442188.3445922

Link to full article: https://dl.acm.org/doi/pdf/10.1145/3442188.3445922
