
From Code to Data: Openness in the Age of AI


Tom Williams, Partner


TL;DR:


  • Open source began as a commitment to transparency and shared progress, not just open code.

  • In AI, openness limited to code is no longer sufficient.

  • Responsible AI requires visibility into data, model weights, and objectives.

  • Policy is beginning to reflect this shift, but developers are already there.

  • Openness is becoming a practical requirement, not a philosophical choice.



The spirit of the original open-source movement


Open source didn’t start as a licensing model. It started as a belief: that progress is enhanced when people can see how things work, question them, and improve them together. The early success of projects like Linux and Kubernetes wasn’t just technical; it was cultural. “View source” became a promise, a statement of trust that systems shaping the world could be understood by the people building on top of them.


Why code is no longer the whole story


For decades, open source had a clear meaning: the source code was available, and that availability was considered sufficient. If developers could inspect the code and understand how a system worked, openness was a given.


In 2025, that definition no longer holds. Publishing an AI model’s architecture tells only part of the story. Today, AI systems are defined by their training data, model weights, and objectives, all of which directly determine how a system behaves in practice. Without visibility into them, claims of openness are necessarily incomplete.


The missing link: open data and ethical AI


This disconnect became clear to me while studying AI ethics at Oxford. AI companies often publish ethical principles, but the developers working with these systems describe a gap between those principles and what they can actually inspect or verify. As AI moves into healthcare, finance, and transportation, that gap matters more than ever.


Researchers like Luciano Floridi and Josh Cowls have distilled ethical AI into core principles such as transparency, accountability, fairness, and oversight. These echo the values that built open source in the first place.


But here’s what I found in direct interviews with AI developers: without structural openness, ethical AI is almost impossible. To address this, I propose the idea of “open sandboxes”: shared environments where developers, regulators, and researchers can test systems with full visibility into training data, model objectives, and observed behavior.
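

To make the idea concrete, here is a minimal, purely hypothetical sketch (in Python, not drawn from any existing sandbox or standard) of the kind of machine-readable transparency manifest such an environment might publish, covering the same three ingredients: training data, model objectives, and observed behavior.

from dataclasses import dataclass, field, asdict
import json

# Hypothetical manifest an "open sandbox" might publish alongside a model.
# All field names and values below are illustrative assumptions, not part of
# any published specification.

@dataclass
class DatasetRecord:
    name: str       # dataset identifier
    source: str     # where the data came from (URL, registry, archive)
    licence: str    # terms under which the data can be inspected or reused

@dataclass
class TransparencyManifest:
    model_name: str
    weights_uri: str                   # where the released weights can be fetched
    training_objective: str            # e.g. "next-token prediction"
    training_data: list[DatasetRecord] = field(default_factory=list)
    evaluations: dict[str, float] = field(default_factory=dict)  # observed behavior

    def to_json(self) -> str:
        # Serialize the manifest so developers, regulators, and researchers can inspect it.
        return json.dumps(asdict(self), indent=2)

# Example of what a sandbox participant would see for one (made-up) model.
manifest = TransparencyManifest(
    model_name="example-model-7b",
    weights_uri="https://example.org/weights/example-model-7b",
    training_objective="next-token prediction",
    training_data=[DatasetRecord("public-web-corpus", "https://example.org/corpus", "CC-BY-4.0")],
    evaluations={"toxicity_rate": 0.012, "held_out_accuracy": 0.84},
)
print(manifest.to_json())

The specific fields matter less than the principle: every claim about a model’s openness should map to something a third party can actually fetch and inspect.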


How policy is catching up


This idea is moving from theory to practice. In late 2024, the Open Source Initiative (OSI) released its Open Source AI Definition, which goes beyond code to include model weights, data lineage, and build methods. And from August 2025, the EU AI Act requires general-purpose AI providers to disclose details about their training datasets and development practices.


These policies shift openness from a voluntary ideal to a shared expectation. The EU may be leading on regulation, but together with the OSI definition they signal a global shift toward defining what openness in AI really requires.


What developers already know


Policymakers may still be debating these issues, but developers have already moved. A joint report by SlashData and Catchy published last month found that over two-thirds of developers choose open-source models when integrating AI, and not just for cost reasons. Developers value trust, adaptability, and being able to see how models behave.


These are the same principles behind responsible AI. Open ecosystems make it easier to audit behavior, test assumptions, and fine-tune tools to real-world needs. The same report illustrates how that openness translates into deeper insight into how models actually behave.


What openness must mean now


In the AI era, openness must go deeper. Transparency has to span the full stack: training data, model weights, objectives, and the architectural scaffolding that binds them. These are the factors shaping how AI behaves.


For developers, this means going beyond publishing code. It means opening the artefacts that influence model behavior. For policymakers, it means enabling that transparency in ways that are secure and scalable. For tech leaders, it means viewing openness not as a checkbox, but as a driver of trust and resilience.


As regulation expands and expectations rise, the organizations that thrive will be those willing to self-audit and invite external scrutiny. For the open-source community, the challenge is clear: extend an ideology built on code to encompass the data and methods now powering the next generation of AI systems.
