Can We Move Forward with the Open Source AI Definition?
December 04, 2024

Ben Cotton
Kusari

You can't observe what you can't see. It's why application developers and DevOps engineers collect performance metrics, and it's why developers and security engineers like open source software. The ability to inspect code increases trust in what the code does. With the increased prevalence of generative AI, there's a desire to have the same ability to inspect the AI models. Most generative AI models are black boxes, so some vendors are using the term "open source" to set their offerings apart.

But what does "open source AI" mean? There's no generally-accepted definition.

The Open Source Initiative (OSI) defines what "open source" means for software. Its Open Source Definition (OSD) is a broadly-accepted set of criteria for what makes a software license open source. New licenses are reviewed in public by a panel of experts to evaluate their compliance with the OSD. OSI focuses on software, leaving "open" in other domains to other bodies, but since AI models are, at the simplest level, software plus configuration and data, OSI is a natural home for a definition of open source AI.


What's Wrong with the OSAID?

The OSI attempted to do just that. The Open Source AI Definition (OSAID), released at the end of October, represents a collaborative attempt to craft a set of rules for what makes an AI model "open source." But while the OSD is generally accepted as an appropriate definition of "open source" for software, the OSAID has received mixed reviews.

The crux of the criticism is that the OSAID does not require that the training data be available, only "data information." Version 1.0 and its accompanying FAQ require that training data be made available where possible, but still permit providing only a description when the data is "unshareable." OSI's argument is that the laws covering data are more complex, and vary more by jurisdiction, than the laws governing copyrightable works like software. There's merit to this, of course. The data used to train AI models includes copyrightable works like blog posts, paintings, and books, but it can also include sensitive and protected information like medical histories and other personal data. Model vendors that train on sensitive data couldn't legally share their training data, OSI argues, so a definition that requires it is pointless.

I appreciate the merits of that argument, but I — and others — don't find it compelling enough to craft the definition of "open source" around it, especially since model vendors can find plausible reasons to claim they can't share training data. The OSD is not defined by what is convenient; it is defined by what protects certain rights for the consumers of software. The same should be true for a definition of open source AI. If some models cannot meet the definition, then those models are not open source; the definition should not be loosened for their convenience. If no models could possibly meet a definition of open, that would be one thing. But many existing models do, and more could if their developers chose.

Any Definition Is a Starting Point

Despite the criticisms of the OSI's definition, a flawed definition is better than no definition. Companies use AI in many ways: from screening job applicants to writing code to creating social media images to powering customer service chatbots. Any of these uses poses the risk of reputational, financial, and legal harm. Companies that use AI need to know exactly what they're getting — and not getting. A definition of "open source AI" doesn't eliminate the need to carefully examine an AI model, but it does give a starting point.

The current OSD has evolved over the last two decades; it is currently on version 1.9. It stands to reason that the OSAID will evolve as people use it to evaluate real-world AI models. The criticisms of the initial version may inform future changes that result in a more broadly-accepted definition. In the meantime, other organizations have announced their own efforts to address deficiencies in the OSAID. The Digital Public Goods Alliance — a UN-endorsed initiative — will continue to require published training data in order to grant Digital Public Good status to AI systems.

It is also possible that we'll change how we speak about openness. Just as OSD-noncompliant movements like Ethical Source have introduced a new vocabulary, open source AI may force us to recognize that openness is a spectrum on several axes, not simply a binary attribute.

Ben Cotton is Head of Community at Kusari
