You can't observe what you can't see. It's why application developers and DevOps engineers collect performance metrics, and it's why developers and security engineers like open source software. The ability to inspect code increases trust in what the code does. With the increased prevalence of generative AI, there's a desire to have the same ability to inspect the AI models. Most generative AI models are black boxes, so some vendors are using the term "open source" to set their offerings apart.
But what does "open source AI" mean? There's no generally accepted definition.
The Open Source Initiative (OSI) defines what "open source" means for software. Its Open Source Definition (OSD) is a broadly accepted set of criteria for what makes a software license open source. New licenses are reviewed in public by a panel of experts to evaluate their compliance with the OSD. OSI focuses on software, leaving "open" in other domains to other bodies, but since AI models are, at the simplest level, software plus configuration and data, OSI is a natural home for a definition of open source AI.
What's Wrong with the OSAID?
The OSI attempted to do just that. The Open Source AI Definition (OSAID), released at the end of October, represents a collaborative attempt to craft a set of rules for what makes an AI model "open source." But while the OSD is generally accepted as an appropriate definition of "open source" for software, the OSAID has received mixed reviews.
The crux of the criticism is that the OSAID does not require that the training data be available, only "data information." Version 1.0 and its accompanying FAQ require that training data be made available where possible but still permit providing only a description when the data is "unshareable." OSI's argument is that the laws that cover data are more complex, and have more jurisdictional variability, than the laws governing copyrightable works like software. There's merit to this, of course. The data used to train AI models includes copyrightable works like blog posts, paintings, and books, but it can also include sensitive and protected information like medical histories and other personal data. Model vendors that train on sensitive data couldn't legally share their training data, OSI argues, so a definition that requires it is pointless.
I appreciate the merits of that argument, but I — and others — don't find it compelling enough to craft the definition of "open source" around it, especially since model vendors can find plausible reasons to claim they can't share training data. The OSD is not defined based on what is convenient; it's defined based on what protects certain rights for the consumers of software. The same should be true for a definition of open source AI. The fact that some models cannot meet the definition should mean that those models are not open source; it should not mean that the definition is changed to be more convenient. If no models could possibly meet a definition of open, that would be one thing. But many existing models do, and more could if their developers chose to.
Any Definition Is a Starting Point
Despite the criticisms of the OSI's definition, a flawed definition is better than no definition. Companies use AI in many ways: screening job applicants, writing code, creating social media images, running customer service chatbots. Any of these uses poses the risk of reputational, financial, and legal harm. Companies that use AI need to know exactly what they're getting — and not getting. A definition of "open source AI" doesn't eliminate the need to carefully examine an AI model, but it does give a starting point.
The OSD has evolved over the last two decades and is currently on version 1.9. It stands to reason that the OSAID will evolve as people use it to evaluate real-world AI models. The criticisms of the initial version may inform future changes that result in a more broadly accepted definition. In the meantime, other organizations have announced their own efforts to address deficiencies in the OSAID. The Digital Public Goods Alliance — a UN-endorsed initiative — will continue to require published training data in order to grant Digital Public Good status to AI systems.
It is also possible that we'll change how we speak about openness. Just as OSD-noncompliant movements such as Ethical Source have introduced new vocabulary, open source AI may force us to recognize that openness is a spectrum along several axes, not a binary attribute.
Industry News
OpenText announced a strategic partnership with Secure Code Warrior to integrate its dynamic learning platform into the OpenText Fortify application security product suite.
Salesforce announced a series of updates for Heroku, a platform as a service (PaaS) offering that enables teams to build, deploy, and scale modern applications entirely in the cloud.
Onapsis announced the expansion of its Control product line to include a new bundle that enhances application security testing capabilities for SAP Business Technology Platform (BTP).
Amazon Web Services announced new enhancements to Amazon Q Developer, including agents that automate unit testing, documentation, and code reviews to help developers build faster across the entire software development process, a capability to help users address operational issues in a fraction of the time, and new capabilities that take the undifferentiated heavy lifting out of complex and time-consuming application migration and modernization projects.
Amazon Web Services (AWS) and GitLab announced an integrated offering that brings together GitLab Duo with Amazon Q.
Tenable announced the release of Tenable Patch Management, a unified, autonomous patch solution built to quickly and effectively close vulnerability exposures.
SurrealDB announced the launch of Surreal Cloud, a Database-as-a-Service (DBaaS) offering.
SmartBear announced its acquisition of QMetry, provider of an AI-enabled digital quality platform designed to scale software quality.
Red Hat signed a strategic collaboration agreement (SCA) with Amazon Web Services (AWS) to scale availability of Red Hat open source solutions in AWS Marketplace, building upon the two companies’ long-standing relationship.
CloudZero announced the launch of CloudZero Intelligence — an AI system powering CloudZero Advisor, a free, publicly available tool that uses conversational AI to help businesses accurately predict and optimize the cost of cloud infrastructure.
Opsera has been accepted into the Amazon Web Services (AWS) Independent Software Vendor (ISV) Accelerate Program, a co-sell program for AWS Partners that provide software solutions that run on or integrate with AWS.
Spectro Cloud is a launch partner for the new Amazon EKS Hybrid Nodes feature debuting at AWS re:Invent 2024.
Couchbase unveiled Capella AI Services to help enterprises address the growing data challenges of AI development and deployment and streamline how they build secure agentic AI applications at scale.
Veracode announced innovations to help developers build secure-by-design software and help security teams reduce risk across their code-to-cloud ecosystem.