You can't observe what you can't see. It's why application developers and DevOps engineers collect performance metrics, and it's why developers and security engineers like open source software. The ability to inspect code increases trust in what the code does. With the increased prevalence of generative AI, there's a desire to have the same ability to inspect the AI models. Most generative AI models are black boxes, so some vendors are using the term "open source" to set their offerings apart.
But what does "open source AI" mean? There's no generally accepted definition.
The Open Source Initiative (OSI) defines what "open source" means for software. Its Open Source Definition (OSD) is a broadly accepted set of criteria for what makes a software license open source. New licenses are reviewed in public by a panel of experts to evaluate their compliance with the OSD. OSI focuses on software, leaving "open" in other domains to other bodies, but since AI models are, at the simplest level, software plus configuration and data, OSI is a natural home for creating a definition of open source AI.
What's Wrong with the OSAID?
The OSI attempted to do just that. The Open Source AI Definition (OSAID), released at the end of October, represents a collaborative attempt to craft a set of rules for what makes an AI model "open source." But while the OSD is generally accepted as an appropriate definition of "open source" for software, the OSAID has received mixed reviews.
The crux of the criticism is that the OSAID does not require that the training data be available, only "data information." Version 1.0 and its accompanying FAQ require that training data be made available if possible, but still permit providing only a description when the data is "unshareable." OSI's argument is that the laws that cover data are more complex, and have more jurisdictional variability, than the laws governing copyrightable works like software. There's merit to this, of course. The data used to train AI models includes copyrightable works like blog posts, paintings, and books, but it can also include sensitive and protected information like medical histories and other personal information. Model vendors that train on sensitive data couldn't legally share their training data, OSI argues, so a definition that requires it is pointless.
I appreciate the merits of that argument, but I, and others, don't find it compelling enough to craft the definition of "open source" around it, especially since model vendors can find plausible reasons to claim they can't share training data. The OSD is not defined based on what is convenient; it's defined based on what protects certain rights for the consumers of software. The same should be true for a definition of open source AI. The fact that some models cannot meet the definition should mean that those models are not open source; it should not mean that the definition is changed to be more convenient. If no models could possibly meet a definition of open, that's one thing. But many existing models do, and more could if the developers chose.
Any Definition Is a Starting Point
Despite the criticisms of the OSI's definition, a flawed definition is better than no definition. Companies use AI in many ways: from screening job applicants to writing code to creating social media images to running customer service chatbots. Any of these uses poses a risk of reputational, financial, and legal harm. Companies that use AI need to know exactly what they're getting and what they're not. A definition for "open source AI" doesn't eliminate the need to carefully examine an AI model, but it does give a starting point.
The current OSD has evolved over the last two decades; it is currently on version 1.9. It stands to reason that the OSAID will evolve as people use it to evaluate real-world AI models. The criticisms of the initial version may inform future changes that result in a more broadly accepted definition. In the meantime, other organizations have announced their own efforts to address deficiencies in the OSAID. The Digital Public Goods Alliance, a UN-endorsed initiative, will continue to require published training data in order to grant Digital Public Good status to AI systems.
It is also possible that we'll change how we speak about openness. Just as OSD-noncompliant movements such as Ethical Source have introduced a new vocabulary, open source AI may force us to recognize that openness is a spectrum on several axes, not simply a binary attribute.