Open-source and open access licensing in an AI large language model (LLM) world
This text initially appeared as a blog post on gpai.ai, coauthored by Yann Dietrich, Catherine Stihler, and Lea Daun.
Abstract
With the rise of generative AI and other innovations, focus has increased on how to make open-source and open access AI models available in a safe and trusted manner. The OECD survey of G7 members in connection with the Hiroshima AI Process highlights that open-source is important for fostering innovation and competition. However, open-source AI also presents risks that need to be addressed.
The question of how to responsibly address open-source and open access AI models has captured policymakers’ attention. For instance, the EU AI Act includes exceptions from transparency, disclosure and documentation obligations for certain open-source AI applications. However, the scope of these exceptions is limited: they do not exempt AI systems that are monetized or considered high-risk. President Biden’s recent AI executive order seeks input on the implications of making AI model weights widely available, including the potential benefits and harms, and the US Commerce Department has solicited public comment on this topic.
The GPAI IP Advisory Committee (the Committee) commenced a project in 2022 examining how standard contract terms can advance responsible AI data and model sharing. As part of this project, the Committee recently launched an AI Contract Terms Incubator (AI CTI) to bring different stakeholders together to share ideas on this topic. Open-source and open access AI models have been a focus of the Committee’s work.
On March 4, 2024, the Committee held a virtual multistakeholder global workshop focusing on contract terms for open-source and open access AI models. The workshop’s agenda included examining the current state of open-source contracting and how open-source and open access approaches have emerged in the AI model context. The workshop further examined how open-source and open access contract terms may advance the safe and responsible use of AI models. Finally, the workshop explored potential paths forward for creating appropriate open-source and open access contractual structures for responsible AI model use. The following summarizes the workshop’s key takeaways.[1] You can view a recording of the workshop here.
1. Unveiling the Layers of the Open-Source Licensing Landscape for AI Models
The workshop started with a reminder of the fundamentals of open-source software licensing and then compared and contrasted it to open-source and open access AI models. Participants explained how open-source and open access AI models typically incorporate elements in addition to copyrightable software. These additional elements include: 1) input data, used to train, test, and/or validate the AI models, 2) model weights, and 3) AI model outputs.
Panelists discussed how these additional AI model elements may not be eligible for intellectual property (IP) protection. Contract terms could potentially help allocate usage and possibly other rights to some of these elements, such as weights and input data, in the absence of clearly applicable IP laws. Contract terms also could potentially help address ethical concerns raised by open-source or open access AI models by imposing usage limitations. The potential effectiveness of such contractual usage limitations could be enhanced by appropriate contract enforcement mechanisms. Finally, contract terms might also help address transparency by, for example, contractually requiring certain disclosures about the input data and/or model weights.
The following explains in more detail some of the key differences between open-source software and open-source or open access AI models and related contract issues:
- Focus extends beyond source code. Open-source software licenses primarily focus on the terms and conditions for making software source code available, including whether it is on a copyleft or non-copyleft basis and attribution requirements. However, for open-source and open access AI models, additional questions typically arise. These questions include 1) Which elements of an AI system should be open? 2) What are the rights and obligations for each element? 3) Which elements are protected by intellectual property and/or contract?
- The differences may require adjustments to the contractual approach. Panelists agreed that typical open-source software licenses raise challenges in the open-source and open access AI model context, given the differences described above. Panelists discussed the need for a different or adjusted licensing approach for open access and open-source AI models. Some adjustments could seek to address the ethical and responsible use of AI models. However, panelists discussed how such usage limitations would likely depart from traditional open-source software licensing. Specifically, one of the cornerstones of the definition of open-source is “no discrimination against fields of endeavor.” Panelists also commented that this principle makes it challenging for any open-source license to impose restrictions on particular uses of the software. This could weigh in favor of using the term “open access” to describe AI model license agreements that promote openness but include some usage limitations.
- Contracts can also help address the growing set of issues arising from AI outputs. Current open-source software licenses typically address derivative works, a concept that arises under copyright law. However, this derivative work construct often is not well suited for addressing issues raised by AI outputs, which may (or may not) resemble some of the input data or be deemed a derivative work under copyright law. Similarly, derivative work concepts may not work well for changes or adjustments to AI model weights, particularly since they may not be copyrightable. These considerations also weigh in favor of new standard license agreements for open access AI models.
- Compliance mechanisms for AI model sharing are likely to be more complicated compared to open-source software. As noted above, AI has become more regulated than software. This heightens the need to create a structure that supports compliance and that allows for appropriate evaluation and allocation of liability should contract or other violations occur or if harms arise. New standard license agreements could potentially help address these considerations, too.
2. The New Open-Source and Open Access Approaches for AI Model Licensing
Workshop participants also discussed the following efforts to develop AI model licenses and related topics.
- Responsible AI Licenses (RAIL). The RAIL licenses, developed through efforts commencing around 2018, seek to address ethical concerns raised by AI model licensing. The RAIL licenses contain restrictions on certain specified uses. While panelists agreed on the need to promote ethical use, questions arose as to whether such concerns should be addressed in license agreements. An alternative approach might be to address the concerns through laws and regulations, and/or perhaps codes of conduct. These approaches might be better suited to address complex concepts and provide remedies for unethical conduct in at least some situations. This remains an open question, particularly as AI laws and regulations continue to develop.
- Liability Considerations. Panelists reiterated that, historically, one of the key incentives for people to share open-source software has been the ability to disclaim their own liability for such contributions. They explained that people may be less likely to make software freely available if doing so could result in significant liability. Given that AI models are more complex and can potentially present more harms and risks, questions were raised about how liability should be evaluated and allocated in this context. Some panelists suggested creating some form of safe harbor for at least some open access AI model contributors.
- Complexity and Concentration. Panelists also raised concerns about the increased complexity of AI foundation models and the cost of, and access to, the compute power required to train them. On the one hand, panelists foresaw that this might make it more difficult for open-source AI models to emerge. On the other hand, open-source AI models made available on a cloud basis could unlock more innovation by reducing the need for innovators to acquire these costly resources.
3. A Path to Avoid Proliferation of Licenses
To avoid an unnecessary or unhelpful proliferation of open access AI model licenses, the workshop considered approaches for developing licenses. The success of Creative Commons was cited as an example of how a simple solution to the problem of failed sharing online led to a global community and widespread adoption of open licenses. Creative Commons gives creators a range of options, from most open to least open, in the form of six licenses and two public domain tools; this choice drove adoption because openness is not one thing but, as previously discussed, a spectrum.
Participants urged consideration of learning from the Creative Commons model to help guide development of open access AI model licenses. They also agreed that a menu of different licenses is desirable to accommodate the range of considerations and use cases.
The Open Source Initiative discussed its ongoing work on co-creating an open-source AI definition. When open-source was first defined, it was built around a new ecosystem and a new technology. Today, AI represents a paradigm shift that requires new thinking and a new definition of “open” for this context, reflecting input from many stakeholders. The Open Knowledge Foundation, which is the custodian of the open data definition, is also conducting a multi-stakeholder review process of its definitions.
MLCommons is another example of an organization working to expand data sharing. It underscored the need for new license agreements that are tailored for data. New license terms could help clarify, among other things, how data sets can be used. Panelists also encouraged the development of machine-readable licenses to make them easier to implement and potentially enforce, as sketched below.
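By way of illustration only, the following is a minimal sketch of what machine-readable license metadata and a simple usage check might look like. The field names, the license identifier, and the `use_is_permitted` helper are hypothetical assumptions for this example and are not drawn from RAIL, SPDX, or any other existing standard.

```python
# Hypothetical machine-readable license metadata for an AI model release.
# All field names and values below are illustrative assumptions, not an
# existing standard.
MODEL_LICENSE_METADATA = {
    "license_id": "OpenAccess-AI-Example-1.0",  # hypothetical identifier
    "covered_elements": ["source_code", "weights", "training_data_description"],
    "attribution_required": True,
    "restricted_uses": ["surveillance", "biometric_identification"],
}


def use_is_permitted(intended_use: str, metadata: dict) -> bool:
    """Return True if the intended use is not listed as restricted."""
    return intended_use.lower() not in metadata["restricted_uses"]


if __name__ == "__main__":
    print(use_is_permitted("medical_research", MODEL_LICENSE_METADATA))  # True
    print(use_is_permitted("surveillance", MODEL_LICENSE_METADATA))      # False
```

A machine-readable layer of this kind would not replace the legal text of a license; it would simply let tooling surface, and partially check, the usage terms attached to a model or data set.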
Conclusion
Overall, the workshop underscored the critical need for developing appropriate contractual frameworks to address the unique challenges and opportunities presented by open-source and open access AI models. The core economic value of open-source lies in its enabling of innovation and competition. As the AI landscape continues to evolve, collaborative efforts like those of the GPAI IP Advisory Committee and other organizations are essential in shaping effective licensing standards that foster innovation while safeguarding public interests.
Interested readers are invited to engage in the discussion by joining our next GPAI events and should contact Kaitlyn Bove (kaitlyn.bove@inria.fr).
This blog post summarizes panelist views and does not reflect the views of GPAI, the GPAI IP Advisory Committee, or its members.
GPAI experts produced this blog post under its former governance (2020-2024). It does not necessarily represent members’ views under the new integrated partnership with the OECD and the GPAI members as of July 2024.