<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mia Lykou Lund &#8211; Open Source Initiative</title>
	<atom:link href="https://opensource.org/blog/author/mia-lykoulund/feed" rel="self" type="application/rss+xml" />
	<link>https://opensource.org</link>
	<description>The steward of the Open Source Definition, setting the foundation for the Open Source Software ecosystem.</description>
	<lastBuildDate>Thu, 30 Jan 2025 04:28:38 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://i0.wp.com/opensource.org/wp-content/uploads/2023/01/cropped-cropped-OSI_Horizontal_Logo_0-e1674081292667.png?fit=32%2C32&#038;ssl=1</url>
	<title>Mia Lykou Lund &#8211; Open Source Initiative</title>
	<link>https://opensource.org</link>
	<width>32</width>
	<height>32</height>
</image> 
<atom:link rel="hub" href="https://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="https://pubsubhubbub.superfeedr.com"/><atom:link rel="hub" href="https://websubhub.com/hub"/><site xmlns="com-wordpress:feed-additions:1">210318891</site>	<item>
		<title>Open Source AI Definition &#8211; Weekly update September 23</title>
		<link>https://opensource.org/blog/open-source-ai-definition-weekly-update-september-23</link>
		
		<dc:creator><![CDATA[Mia Lykou Lund]]></dc:creator>
		<pubDate>Mon, 23 Sep 2024 13:31:52 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=72633</guid>

					<description><![CDATA[Stay in the loop on shaping Open Source AI and catch up on the latest discussions from the past week!]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513">Draft v.0.0.9 of the Open Source AI Definition is available for comments</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/nemobis">nemobis</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/20">points out</a> that the term &#8220;skilled person&#8221; in the Open Source AI Definition needs clarification, especially when considering different legal systems. Because the term could lead to misinterpretations, nemobis suggests adjusting the wording to focus on access to data. Additionally, the term &#8220;substantially equivalent system&#8221; requires a more precise definition.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/21">adds</a> that in Japan, the term &#8220;skilled person&#8221; is linked to patent law, which could complicate its interpretation. He proposes using a simpler term, like &#8220;person skilled in technology,&#8221; to avoid unnecessary debate.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/stefano">stefano</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/22">asks for suggestions</a> for a better alternative to &#8220;skilled person,&#8221; such as &#8220;practitioner&#8221; or &#8220;AI practitioner.&#8221;</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/kjetilk">kjetilk</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/23">jokingly suggests</a> lowering the bar to &#8220;any random person with a computer,&#8221; emphasizing the importance of accessibility in open source, allowing anyone to engage regardless of formal training.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/samj">samj</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/26">highlights</a> that byte-for-byte reproducibility is unrealistic, as randomness and hardware variability make exact replication unachievable, similar to how different binaries perform equivalently despite differing checksums.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/samj">samj</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/27">notes</a> the existence of models like StarCoder2 and OLMo as examples of Open Source AI, refuting the claim that no models meet the standard. He stresses the need for the definition to encourage the development of new models rather than settling for an inadequate status quo.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/case-in-point-zuckerbergs-blog-on-open-source/579">Case-in-Point: Zuckerberg’s blog on Open Source</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/kjetilk">kjetilk</a> <a target="_blank" href="https://discuss.opensource.org/t/case-in-point-zuckerbergs-blog-on-open-source/579">reflects</a> on Mark Zuckerberg&#8217;s blog post about Llama 3.1, where Zuckerberg claims that &#8220;Open Source AI Is the Path Forward.&#8221; He points out that while it’s easy to agree with Zuckerberg’s sentiment, Llama 3.1 isn&#8217;t truly open source and wouldn&#8217;t meet the criteria for compliance under the OSAID. This raises important questions about how to engage with Meta: should the open-source community push them away, or guide them toward creating OSAID-compliant models? Furthermore, @<a target="_blank" href="https://discuss.opensource.org/u/kjetilk">kjetilk</a> wonders how this affects perceptions of open source, especially in light of EU legislation and the broader governance issues around open source.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/case-in-point-zuckerbergs-blog-on-open-source/579/2">responds</a> by noting that the Open Source Initiative (OSI) has already made it clear that Llama 2 (and by extension Llama 3.1) does not meet the Open Source definition, despite Zuckerberg’s claims. He suggests that Zuckerberg might be using a different definition of &#8220;open source,&#8221; particularly given the unclear legal landscape around AI training data and copyright. In his view, the creation of the Open Source AI Definition (OSAID) is the community&#8217;s formal response to Meta’s claims.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-september-20-2024/575">Open Source AI Definition Town Hall &#8211; September 20, 2024</a></h2>



<ul class="wp-block-list">
<li>The seventeenth edition of our town hall meetings was held on the 20th of September. If you missed it, <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-september-20-2024/575/2">the recording and slides can be found here</a>.</li>
</ul>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">72633</post-id>	</item>
		<item>
		<title>Open Source AI Definition &#8211; Weekly update September 16</title>
		<link>https://opensource.org/blog/open-source-ai-definition-weekly-update-september-16</link>
		
		<dc:creator><![CDATA[Mia Lykou Lund]]></dc:creator>
		<pubDate>Mon, 16 Sep 2024 23:38:08 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=72100</guid>

					<description><![CDATA[Stay updated on what's happening in the forums!]]></description>
										<content:encoded><![CDATA[
<h1 class="wp-block-heading">Week 37 summary</h1>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/endorse-the-open-source-ai-definition/570">Endorse the Open Source AI Definition</a></h2>



<ul class="wp-block-list">
<li><a target="_blank" href="https://discuss.opensource.org/t/endorse-the-open-source-ai-definition/570">OSI invites individuals and organizations to endorse the Open Source AI Definition (OSAID)</a>. Endorsers will have their name and affiliation listed in the press release for Release Candidate 1 (RC1), which is expected to be finalized by the end of September. Those endorsing version 0.0.9 will be contacted again to confirm their support if there are any changes leading up to RC1.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/recommended-resources-us-copyright-office-guidance-on-tdm/565">Recommended Resources: US Copyright Office Guidance on TDM</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/mjbommar">mjbommar</a> <a target="_blank" href="https://discuss.opensource.org/t/recommended-resources-us-copyright-office-guidance-on-tdm/565">encourages</a> reviewing the U.S. Copyright Office&#8217;s guidance on text and data mining (TDM) exceptions, which provides clear explanations and limitations, especially focusing on non-commercial, scholarly, and teaching uses. He emphasizes that the TDM guidance operates within narrow parameters that are often misunderstood or overlooked.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/proposal-to-handle-data-openness-in-the-open-source-ai-definition-rfc/561">Proposal to handle Data Openness in the Open Source AI definition [RFC]</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/quaid">quaid</a> <a target="_blank" href="https://discuss.opensource.org/t/proposal-to-handle-data-openness-in-the-open-source-ai-definition-rfc/561">proposes</a> adding nuance to the Open Source AI (OSAI) Definition by introducing two designations: OSAI D+ (with open data) and OSAI D- (without open data, due to legitimate reasons beyond the creator&#8217;s control). He suggests using a dataset certificate of origin (dataset DCO) for self-verification to ensure compliance.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/kjetilk">kjetilk</a> <a target="_blank" href="https://discuss.opensource.org/t/proposal-to-handle-data-openness-in-the-open-source-ai-definition-rfc/561/2">agrees</a> that verification is key but questions whether data information alone is sufficient for verification. He highlights that verifying rights to the data may not always be possible.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/stefano">stefano</a> <a target="_blank" href="https://discuss.opensource.org/t/proposal-to-handle-data-openness-in-the-open-source-ai-definition-rfc/561/3">appreciates</a> the quadrant system&#8217;s clarity and confirms @<a target="_blank" href="https://discuss.opensource.org/u/quaid">quaid</a>’s proposal for OSAI D- to be reserved for those with legitimate reasons for not sharing data.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/thesteve0">thesteve0</a> <a target="_blank" href="https://discuss.opensource.org/t/proposal-to-handle-data-openness-in-the-open-source-ai-definition-rfc/561/7">expresses skepticism</a> about broadening the &#8220;Open Source&#8221; label. He argues that without access to both data and code, AI models cannot truly be Open Source and suggests labeling such models as &#8220;<a href="https://opensource.org/ai/open-weights" data-type="page" data-id="120137">open weights</a>&#8221; instead.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/proposal-to-handle-data-openness-in-the-open-source-ai-definition-rfc/561/8">notes</a> the importance of data access in AI, pointing out that OSAID requires detailed information about how data is sourced, including provenance and selection criteria. He also discusses potential legal and ethical reasons for not sharing datasets.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/Shamar">Shamar</a> <a target="_blank" href="https://discuss.opensource.org/t/proposal-to-handle-data-openness-in-the-open-source-ai-definition-rfc/561/13">raises concerns</a> about &#8220;openwashing&#8221; in AI, where developers might distribute a model with a different dataset, undermining trust. He argues that distinguishing between OSAI D+ and D- risks legal complications for derivative works, suggesting that models without open data should not be considered truly open.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/zack">zack</a> <a target="_blank" href="https://discuss.opensource.org/t/proposal-to-handle-data-openness-in-the-open-source-ai-definition-rfc/561/14">supports the idea</a> of a tiered system (D+ and D-) as an improvement over the current situation, as it incentivizes progress from D- to D+. He is skeptical about verifiability but sees potential in the branding aspect of the proposal.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531">Welcome diverse approaches to training data within a unified Open Source AI Definition</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/stefano">stefano</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/11">asks</a> @<a target="_blank" href="https://discuss.opensource.org/u/arandal">arandal</a> about suggested edits, which include renaming data as &#8220;source data,&#8221; allowing open-source AI developers to require downstream modifications with open data, and permitting downstream developers to use open data to fine-tune models trained on non-public data. He further asks whether arandal sees training data as relating to <a href="https://opensource.org/ai/open-weights" data-type="page" data-id="120137">model weights</a> the way source code relates to binary code.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/12">agrees</a> with @<a target="_blank" href="https://discuss.opensource.org/u/stefano">stefano</a> and points out that while many interpret OSD-compliant licenses to include CC4 and CC0, OSI has not officially evaluated Creative Commons licenses for compliance. He highlights concerns about CC0’s patent defense, which could be crucial for datasets.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/mjbommar">mjbommar</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/13">echoes</a> the concerns about patent defense, noting it as a critical issue in both software and data licensing.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/Shamar">Shamar</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/14">supports</a> the first two suggestions but argues that models trained on non-public data cannot meet an &#8220;Open Source AI&#8221; definition, as they limit the freedom to study and modify, which are core principles of Open Source.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/on-the-current-definition-of-open-source-ai-and-the-state-of-the-data-commons/559">On the current definition of Open Source AI and the state of the data commons</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/nick">nick</a> <a target="_blank" href="https://discuss.opensource.org/t/on-the-current-definition-of-open-source-ai-and-the-state-of-the-data-commons/559">shares an article</a> by Nathan Lambert, reviewed by key figures in the Open Source AI space, discussing the challenges of training data and the current Open Source AI definition. <a target="_blank" href="https://discuss.opensource.org/t/on-the-current-definition-of-open-source-ai-and-the-state-of-the-data-commons/559/2">Percy Liang&#8217;s view (on X)</a> is highlighted: he suggests that releasing an entire dataset is neither sufficient nor necessary for Open Source AI. He emphasizes the need for detailed code of the data processing pipeline for transparency, beyond just releasing the dataset.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/on-the-current-definition-of-open-source-ai-and-the-state-of-the-data-commons/559/3">discusses</a> the legal nuances of using U.S. government documents in AI training, emphasizing that while they may be used in the U.S., legal complications arise in other jurisdictions.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/Shamar">Shamar</a> <a target="_blank" href="https://discuss.opensource.org/t/on-the-current-definition-of-open-source-ai-and-the-state-of-the-data-commons/559/6">stresses</a> that Open Source AI should provide all the necessary data and processing information to recreate a system; otherwise, calling it Open Source is &#8220;open washing.&#8221;</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/rfc-separating-concerns-between-source-data-and-processing-information/568">[RFC] Separating concerns between Source Data and Processing Information</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/Shamar">Shamar</a> <a target="_blank" href="https://discuss.opensource.org/t/rfc-separating-concerns-between-source-data-and-processing-information/568">proposes</a> a clearer distinction between &#8220;source data&#8221; and &#8220;processing information&#8221; in the Open Source AI definition to ensure transparency and reproducibility. He suggests source data should be publicly available under the same terms that allowed its original use, while the process used to train the system should be shared under an Open Source license. His formulation aims to prevent loopholes that could lead to open-washing and emphasizes the importance of granting all four freedoms (study, modify, distribute, and use) to qualify as Open Source AI.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/nick">nick</a> <a target="_blank" href="https://discuss.opensource.org/t/rfc-separating-concerns-between-source-data-and-processing-information/568/2">disagrees</a>, arguing that @<a target="_blank" href="https://discuss.opensource.org/u/Shamar">Shamar</a>&#8217;s proposal misunderstands the difference between the rights to use data for training and the rights to distribute it. He also challenges the claim that exact replication of AI systems can be guaranteed, even with access to the same data.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-september-13-2024/562">Open Source AI Definition Town Hall &#8211; September 13, 2024</a></h2>



<ul class="wp-block-list">
<li>The sixteenth edition of our town hall meetings was held on the 13th of September. If you missed it, <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-september-13-2024/562/2">the recording and slides can be found here</a>.</li>
</ul>



<p></p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">72100</post-id>	</item>
		<item>
		<title>Open Source AI Definition &#8211; Weekly update September 9</title>
		<link>https://opensource.org/blog/open-source-ai-definition-weekly-update-september-9</link>
		
		<dc:creator><![CDATA[Mia Lykou Lund]]></dc:creator>
		<pubDate>Mon, 09 Sep 2024 17:02:46 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=71683</guid>

					<description><![CDATA[Read all about what happened on the forums this week!]]></description>
										<content:encoded><![CDATA[
<p>Week 36 summary</p>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513">Draft v.0.0.9 of the Open Source AI Definition is available for comments</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/Shamar">Shamar</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/11">agrees with</a> @<a target="_blank" href="https://discuss.opensource.org/u/thesteve0">thesteve0</a> and emphasizes that AI systems consist of two parts: a virtual machine (architecture) and the <a href="https://opensource.org/ai/open-weights" data-type="page" data-id="120137">weights</a> (the executable software). He argues that while weights are important, they are not sufficient to study or fully understand an AI model. For a system to be truly Open Source, it must provide all the data used to recreate an exact copy of the model, including random values used during the process. Without this, the system should not be labeled Open Source, even if the weights are available under an open-source license. Shamar suggests calling such systems “freeware” instead and ensuring the Open Source AI Definition aligns with the Open Source Definition.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/jberkus">jberkus</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/12">questions</a> whether creating an exact copy of an AI system is truly possible, even with access to all the training data, or if slight differences would always exist.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/13">explains</a> that under Japan&#8217;s copyright law, AI training on publicly available copyrighted works is permissible, but sharing the datasets created during training requires explicit permission from copyright holders. He notes that while AI training within legal limits may be allowed in many jurisdictions, making all training data freely available is unlikely. He adds that the current Open Source AI Definition strikes a reasonable balance given global intellectual property rights but suggests that more specific language might help clarify this further.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/share-your-thoughts-about-draft-v0-0-9/514">Share your thoughts about draft v0.0.9</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/marianataglio">marianataglio</a> <a target="_blank" href="https://discuss.opensource.org/t/share-your-thoughts-about-draft-v0-0-9/514/8">suggests including hardware specifications</a>, training time, and carbon footprint in the Open Source AI Definition to improve transparency. She believes this would enhance reproducibility, accessibility, and collaboration, while helping practitioners estimate computational costs and optimize models for more efficient training.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-september-6-2004/539">Open Source AI Definition Town Hall &#8211; September 6, 2024</a></h2>



<ul class="wp-block-list">
<li>The fifteenth edition of our town hall meetings was held on the 6th of September. If you missed it, <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-september-6-2004/539/2">the recording and slides can be found here</a>.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531">Welcome diverse approaches to training data within a unified Open Source AI Definition</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/Alek_Tarkowski">Alek_Tarkowski</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/6">agrees with</a> @<a target="_blank" href="https://discuss.opensource.org/u/arandal">arandal</a> on the importance of situating Open Source AI within broader open movements like open data. He suggests cooperation with organizations like Creative Commons should go beyond licensing standards to include data governance, which remains an undeveloped area.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/Alek_Tarkowski">Alek_Tarkowski</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/7">finds the idea of requiring source data to follow Open Source licenses conceptually interesting</a>, likening it to &#8220;upstream copyleft,&#8221; but notes traditional copyleft frameworks may not suit AI development.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/arandal">arandal</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/8">clarifies that the proposal</a> is an evolution of software freedom principles, not a direct extension of traditional copyleft, similar to how AGPL addressed gaps left by earlier licenses. <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/10">They further mention</a> that discussions on these approaches are ongoing across various organizations, though formal publications are limited.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401">Explaining the concept of Data information</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/Senficon">Senficon</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/40">highlights a concern</a> from the open science community that, while EU copyright law allows reproductions of protected content for research, it restricts making the research corpus available to third parties. This limits research reproducibility and open access, as it aims to protect rights holders&#8217; revenue.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/kjetilk">kjetilk</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/41">agrees with the observation</a> but questions the assumption that making content publicly available would significantly harm rights holders&#8217; revenue. He believes such policies should be based on solid evidence from extensive research.</li>
</ul>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">71683</post-id>	</item>
		<item>
		<title>Open Source AI Definition &#8211; Weekly update September 2nd</title>
		<link>https://opensource.org/blog/open-source-ai-definition-weekly-update-september-2nd</link>
					<comments>https://opensource.org/blog/open-source-ai-definition-weekly-update-september-2nd#comments</comments>
		
		<dc:creator><![CDATA[Mia Lykou Lund]]></dc:creator>
		<pubDate>Mon, 02 Sep 2024 14:17:42 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=71328</guid>

					<description><![CDATA[Stay up to date as we approach the final phases of creating the first-ever open source AI definition!]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/share-your-thoughts-about-draft-v0-0-9/514">Share your thoughts about draft v0.0.9</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/mkai">mkai</a> <a target="_blank" href="https://discuss.opensource.org/t/share-your-thoughts-about-draft-v0-0-9/514/3?u=mia">added</a> concerns about how OSI will address AI-generated content from both open and closed source models, given current legal rulings that such content cannot be copyrighted. He also suggests clarifying the difference between licenses for AI model parameters and the model itself within the Open Source AI Definition.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/share-your-thoughts-about-draft-v0-0-9/514/4?u=mia">added</a> that while media coverage of the OSAID v0.0.9 release is encouraging, he is not supportive of the idea of an enforcement mechanism to flag false open source AI. He believes this approach differs from OSI’s traditional stance and suggests it may be a misunderstanding.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/jplorre">jplorre</a> <a target="_blank" href="https://discuss.opensource.org/t/share-your-thoughts-about-draft-v0-0-9/514/6?u=mia">added</a> that while LINAGORA supports the proposed definition, they propose clarifying the term &#8220;equivalent system&#8221; to mean systems that produce the same outputs given identical inputs. They also suggest removing the specific reference to &#8220;tokenizers&#8221; in the definition, as it may not apply to all AI systems.
<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/share-your-thoughts-about-draft-v0-0-9/514/7?u=mia">agreed</a> with the need for clarification on &#8220;equivalent system&#8221; but noted that identical outputs cannot always be guaranteed in general LLMs. He suggests that this clarification might be better suited for the checklist rather than the OSAID itself.</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513">Draft v.0.0.9 of the Open Source AI Definition is available for comments</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/adafruit">adafruit</a> reconnects with @<a target="_blank" href="https://discuss.opensource.org/u/webmink">webmink</a> and <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/8?u=mia">proposes</a> updates to the Open Source AI Definition, including adding requirements for prompt transparency and data access during AI training. These updates aim to enhance the ability to audit, replicate, and modify AI models by providing detailed logs, documentation, and public access to prompts used during the training phase.
<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/webmink">webmink</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/9?u=mia">appreciates</a> the proposal but points out that it seems specific to a single approach, suggesting that it may need broader applicability.</li>
</ul>
</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/thesteve0">thesteve0</a> <a target="_blank" href="https://discuss.opensource.org/t/draft-v-0-0-9-of-the-open-source-ai-definition-is-available-for-comments/513/10?u=mia">criticizes</a> the current definition, arguing that it does not grant true freedom to modify AI models because the <a href="https://opensource.org/ai/open-weights" data-type="page" data-id="120137">weights</a>, which are essential for using the model, cannot be reproduced without access to both the original data and code. He suggests that models sharing only their weights, especially when built on proprietary data, should be labeled as &#8220;open weights&#8221; rather than &#8220;open source.&#8221; He also expresses concern about the misuse of the &#8220;open source&#8221; label by some AI models, citing specific examples where the term is being abused.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/open-washing-and-unspoken-assumptions-of-oss/515">Open-washing and unspoken assumptions of OSS</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/pranesh">pranesh</a> <a target="_blank" href="https://discuss.opensource.org/t/open-washing-and-unspoken-assumptions-of-oss/515/4?u=mia">added</a> that it might be helpful to state explicitly that the governance of open-source AI is out of scope for OSAID, but also notes that neither the OSD nor the Free Software Definition explicitly mentions governance, so it may not be necessary.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/kjetilk">kjetilk</a> <a target="_blank" href="https://discuss.opensource.org/t/open-washing-and-unspoken-assumptions-of-oss/515/9?u=mia">added</a> that while governance issues have traditionally been unspoken, this unspoken nature is a key problem that needs addressing. He suggests that OSI should explicitly declare governance out of scope to allow others to take on this responsibility.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/mjbommar">mjbommar</a> <a target="_blank" href="https://discuss.opensource.org/t/open-washing-and-unspoken-assumptions-of-oss/515/6?u=mia">added</a> support for making an official statement that OSI does not intend to control governance, noting concerns that some might fear OSI is moving towards a walled governance approach. He references past regrets about not controlling the &#8220;open source&#8221; trademark as a means to combat open-washing.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/nick">nick</a> <a target="_blank" href="https://discuss.opensource.org/t/open-washing-and-unspoken-assumptions-of-oss/515/7?u=mia">added</a> assurance that OSI has no intention of creating a walled governance garden, reaffirming the organization&#8217;s long-standing position against such control.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/open-washing-and-unspoken-assumptions-of-oss/515/10?u=mia">added</a> that there seems to be a consensus within the OSAID process that governance is out of scope, and notes that related statements have already been moved to the <a target="_blank" href="https://hackmd.io/@opensourceinitiative/osaid-faq">FAQ section</a> in recent versions.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401">Explaining the concept of Data information</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/pranesh">pranesh</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/32?u=mia">mentions</a> that, from a legal perspective, the percentage of infringement matters, citing the &#8220;de minimis&#8221; doctrine and defenses like &#8220;fair use&#8221; that consider the amount and purpose of infringement. He emphasizes that copyright laws in different jurisdictions vary, and not all recognize the same defenses as in the US.</li>
</ul>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/mjbommar">mjbommar</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/33?u=mia">argues</a> that the scale and nature of AI outputs make the &#8220;de minimis&#8221; defense irrelevant, especially when AI models generate significant amounts of copyrighted content. He stresses that the economic impact of AI-generated content is a key factor in determining whether it qualifies as transformative or infringes copyright.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/36?u=mia">highlights</a> that in Japan, using copyrighted works for AI training is generally treated as an exception under copyright law, a stance that is also being adopted by neighboring East Asian countries. He suggests that approaches like the EU Directive are unlikely to become mainstream in Asia.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/mjbommar">mjbommar</a><a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/37?u=mia"> acknowledges</a> the global focus on US/EU laws but points out that many commonly used models are developed by Western organizations. He questions how Japan&#8217;s updated copyright laws align with international treaties like WCT/DMCA, expressing concern that they may allow practices that conflict with these agreements.
<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/38?u=mia">responds</a> by stating that Japan&#8217;s copyright laws, including Article 30-4, were carefully crafted to comply with international standards, such as the Berne Convention and the WIPO Copyright Treaty, ensuring that they meet the required legal frameworks.</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531">Welcome diverse approaches to training data within a unified Open Source AI Definition</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/arandal">arandal</a><a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531?u=mia"> emphasizes</a> the importance of the Open Source Definition (OSD) as a unifying framework that accommodates diverse approaches within the open-source community. She argues that AI models, being a combination of source code and training data, should have their diversity in handling data explicitly recognized in the Open Source AI Definition. She proposes specific text changes to the draft to clarify that while some developers may be comfortable with proprietary data, others may not, and both approaches should be supported to ensure the long-term success of open-source AI.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/mjbommar">mjbommar</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/2?u=mia">appreciates</a> the spirit of Arandal’s proposal but adds that the OSI currently lacks specific licenses for data, which is why it is crucial for the OSI to collaborate with Creative Commons. Creative Commons maintains the ecosystem of &#8220;data licenses&#8221; that would be necessary under the proposed revisions to the Open Source AI Definition.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/arandal">arandal</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/3?u=mia">agrees</a> with the need for collaboration with organizations like Creative Commons, noting that this coordination is already reflected in checklist v. 0.0.9. She suggests that such collaboration is necessary even without the proposed revisions to ensure the definition accurately addresses data licensing in AI.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/nick">nick</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/4?u=mia">acknowledges</a> the importance of working with organizations like Creative Commons and mentions that OSI is in ongoing communication with several relevant organizations, including MLCommons, the Open Future Foundation, and the Data and Trust Alliance. He highlights the recent publication of the Data Provenance Standards by the Data and Trust Alliance as an example of the kind of collaborative work that is being pursued.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/mjbommar">mjbommar</a> <a target="_blank" href="https://discuss.opensource.org/t/welcome-diverse-approaches-to-training-data-within-a-unified-open-source-ai-definition/531/5?u=mia">reiterates</a> the need for explicit coordination with Creative Commons, arguing that the OSI cannot realistically finalize the Open Source AI Definition without such collaboration. He also suggests that the OSI should explore AI preference signaling and work with Creative Commons and SPDX/LF to establish shared standards, which should be part of the OSAID standard’s roadmap.</li>
</ul>



<p>Join this week&#8217;s town hall to hear the latest developments, give your comments and ask questions.</p>



<div class="wp-block-buttons is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-a89b3969 wp-block-buttons-is-layout-flex">
<div class="wp-block-button"><a class="wp-block-button__link wp-element-button">Register for the town hall</a></div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/open-source-ai-definition-weekly-update-september-2nd/feed</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">71328</post-id>	</item>
		<item>
		<title>Open Source AI Definition &#8211; Weekly update August 26</title>
		<link>https://opensource.org/blog/open-source-ai-weekly-update-august-26</link>
		
		<dc:creator><![CDATA[Mia Lykou Lund]]></dc:creator>
		<pubDate>Mon, 26 Aug 2024 19:33:35 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=70925</guid>

					<description><![CDATA[With the 0.0.9 draft definition published this week, we are moving closer to the first-ever definition of Open Source AI. Find out what happened this week and how you can get involved!]]></description>
										<content:encoded><![CDATA[
<h1 class="wp-block-heading">Week 34 summary&nbsp;</h1>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/share-your-thoughts-about-draft-v0-0-9/514">Share your thoughts about draft v0.0.9</a></h2>



<p>As we move toward the release of the first-ever Open Source AI Definition in October at <a target="_blank" href="https://2024.allthingsopen.org/event-overview">All Things Open</a>, the publication of the 0.0.9 draft brings us one step closer to realizing this goal.</p>



<ul class="wp-block-list">
<li>OSAID 0.0.9 draft definition <a target="_blank" href="https://hackmd.io/@opensourceinitiative/osaid-0-0-9">is live</a>!&nbsp;</li>
</ul>



<ul class="wp-block-list">
<li><strong>Changelog includes:</strong>
<ul class="wp-block-list">
<li><strong>New Feature: Clarified Open Source Models and Weights</strong>
<ul class="wp-block-list">
<li>Added a new paragraph under &#8220;What is Open Source AI&#8221; to define &#8220;system&#8221; as including both models and weights.</li>



<li>Clarified that all components of a larger system must meet the standard.</li>



<li>Updated paragraph after the “share” bullet to emphasize this point.&nbsp;&nbsp;</li>
</ul>
</li>



<li><strong>New Section: Open Source Models and Open Source Weights</strong>
<ul class="wp-block-list">
<li>Added descriptions of components for both models and weights in machine learning systems.</li>



<li>Edited subsequent paragraphs to eliminate redundancy.</li>
</ul>
</li>



<li><strong>Training Data: Defined as a Benefit, Not a Requirement</strong>
<ul class="wp-block-list">
<li>Defined open, public, and unshareable non-public training data.</li>



<li>Explained the role of training data in studying AI systems and understanding biases.</li>



<li>Emphasized extra requirements for data to advance openness, especially in private-first areas like healthcare.</li>
</ul>
</li>



<li><strong>Separation of Checklist</strong>
<ul class="wp-block-list">
<li>The Checklist is now a separate document from the main Definition.</li>



<li>Fully aligned Checklist content with the Model Openness Framework (MOF).</li>
</ul>
</li>



<li><strong>Terminology Changes</strong>
<ul class="wp-block-list">
<li>Replaced &#8220;Model&#8221; with &#8220;Weights&#8221; under &#8220;Preferred form to make modifications&#8221; for consistency.</li>
</ul>
</li>



<li><strong>Explicit Reference to Recipients of the Four Freedoms</strong>
<ul class="wp-block-list">
<li>Added specific references to developers, deployers, and end users of AI systems.</li>
</ul>
</li>



<li><strong>Credits and References</strong>
<ul class="wp-block-list">
<li>Incorporated credit to the Free Software Definition.</li>



<li>Added references to conditions of availability of components, referencing the Open Source Definition.</li>
</ul>
</li>
</ul>
</li>
</ul>



<ul class="wp-block-list">
<li><strong>Initial reactions on the forum:&nbsp;</strong>
<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a><a target="_blank" href="https://discuss.opensource.org/t/share-your-thoughts-about-draft-v0-0-9/514/2?u=mia"> praises the updates in version 0.0.9</a>, particularly the decision to separate the checklist from the main document, which clarifies the intent behind OSAID. He also supports the separation of &#8220;code&#8221; and &#8220;weights,&#8221; noting that in Japan, &#8220;code&#8221; clearly falls under copyright, making this distinction logical. He acknowledges revisions in the checklist that consider the importance of complete datasets, even though he disagrees with making datasets mandatory.&nbsp;</li>
</ul>
</li>
</ul>



<ul class="wp-block-list">
<li><strong>Comments on the draft on HackMD</strong>
<ul class="wp-block-list">
<li>@<a target="_blank" href="https://hackmd.io/@qsR4K0jlQsSfwmyD0P2rUA">Joshua Gay</a> adds that instead of narrowing the focus to machine-learning systems, the emphasis should be on &#8220;parameters&#8221; as a whole since weights are just one type of parameter. He suggests a rewrite that highlights making model parameters, such as weights and other settings, available under OSI-approved terms, with examples across various AI models.
<ul class="wp-block-list">
<li>He further suggests using broader language that covers more AI systems instead of narrower terminology. Specifically, he proposes replacing &#8220;Open Source models and Open Source weights&#8221; with &#8220;Open Source models and Open Source parameters,&#8221; and using &#8220;AI systems&#8221; instead of &#8220;machine learning systems.&#8221; Additionally, he recommends redefining an AI model to include architecture, parameters like weights and decision boundaries, and inference code, while referring to AI parameters as configuration settings that produce outputs from inputs.</li>
</ul>
</li>



<li>Under “Open Source models and Open Source weights”, @<a target="_blank" href="https://hackmd.io/@shujisado">shujisado</a> adds that the last paragraph titled &#8220;Open Source models and Open Source weights&#8221; actually explains &#8220;AI model&#8221; and &#8220;AI weights,&#8221; leading to a mismatch between the title and content, and notes that these terms are not used elsewhere in the definition.</li>



<li>Under “Preferred form to make modifications to machine-learning systems”, @<a target="_blank" href="https://hackmd.io/@shujisado">shujisado</a> suggests some grammatical corrections.</li>
</ul>
</li>
</ul>



<ul class="wp-block-list">
<li><strong>Next steps</strong>
<ul class="wp-block-list">
<li><strong>The OSI has recently presented at the following events</strong>:<strong> </strong>
<ul class="wp-block-list">
<li>Hong Kong for <a href="https://opensource.org/events/ai_dev-hong-kong">AI_dev</a>, August 21-23</li>



<li>Beijing for <a href="https://opensource.org/events/open-source-congress">Open Source Congress</a>, August 25-27.</li>
</ul>
</li>



<li><strong>Iterate Drafts</strong>: Continue refining drafts with feedback from the worldwide roadshow, considering new dissenting opinions.</li>



<li><strong>Review Licenses</strong>: Decide on the best approach for reviewing new licenses for datasets, documentation, and model parameters.</li>



<li><strong>Enhance FAQ</strong>: <a target="_blank" href="https://hackmd.io/@opensourceinitiative/osaid-faq">Continue improving the FAQ</a> to address emerging questions.</li>



<li><strong>Post-Stable Release Plan</strong>: Establish a process for reviewing and updating future versions of the Open Source AI Definition.</li>
</ul>
</li>
</ul>



<ul class="wp-block-list">
<li><strong>Get involved:&nbsp;</strong>
<ul class="wp-block-list">
<li>Join the <a target="_blank" href="https://discuss.opensource.org/">forum</a> and share your opinion.</li>



<li>Leave a comment on the <a target="_blank" href="https://hackmd.io/@opensourceinitiative/osaid-0-0-9">draft v.0.0.9</a> with precise feedback.</li>



<li>Follow the <a href="https://opensource.org/blog/author/mia-lykoulund">weekly recaps</a> and subscribe to our monthly <a href="https://opensource.org/newsletter">newsletter</a>.</li>



<li>Join the <a href="https://opensource.org/events/tags/townhall">town hall</a> meetings: we’re increasing the frequency to weekly meetings where you can learn more, ask questions, and share your thoughts. <a href="https://opensource.org/events/open-source-ai-definition-town-hall-2024-09-06">The next is on September 6</a>.</li>



<li>Join the <a href="https://opensource.org/events">workshops and scheduled conferences</a>.</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401">Explaining the concept of Data information</a></h2>



<ul class="wp-block-list">
<li>&nbsp;@<a target="_blank" href="https://discuss.opensource.org/u/kjetilk">Kjetilk</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/29?u=mia">points out</a> the legal distinction between using copyrighted works for AI training (reproduction) and incorporating them into publishable datasets, questioning the fairness of allowing exploitative models without compensation while potentially banning those that benefit society.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">Shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/30?u=mia">clarifies</a> that compensation for copyrighted works used in AI training is possible for both open source and closed models, distinguishing it from &#8220;royalty,&#8221; and notes that Japan&#8217;s copyright law exempts such uses for machine learning.
<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/kjetilk">Kjetilk</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/31?u=mia">reiterates </a>the relevance of &#8220;royalty&#8221; for compensation in closed, non-published models, suggesting it makes sense under copyright law if required, but if not, it could benefit science and the arts.</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-august-23-2024/510">Open Source AI Definition Town Hall</a></h2>



<ul class="wp-block-list">
<li>The slides and recording from the town hall meeting held on August 23, 2024 are available <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-august-23-2024/510/8">here</a>.</li>



<li>The next town hall meeting will be held on September 6th. Sign up for the event <a href="https://opensource.org/events/open-source-ai-definition-town-hall-2024-09-06">here</a>.</li>
</ul>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">70925</post-id>	</item>
		<item>
		<title>Open Source AI Definition &#8211; Weekly update July 15</title>
		<link>https://opensource.org/blog/open-source-ai-definition-weekly-update-july-15</link>
					<comments>https://opensource.org/blog/open-source-ai-definition-weekly-update-july-15#comments</comments>
		
		<dc:creator><![CDATA[Mia Lykou Lund]]></dc:creator>
		<pubDate>Mon, 15 Jul 2024 19:26:16 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=64577</guid>

					<description><![CDATA[Stay up to date with the Open Source AI Definition]]></description>
										<content:encoded><![CDATA[
<p>It has been quiet over the 4th of July weekend on the forums and OSI has been speaking at different events:</p>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/stefano">stefano</a> spoke in a panel at the UN event OSPOs for Good. Access the recording <a target="_blank" href="https://discuss.opensource.org/t/osi-at-ospos-for-good-whats-next-for-open-source/456/2?u=mia">here</a>.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/mer/summary">mer</a> is <a target="_blank" href="https://discuss.opensource.org/t/open-source-community-africa/466?u=mia">speaking at Open Source Community Africa</a></li>



<li>OSI was present at the Linux Foundation-hosted AI_dev: Open Source GenAI &amp; ML Summit Europe 2024. Read about the takeaways <a target="_blank" href="https://discuss.opensource.org/t/highlights-from-ai-dev-paris/444?u=mia">here</a>.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/why-and-how-to-certify-open-source-ai/349">Why and how to certify Open Source AI</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/jberkus">jberkus</a> expresses concern about the extensive resources required to certify AI systems, estimating that it would take weeks of work per system. This scale makes it impractical for a volunteer committee like License Review.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> reflects on past controversies over license conformity, noting that Open Source AI has the potential for a greater economic impact than early Open Source. He acknowledges the need for a more robust certification process given this increased significance. He suggests that cooperation from the machine learning community or consortia might be necessary to address technical issues and monitor the certification process neutrally. He offers to help spread the word about OSAID within the Japanese ML/LLM development community.</li>
</ul>



<p>@<a target="_blank" href="https://discuss.opensource.org/u/jberkus">jberkus</a> clarifies that the OSI would need full-time paid staff to handle the certifications, as the work cannot be managed by volunteers alone.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/open-source-ai-definition-weekly-update-july-15/feed</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">64577</post-id>	</item>
		<item>
		<title>Open Source AI Definition &#8211; Weekly update July 1</title>
		<link>https://opensource.org/blog/open-source-ai-definition-weekly-update-july-1</link>
					<comments>https://opensource.org/blog/open-source-ai-definition-weekly-update-july-1#comments</comments>
		
		<dc:creator><![CDATA[Mia Lykou Lund]]></dc:creator>
		<pubDate>Mon, 01 Jul 2024 15:48:07 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=63554</guid>

					<description><![CDATA[Catch up on the community's discussions about the Open Source AI Definition!]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/an-open-call-to-test-openvla/424">An open call to test OpenVLA</a></h2>



<ul class="wp-block-list">
<li>Last week @<a target="_blank" href="https://discuss.opensource.org/u/quaid">quaid</a> <a target="_blank" href="https://discuss.opensource.org/t/an-open-call-to-test-openvla/424?u=mia">suggested</a> conducting a controlled experiment to determine if data information alone is sufficient to recreate an AI model with fidelity to the original. He shared insights from the OpenVLA project, noting its possible compliance with the requirements of draft v0.0.8 and suggesting a test suite to compare models created with full datasets versus data information.
<ul class="wp-block-list">
<li>To this, @Stefano <a target="_blank" href="https://discuss.opensource.org/t/an-open-call-to-test-openvla/424/3?u=mia">noted</a> that some master&#8217;s students at CMU are also conducting similar experiments to &#8220;kick the tires&#8221; of the draft definition.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/quaid">quaid</a> proposed more precise criteria for evaluating model similarity, such as &#8220;functionally similar&#8221; or &#8220;practically similar&#8221; and further suggested detailing the values sought from open data datasets to improve the experiment’s framework.</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/interesting-research-paper-rethinking-open-source-generative-ai-open-washing-and-the-eu-ai-act/429">Interesting research paper: “Rethinking open source generative AI: open-washing and the EU AI Act”</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/hook">hook</a> has <a target="_blank" href="https://discuss.opensource.org/t/interesting-research-paper-rethinking-open-source-generative-ai-open-washing-and-the-eu-ai-act/429?u=mia">shared</a> a research paper they found interesting and relevant, titled <a target="_blank" href="https://dl.acm.org/doi/pdf/10.1145/3630106.3659005">Rethinking open source generative AI: open-washing and the EU AI Act</a>.
<ul class="wp-block-list">
<li>This paper <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/13?u=mia">has been shared before</a> by its author, @<a target="_blank" href="https://discuss.opensource.org/u/Mark/summary">mark</a>, in a discussion of whether the OSAID should accommodate partially open licenses. The author argued that doing so would limit open-washing, stating: “I think providers and users of LLMs should not be free to create oil spills in our information landscape and I think RAIL provides useful guardrails for that.” This approach would highlight the “degrees of openness”.</li>



<li>The authors also present their findings in a visualization <a target="_blank" href="https://opening-up-chatgpt.github.io/">of the degrees of openness</a> of different systems.
<ul class="wp-block-list">
<li>We have discussed this point before and noted that the OSAID will not be a partially open license but a binary one. <a href="https://opensource.org/blog/open-source-ai-definition-weekly-update-june-10">See week 22 summary for the context of this discussion</a>.</li>
</ul>
</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-june-28-2024/425">Open Source AI Definition Town Hall &#8211; June 28, 2024</a></h2>



<ul class="wp-block-list">
<li>We held our 12th town hall meeting last week. You can access the recording and slides <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-june-28-2024/425/2?u=mia">here</a> if you missed it. The town hall presented some ideas for the next draft of the Definition, making it clear that there is no agreement yet on the data information concept and that part is still subject to change.</li>



<li>A new town hall meeting is <a href="https://opensource.org/events/open-source-ai-definition-town-hall-2024-07-12">scheduled for Friday, July 12</a>.</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/open-source-ai-definition-weekly-update-july-1/feed</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">63554</post-id>	</item>
		<item>
		<title>Open Source AI Definition &#8211; Weekly update June 24</title>
		<link>https://opensource.org/blog/open-source-ai-definition-weekly-update-june-24</link>
					<comments>https://opensource.org/blog/open-source-ai-definition-weekly-update-june-24#comments</comments>
		
		<dc:creator><![CDATA[Mia Lykou Lund]]></dc:creator>
		<pubDate>Mon, 24 Jun 2024 19:36:07 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=63282</guid>

					<description><![CDATA[This week saw lively debate on the role of data information in AI. Dive into the key points discussed here!]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401">Explaining the concept of Data information</a></h2>



<p>Following @<a target="_blank" href="https://discuss.opensource.org/u/stefano">stefano</a>’s <a href="https://opensource.org/blog/explaining-the-concept-of-data-information">publication</a> regarding why the OSI considers training data to be “optional” under the checklist in <a target="_blank" href="https://hackmd.io/@opensourceinitiative/osaid-0-0-8">Open Source AI Definition</a>, the debate has continued. Here are the main points:</p>



<ul class="wp-block-list">
<li><strong>Preferred Form of Modification</strong></li>
</ul>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/hartmans">hartmans</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/4?u=mia">states</a> that agreement on the meaning of &#8220;preferred form of modification&#8221; depends on the user&#8217;s objectives. The disagreement may stem from different priorities in ranking the freedoms associated with open source AI, though he emphasizes prioritizing <a href="https://opensource.org/ai/open-weights" data-type="page" data-id="120137">model weights</a> for practical modifications. He suggests that data information could be more beneficial than raw data for understanding models and urges flexibility in AI definitions.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/11?u=mia">highlighted</a> that training data for machine learning models is a preferred form of modification but questioned whether it is the most preferred. He further emphasized the need for a flexible definition of preferred forms of modification in AI.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/quaid">quaid</a><a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/18?u=mia"> supported</a> the idea of conducting controlled experiments to determine whether data information alone is sufficient to recreate AI models accurately. He suggested practical steps for testing the effectiveness of data information and encouraged community participation in such experiments.
<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/stefano">stefano</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/23?u=mia">added</a> that some students at CMU will run this kind of experiment to test the definition, checking whether the full training dataset is needed or whether data information is enough to recreate a model that can be tested for fidelity to the original.</li>
</ul>
</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/jberkus">jberkus</a><a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/17?u=mia"> raised concerns</a> about the practical assessment of data information and its ability to facilitate the recreation of AI systems. He questioned how to evaluate data information without recreating the AI system.</li>



<li><strong>Practical Applications and Community Insights</strong>
<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/hartmans">hartmans</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/10?u=mia">proposed</a> practical scenarios where data information could suffice for modifying AI models and suggested that the community&#8217;s flexibility in defining the preferred form of modification has been valuable for Debian.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/quaid">quaid</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/18?u=mia">shared</a> insights from his research on the OpenVLA project, noting its compliance with OSAID requirements. He further proposed conducting controlled experiments to verify if data information is enough to recreate models with fidelity.</li>
</ul>
</li>



<li><strong>General observations&nbsp;</strong></li>
</ul>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/11?u=mia">emphasized</a> the need for flexible definitions in AI, drawing from open-source community experiences. He agreed on the complexity of training data issues and supported OSI&#8217;s flexible approach to defining the preferred form of modification.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/quaid">quaid</a> <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/18?u=mia">suggested</a> practical approaches for evaluating data information and its adequacy for recreating AI models and proposed further experiments and community involvement to refine the understanding and application of data information in open-source AI.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/are-we-evaluating-licenses-or-systems/414">Are we evaluating Licenses or Systems?</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/jberkus">jberkus</a><strong> </strong><a target="_blank" href="https://discuss.opensource.org/t/are-we-evaluating-licenses-or-systems/414?u=mia">asked</a><strong> </strong>whether OSAID will apply to licenses or systems, noting that current drafts focus on systems. He questioned if a certification program for reviewing systems as open source or proprietary is the intended direction.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a><strong> </strong><a target="_blank" href="https://discuss.opensource.org/t/are-we-evaluating-licenses-or-systems/414/2?u=mia">confirmed</a> that discussions are moving towards certifying AI systems and pointed to an existing thread. He emphasized the need for evaluating individual components of AI systems and expressed concern about OSI&#8217;s capacity to establish a certification mechanism, highlighting that it would significantly expand OSI&#8217;s role.</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/open-source-ai-definition-weekly-update-june-24/feed</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">63282</post-id>	</item>
		<item>
		<title>Open Source AI Definition &#8211; Weekly update June 17</title>
		<link>https://opensource.org/blog/open-source-ai-definition-weekly-update-june-17</link>
		
		<dc:creator><![CDATA[Mia Lykou Lund]]></dc:creator>
		<pubDate>Mon, 17 Jun 2024 16:52:03 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=62707</guid>

					<description><![CDATA[Busy? Catch up on the Open Source AI Definition here!]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401">Explaining the concept of Data information</a></h2>



<ul class="wp-block-list">
<li>After <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351">much debate</a> regarding training data, @stefano published a summary of the positions expressed and some clarifications about the terminology included in draft v.0.0.8. You can read the rationale about it and <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/3">share your thoughts on the forum</a>. </li>



<li><strong>Initial thoughts:</strong>
<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/Senficon">Senficon</a> (Felix Reda) <a target="_blank" href="https://discuss.opensource.org/t/explaining-the-concept-of-data-information/401/2?u=mia">adds</a> that while the discussion has highlighted the case for data information, it&#8217;s crucial to understand the implications of copyright law on AI, particularly concerning access to training data. Open Source software relies on a legal element (copyright licenses) and an access element (availability of source code). However, this framework does not seamlessly apply to AI, as different copyright regimes allow text and data mining (TDM) for AI training but not the <strong>redistribution</strong> of datasets. This discrepancy means that requiring the publication of training datasets would make Open Source AI models illegal, despite TDM exceptions that facilitate AI development. Also, <strong>public domain status is not consistent internationally</strong>, complicating the creation of legally publishable datasets. Consequently, a definition of Open Source AI that requires releasing datasets would impede collaborative improvements and limit its practical significance. Emphasizing data information can help maintain Open Source principles without legal pitfalls.</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/concerns-and-feedback-on-anchoring-on-the-model-openness-framework/393">Concerns and feedback on anchoring on the Model Openness Framework</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/amcasari">amcasari</a> <a target="_blank" href="https://discuss.opensource.org/t/concerns-and-feedback-on-anchoring-on-the-model-openness-framework/393?u=mia">expresses concern</a> about the usability and neutrality of the &#8220;Model Openness Framework&#8221; (MOF) for identifying AI systems, suggesting it doesn&#8217;t align well with current industry practices and isn&#8217;t ready for practical application without further feedback and iteration.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/concerns-and-feedback-on-anchoring-on-the-model-openness-framework/393/2?u=mia">points out</a> that the MOF&#8217;s classification of components doesn&#8217;t depend on the specific IP laws applied, but rather on a general legal framework, and highlights that Japan&#8217;s IP law system differs from the US and EU, yet finds discussions based on the OSD consistent.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/t/concerns-and-feedback-on-anchoring-on-the-model-openness-framework/393/4?u=mia">stefano</a> <a target="_blank" href="https://discuss.opensource.org/t/concerns-and-feedback-on-anchoring-on-the-model-openness-framework/393/4?u=mia">emphasizes</a> the importance of having well-thought-out, timeless principles in the Open Source AI Definition document, while viewing the Checklist as a more frequently updated working document. He also supports the call to see practical examples of the framework in use and proposes separating the Checklist from the main document to reduce confusion.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/initial-report-on-definition-validation/368">Initial Report on Definition Validation</a></h2>



<ul class="wp-block-list">
<li>Reviews of eleven different AI systems have been published. We do these reviews to check existing systems&#8217; compatibility with our <a target="_blank" href="https://hackmd.io/@opensourceinitiative/osaid-0-0-8">current definition</a>. These are the systems in question: Arctic, BLOOM, Falcon, Grok, Llama 2, Mistral, OLMo, OpenCV, Phi-2, Pythia, and T5.
<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/Mer">mer</a> has set up a review sheet for the Viking model upon request from @<a target="_blank" href="https://discuss.opensource.org/u/merlijn-sebrechts">merlijn-sebrechts</a>.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/anatta8538">anatta8538</a> asks if MLOps is considered within the topic of the Model Openness Framework and whether CLIP, an LMM, would be consistent with the OSAID.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/nick">nick</a> clarifies that the evaluation focuses on components as described in the Model Openness Framework, which includes development and deployment aspects but does not cover MLOps as a whole.</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/why-and-how-to-certify-open-source-ai/349">Why and how to certify Open Source AI</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/Alek_Tarkowski">Alek_Tarkowski</a> agrees that certification of Open Source AI will be crucial under the AI Act and highlights the importance of defining what constitutes an Open Source license. He points out the confusion surrounding terms like &#8220;free and open source license&#8221; and suggests that the issue of responsible AI licensing as a form of Open Source licensing needs resolution. He notes that some restrictive licenses are gaining traction and may need consideration for exemption from regulation, and urges the community to reach a consensus.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-june-14-2024/392">Open Source AI Definition Town Hall &#8211; June 14, 2024</a></h2>



<p>Slides and the recording of our previous town hall meeting can be found <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-june-14-2024/392/2?u=mia">here</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">62707</post-id>	</item>
		<item>
		<title>Open Source AI Definition &#8211; Weekly update June 10</title>
		<link>https://opensource.org/blog/open-source-ai-definition-weekly-update-june-10</link>
					<comments>https://opensource.org/blog/open-source-ai-definition-weekly-update-june-10#comments</comments>
		
		<dc:creator><![CDATA[Mia Lykou Lund]]></dc:creator>
		<pubDate>Tue, 11 Jun 2024 21:40:15 +0000</pubDate>
				<category><![CDATA[News]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[Deep Dive: AI]]></category>
		<guid isPermaLink="false">https://opensource.org/?p=26705</guid>

					<description><![CDATA[This week, we continued discussions about the role of training data in open source AI. Missed it? Catch up here!]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351">Open Source AI needs to require data to be viable</a></h2>



<ul class="wp-block-list">
<li>With many different discussions happening at once, here are the main points:
<ul class="wp-block-list">
<li><strong>On the issue of training data </strong>
<ul class="wp-block-list">
<li><a target="_blank" href="https://discuss.opensource.org/u/Mark">@mark</a> <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/21?u=mia">is concerned</a> that openness in AI is not meaningful without a focus on the training data. “Model weights are the most inscrutable component of current generative AI, and providers that release only [the weights] should not get a free ‘openness’ pass.”</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/stefano">stefano</a> agrees with all of that but questions the criteria used to assign green marks in <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/13?u=mia">Mark&#8217;s paper</a>, pointing out inconsistencies. He uses the example of Pythia-Chat-Base-7, which relies on a dataset from OpenDataHub with potential issues like non-versioned data and stale links, failing to meet the stringent requirements proposed by @<a target="_blank" href="https://discuss.opensource.org/u/juliaferraioli">juliaferraioli</a>. Similar concerns are raised for other models like OLMo 7B Instruct, which lack specific data versioning details. He also highlights the case of Pythia-7B, which may once have been compliant but is now problematic due to the unavailability of its foundational dataset, the Pile. This illustrates how hard it would be to maintain an &#8220;open source&#8221; status over time if the stringent proposal suggested by @<a target="_blank" href="https://discuss.opensource.org/u/juliaferraioli">juliaferraioli</a> and the AWS team is adopted.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/30?u=mia">adds</a> that while he sympathizes with @<a target="_blank" href="https://discuss.opensource.org/u/juliaferraioli">juliaferraioli</a>&#8217;s request for datasets, @<a target="_blank" href="https://discuss.opensource.org/u/stefano">stefano</a>&#8217;s arguments in support of the concept of &#8220;Data information&#8221; are aligned with the OSI principles and are reasonable.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/spotaws">spotaws</a> stresses that &#8220;data information&#8221; alone is insufficient if the data itself is too vague. </li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/juliaferraioli">juliaferraioli</a> <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/33?u=mia">adds that</a> while replicating AI systems like OLMo or Pythia may seem impractical due to their cost and statistical nature, the capability is crucial for broader adoption and consistency. She finds the current definition to be unclear and subjective.</li>



<li>@<a target="_blank" class="" href="https://discuss.opensource.org/u/zack">zack</a> recommends reviewing StarCoder2, recognizing that it would be in the same category as BLOOM: a system with lots of transparency and a dataset made available but released with a restrictive license.</li>



<li>@<a target="_blank" class="" href="https://discuss.opensource.org/u/Ezequiel_Lanza">Ezequiel_Lanza</a> joined the conversation in support of the concept of Data information, <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/40?u=stefano">claiming</a>, with technical arguments, that &#8220;sharing the dataset is not necessarily required and may not justify the potential risks associated with making it mandatory.&#8221;</li>



<li><strong>Partially open / restrictive licenses</strong>
<ul class="wp-block-list">
<li>Continuing @<a target="_blank" href="https://discuss.opensource.org/u/Mark">mark</a>&#8217;s points regarding restrictive licenses (like the ethical licenses), <a target="_blank" href="https://discuss.opensource.org/u/stefano">@stefano</a> has <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/18?u=mia">added a link</a> to an article highlighting some reasons why OSI is staying away from these licenses.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/pchestek">pchestek</a> <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/23?u=mia">further adds</a> that a partially open license would create even more opportunities for open washing, as “open source AI” could have many meanings.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/Mark">mark</a> <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/24?u=mia">clarified</a> that rather than proposing a variety of meanings, they are seeking to highlight the dimensions of openness in their paper, exploring the broader landscape. </li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/stefano">stefano</a> <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/26?u=mia">adds</a> that in its 26 years, OSI has contended with numerous organizations claiming varying degrees of openness as &#8220;open source.&#8221; This issue is now mirrored in AI, as companies seek the market value of being labeled Open Source. Open Source is binary: either users have full rights or they don&#8217;t, and any system that falls short is not Open Source AI, regardless of how &#8220;almost&#8221; open it is.</li>
</ul>
</li>



<li><strong>Field of use/restriction&nbsp;</strong>
<ul class="wp-block-list">
<li><a target="_blank" href="https://discuss.opensource.org/u/juliaferraioli">@juliaferraioli</a> believes that OSAID should include prohibitions against field-of-use restrictions.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/shujisado">shujisado</a> <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-needs-to-require-data-to-be-viable/351/19?u=mia">adds</a> that OSAID specifies four freedoms as requirements for being considered open source, and that this should be understood as equivalent to such a prohibition, since “freedom” is the same as “non-restricted”. The 10 clauses of the OSD have been replaced by the checklist in draft v0.0.8.</li>



<li>@<a target="_blank" href="https://discuss.opensource.org/u/juliaferraioli">juliaferraioli</a> adds that individual components may be covered by their individual licenses, but the overall system may be subject to additional terms, which is why we need this to be explicit.</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/initial-report-on-definition-validation/368">Initial Report on Definition Validation</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/Mer">Mer</a> <a target="_blank" href="https://discuss.opensource.org/t/initial-report-on-definition-validation/368/6?u=mia">has added</a> an update on how far along our system analysis is against the current draft definition. Some points that remain incomplete have been highlighted.</li>



<li>Mistral (Mixtral 8x7B) <a target="_blank" href="https://discuss.opensource.org/t/initial-report-on-definition-validation/368/9?u=mia">is considered not in alignment</a> with the OSAID because its data pre-processing code is not released under an OSI-approved license.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/can-a-derivative-of-non-open-source-ai-be-considered-open-source-ai/345">Can a derivative of non-open-source AI be considered Open Source AI?</a></h2>



<ul class="wp-block-list">
<li>@<a target="_blank" href="https://discuss.opensource.org/u/tarek_ziade">tarek_ziade</a> <a target="_blank" href="https://discuss.opensource.org/t/can-a-derivative-of-non-open-source-ai-be-considered-open-source-ai/345/12?u=mia">shares </a>his experience fine-tuning a &#8220;small&#8221; model (200M parameters) for a Firefox feature to describe images, using a base model for image encoding and text decoding. Despite not having 100% traceability of upstream data, Tarek argues that intentional fine-tuning and transparency make the new fine-tuned model open source. Any issues arising from downstream data can be addressed by the project maintainers, maintaining the model’s open source status.</li>
</ul>



<h2 class="wp-block-heading"><a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-may-31-2024/369">Town hall recording out</a></h2>



<ul class="wp-block-list">
<li>We held our 10th town hall meeting a week and a half ago. You can access the recording <a target="_blank" href="https://discuss.opensource.org/t/open-source-ai-definition-town-hall-may-31-2024/369/3?u=mia">here</a> if you missed it.</li>



<li>A new town hall meeting is scheduled for this <a href="https://opensource.org/events/open-source-ai-definition-town-hall-2024-06-14">Friday, June 14</a>.</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://opensource.org/blog/open-source-ai-definition-weekly-update-june-10/feed</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">26705</post-id>	</item>
	</channel>
</rss>
