Stack Overflow and LLMs

 Sat, 11 May 2024 10:04 UTC

Image: CC BY 4.0 by cybrkyd

News broke on 06-May-2024 that Stack Overflow (SO) is partnering with OpenAI.[1] In response, many users deleted or edited their contributions on the public forum to prevent them from being used to train AI. SO started banning those users.

Stack Overflow owns its users’ posts. The content, however, is under a Creative Commons 4.0 license, which requires attribution. It is therefore quite understandable why so many are miffed about this move by SO.

Anyone who has used ChatGPT will tell you that it provides no references to the sources of its data. Compare that to Bing’s Copilot, for example, which adds a section at the bottom of its output dedicated to citing the relevant sources.

Despite assurances from Stack Overflow that OpenAI will “provide attribution to the Stack Overflow community within ChatGPT to foster deeper engagement with content”, the deletions and data poisoning continue.[2] SO’s announcement confirms that the Creative Commons 4.0 license will be honoured.

Even before this partnership was announced, it would not have been daft to assume that OpenAI had already indexed SO’s data. Some of the answers I’ve received are remarkably similar to those found on SO. Maybe that is a coincidence, especially for the more generic stuff like “How do I move multiple files in Linux”. Be that as it may, many think that this is theft.

I’ll stop here. This is a highly emotional topic, with a lot of SO contributors throwing their toys out of their cots and spitting the dummy. Any opinion on this matter that is construed as supporting this move by SO will result in abuse.[3]

One thing, however, does need to be said: if you go into something fully aware of the Creative Commons 4.0 license, why complain?

  1. Stack Overflow and OpenAI: Stack Overflow and OpenAI Partner to Strengthen the World’s Most Popular Large Language Models

  2. Tom’s Hardware: Stack Overflow bans users

  3. Ruben Schade: Stack Overflow and LLM licencing