Nearly six months after the launch of ChatGPT, and after Bard, Bing, Claude, LLaMA, and StableLM followed one after another, large user-generated content companies are closing off access to their data for training AI models.

Paresh Dave, Wired:

Stack Overflow, a popular internet forum for computer programming help, plans to begin charging large AI developers as soon as the middle of this year for access to the 50 million questions and answers on its service, CEO Prashanth Chandrasekar says.

Mike Isaac, The New York Times:

[Reddit] said on Tuesday that it planned to begin charging companies for access to its application programming interface, or A.P.I.

[…]

Mr. Huffman said Reddit’s A.P.I. would still be free to developers who wanted to build applications that helped people use Reddit… But for the A.I. makers, it’s time to pay up.

Kif Leswing, CNBC:

Twitter CEO Elon Musk threatened to sue Microsoft on Wednesday, accusing the software giant of illegally using the social media company’s data to train its artificial intelligence model.

[…]

Musk said in December that Twitter would “pause” OpenAI’s access to its database.

It is actually unlikely that new training data from any of these companies will be necessary any time soon. Language models need a huge amount of text to learn basic grammar, writing styles, and general facts, and the existing models have already seen plenty of it. Specific, up-to-date information, on the other hand, is best supplied at query time by connecting the model to external tools such as search.

Eventually, though, the training data behind these foundation models will need to be refreshed. When that happens, the large companies that can afford to pay for API access or strike data-exchange deals will benefit disproportionately.