AI companies are buying your old Tumblr and Reddit posts

The next big business opportunity in AI might be websites selling their users’ posts.

Driving the news: Automattic, the company that owns Tumblr and WordPress, is reportedly preparing deals to let OpenAI and Midjourney scrape user posts for AI training data. There’s an easy joke about AIs trained on fanfiction and anime memes, but Tumblr is a popular place for artists to post their work, and over 800 million websites use WordPress as their content management system.

  • Tumblr is not as big as it once was, but the deal will cover public posts as far back as 2014, though concerned users will be able to opt out of AI training.

Catch-up: Last month, Reddit struck a deal worth US$60 million annually that lets Google train AI models on user posts. The platform is so hopeful about these kinds of deals that it’s comfortable going public despite a US$90.8 million loss in 2023.

Why it matters: Selling access to user data for AI training could be a big revenue stream for media companies, one that’s especially attractive after several years of ad market volatility. AI is only as good as the data it is trained on, and companies seem willing to pay big for it.

  • That might be because AI companies need more sources as news outlets begin blocking website-scraping crawlers. According to Originality AI, 37% of the top 350 news sites are blocking OpenAI, and 12% are blocking Google.
     
  • Copyright lawsuits also might be pushing AI companies away from the “it’s better to beg forgiveness than pay for permission” approach of the past.

Zoom out: News outlets might be open to having their content used as training data, as long as AI companies pay up. OpenAI has struck deals with Axel Springer and The Associated Press, and is reportedly seeking more deals worth US$5 million annually. Apple is reportedly going as high as US$50 million to get training data from some publishers.