Reports have emerged suggesting that Reddit, the popular news aggregator and community site, is planning to block AI startups from scraping its data. The move could also affect search crawlers, such as those used by Google and Bing.
According to a report by The Washington Post, Reddit may remove the ability to log in to the site using Google credentials and block Google’s web crawlers from accessing the site. These actions are said to be a result of Reddit’s struggles to reach an agreement with AI companies, like Google, regarding payment for the data they extract from the site.
Reddit has since denied these claims, but its denial explicitly addressed only the removal of Google login functionality. Whether it intends to block web crawlers remains open to interpretation.
Data scraping has become a contentious topic, particularly where AI startups and the training of their chatbots are concerned. Several websites, including Reddit, have implemented API blocks and limits to prevent these startups from scraping their content. Elon Musk, CEO of X, has publicly criticized AI startups for scraping his platform’s data, which led to API changes on the site.
Reddit faced a similar issue a few months ago and followed X’s lead in blocking APIs, leading to controversy and the permanent shutdown of many subreddits. However, the focus now appears to be on search crawlers, which continue to scrape Reddit at no cost.
AI startups traditionally rely on publicly available web data to train their chatbots and AI models, avoiding the time-consuming and expensive process of creating their own datasets. However, news organizations and content creators have expressed frustration, arguing that these startups are benefiting from their work without compensation.
Blocking search engine crawlers from accessing Reddit would mean that its content no longer appears in search results on Google and Bing. This would be a significant setback for Reddit, as search engines are a major source of traffic for the website.
Despite this, Reddit does not appear to be concerned. An anonymous source, reported to be a representative from Reddit, stated, “Reddit can survive without search.” However, as AI continues to advance and data becomes more vital for training AI models, it is crucial for search giants and news sites to find a resolution to this issue soon.