On Thursday, Reddit is rolling out a new policy aimed at balancing its desire to license its content to larger tech companies, like Google, with protecting users’ privacy. The newly announced “Public Content Policy” will join Reddit’s existing privacy policy and content policy to guide how Reddit’s data is accessed and used by commercial entities and other partners. Relatedly, the company also announced a subreddit dedicated to researchers working with Reddit’s data.
The announcement comes shortly after Reddit’s stock market debut, which sees the company positioning itself to grow revenue not only from the ads that run on its platform and API usage by developers but also from its corpus of data. The company said in its IPO prospectus that it had already made $203 million through data licensing agreements and expects that number to increase over time.
While Reddit hadn’t historically blocked access to its data for AI training purposes, it changed course last year. Reddit CEO Steve Huffman told The New York Times that it didn’t make sense for Reddit to continue to give “all of that value to some of the largest companies in the world for free,” signaling the company’s plan to move into the data licensing space.
With those efforts now well underway, the new Public Content Policy will lock down access to Reddit’s data for anyone without an agreement. (Reddit says it’s not adding new restrictions, just publicizing the policy it has had in place internally for some time.)
“Unfortunately, we see more and more commercial entities using unauthorized access or misusing authorized access to collect public data in bulk, including Reddit public content,” Reddit writes in its blog. “Worse, these entities perceive they have no limitation on their usage of that data, and they do so with no regard for user rights or privacy, ignoring reasonable legal, safety, and user removal requests. While we’ll continue our efforts to block known bad actors, we need to do more to restrict access to Reddit public content at scale to trusted actors who’ve agreed to abide by our policies. But we also need to continue to ensure that users, mods, researchers, and other good-faith, non-commercial actors have access.”
In other words, access to Reddit data for research and other non-commercial efforts will continue, but entities that want to use Reddit’s data for other purposes, including AI training, will have to pay. In a graphic shared on the blog, Reddit makes this clear, saying that businesses interested in using Reddit data to “power, augment or enhance your product for any commercial purposes” need a contract.
Advertisers, meanwhile, are directed to an ads API for managing campaigns and tracking their performance.
Because Reddit is essentially just a large website, indexable by search engines, this new policy aims to lock down Reddit content from any unauthorized collection while also respecting users’ rights.
For instance, Reddit says that its partners have to uphold users’ decisions to delete their content. So if users don’t want their personal posts to become fodder for future AI engines, they should be able to opt out. Partners are also restricted by the new policy from using Reddit’s content to identify individuals or their personal information, including for ad targeting. Partners also can’t use Reddit content to spam or harass its users or to conduct “background checks, facial recognition, government surveillance, or help law enforcement do any of the above.”
The policy additionally restricts access to adult media and clarifies that Reddit won’t sell its users’ personal information. The company also notes that it will never license non-public content like private messages or non-public account information, like users’ emails or browsing history, among other things.
To help researchers who want to use Reddit data for non-commercial purposes, the company has established a new subreddit, r/reddit4researchers. The company says it is also partnering with OpenMined to develop a program to guide and grow researchers’ collaboration with Reddit.