Categories: Gadgets360

Reddit to Update Web Standard to Block Automated Data Scraping From Its Website

Social media platform Reddit said on Tuesday it will update a Web standard used by the platform to block automated data scraping from its website, following reports that AI startups were bypassing the rule to gather content for their systems.

The move comes at a time when artificial intelligence firms have been accused of plagiarizing content from publishers to create AI-generated summaries without giving credit or asking for permission.

Reddit said that it would update the Robots Exclusion Protocol, or “robots.txt,” a widely accepted standard meant to determine which parts of a site are allowed to be crawled.

The company also said it will maintain rate-limiting, a technique used to control the number of requests from one particular entity, and will block unknown bots and crawlers from data scraping – collecting and saving raw information – on its website.

More recently, robots.txt has become a key tool that publishers employ to prevent tech companies from using their content free-of-charge to train AI algorithms and create summaries in response to some search queries.

Last week, a letter to publishers by the content licensing startup TollBit said that several AI firms were circumventing the web standard to scrape publisher sites.

This follows a Wired investigation which found that AI search startup Perplexity likely bypassed efforts to block its Web crawler via robots.txt.

Earlier in June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems without giving credit.

Reddit said on Tuesday that researchers and organizations such as the Internet Archive will continue to have access to its content for non-commercial use.

© Thomson Reuters 2024


Is the Samsung Galaxy Z Flip 5 the best foldable phone you can buy in India right now? We discuss the company’s new clamshell-style foldable handset on the latest episode of Orbital, the Gadgets 360 podcast. Orbital is available on Spotify, Gaana, JioSaavn, Google Podcasts, Apple Podcasts, Amazon Music and wherever you get your podcasts.

Recent Posts

Beyoncé’s NFL Christmas Halftime Show Now Streaming on Netflix: Everything You Need to Know

Beyoncé's much-anticipated halftime performance, part of Netflix's NFL Christmas Gameday event, is set to release…

10 months ago

Scientists Predict Under Sea Volcano Eruption Near Oregon Coast in 2025

An undersea volcano situated roughly 470 kilometers off Oregon's coastline, Axial Seamount, is showing signs…

10 months ago

Organic Molecules in Space: A Key to Understanding Life’s Cosmic Origins

As researchers delve into the cosmos, organic molecules—the building blocks of life—emerge as a recurring…

10 months ago

The Secret of the Shiledars OTT Release Date Announced: What You Need to Know

Director Aditya Sarpotdar, following his successful venture "Munjya," has announced the release of his treasure…

10 months ago

Anne Hathaway’s Mothers’ Instinct Now Streaming on Lionsgate Play

The psychological thriller Mothers' Instinct, featuring Anne Hathaway, Jessica Chastain, and Kelly Carmichael, delves into…

10 months ago

All We Imagine As Light OTT Release Date: When and Where to Watch it Online?

Payal Kapadia's award-winning film, All We Imagine As Light, will soon be available for streaming,…

10 months ago