Several AI companies said to be ignoring robots.txt exclusion, scraping content without permission: report

June 22, 2024

Several AI companies are circumventing the Robots Exclusion Protocol (robots.txt) to scrape content from websites without permission, Reuters reports, citing TollBit, a content licensing startup. The practice has led to disputes between AI firms and publishers, with Forbes accusing Perplexity of plagiarizing its content.

TollBit’s letter to publishers, obtained by Reuters, reveals that many AI agents are ignoring the robots.txt standard, which is used to block parts of a site from being crawled. The company’s analytics indicate a pattern of widespread non-compliance, as various AIs use data for training without authorization. AI search startup Perplexity, in particular, has been accused by Forbes of using its investigative stories in AI-generated summaries without proper attribution or permission. Perplexity did not comment on these allegations.

The robots.txt protocol, created in the mid-1990s, was intended to prevent web crawlers from overloading websites. Although it has no legal enforcement mechanism, it has traditionally been widely respected. That now appears to be changing. Publishers are trying to use the protocol to block unauthorized use of their content by AI systems, which scrape it to train algorithms and generate summaries.
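For readers unfamiliar with the mechanism, the following is a minimal sketch of how a compliant crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The bot name `ExampleAIBot`, the rules, and the example.com URLs are hypothetical and chosen only for illustration; they do not come from the Reuters report or from any real publisher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the bot name and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for url in ("https://example.com/news/story",
                    "https://example.com/private/page"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Nothing in this check is enforced by the server. The crawler itself decides whether to call it and whether to honor the answer, which is why the protocol depends on voluntary compliance.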

“What this means in practical terms is that AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites,” TollBit wrote, according to Reuters. “The more publisher logs we ingest, the more this pattern emerges.”

Some publishers, like the New York Times, have taken legal action against AI companies for copyright infringement. Others have opted to negotiate licensing deals. The debate highlights conflicting views on the value and legality of using content to train generative AI: many AI developers argue that accessing content free of charge breaks no laws, unless the content is paid content.

The issue has gained prominence as AI-generated news summaries become more common. Google’s AI product, which creates summaries in response to search queries, has heightened publishers’ concerns. To keep their content out of Google’s AI, publishers have been blocking it with robots.txt, but doing so also removes their content from search results and hurts their online visibility. Meanwhile, if AI crawlers ignore robots.txt anyway, what is the point of publishers using it, only to lose visibility and gain no protection?

TollBit also has a horse in this AI and editorial content race, positioning itself as an intermediary between AI companies and publishers that helps to establish licensing agreements for content usage. The startup tracks AI traffic to publisher websites and provides analytics to negotiate fees for different types of content, including premium content. TollBit claims to have 50 websites using its services as of May, but did not disclose their names.
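The report does not describe how TollBit analyzes publisher logs. Purely as an illustration of the general idea, the sketch below flags logged requests that a site's robots.txt would have disallowed. The function name, the `(user_agent, path)` log format, and the sample data are all assumptions made for this example, not TollBit's method.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules and log entries, invented for this example.
SAMPLE_ROBOTS = """\
User-agent: ExampleAIBot
Disallow: /premium/
"""

SAMPLE_REQUESTS = [
    ("ExampleAIBot/1.0", "/premium/report"),
    ("ExampleAIBot/1.0", "/news/today"),
    ("OtherBot/2.0", "/premium/report"),
]


def find_disallowed_requests(robots_txt, requests):
    """Return the (user_agent, path) pairs that robots_txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path)
        for agent, path in requests
        if not parser.can_fetch(agent, path)
    ]


if __name__ == "__main__":
    for agent, path in find_disallowed_requests(SAMPLE_ROBOTS, SAMPLE_REQUESTS):
        print(f"{agent} fetched disallowed path {path}")
```

A check like this relies on the User-Agent header that each client reports about itself, so a crawler that disguises its identity would not be caught by it.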
