Reddit sues Perplexity and others for allegedly scraping millions of user comments

The lawsuit alleges they are evading Reddit’s own anti-scraping measures and circumventing Google’s controls.

Associated Press

Social media platform Reddit sued the artificial intelligence company Perplexity AI
and three other entities on Wednesday, alleging their involvement in an
“industrial-scale, unlawful” economy to “scrape” the comments of
millions of Reddit users for commercial gain.

Reddit’s
lawsuit in a New York federal court takes aim at San Francisco-based
Perplexity, maker of an AI chatbot and “answer engine” that competes
with Google, ChatGPT, and others in online search.

Also named in
the lawsuit are Lithuanian data-scraping company Oxylabs UAB, a web
domain called AWMProxy that Reddit describes as a “former Russian
botnet,” and Texas-based startup SerpApi, which lists Perplexity as a
customer on its website.

It’s the second such lawsuit from Reddit since it sued another major AI company, Anthropic, in June.

But
the lawsuit filed Wednesday is different in the way that it confronts
not just an AI company, but the lesser-known services the AI industry
relies on to acquire online writings needed to train AI chatbots.

“Scrapers
bypass technological protections to steal data, then sell it to clients
hungry for training material. Reddit is a prime target because it’s one
of the largest and most dynamic collections of human conversation ever
created,” said Ben Lee, Reddit’s chief legal officer, in a statement
Wednesday.

Perplexity said it has not yet received the lawsuit but
“will always fight vigorously for users’ rights to freely and fairly
access public knowledge. Our approach remains principled and responsible
as we provide factual answers with accurate AI, and we will not
tolerate threats against openness and the public interest.”

SerpApi’s
customer success director, Ryan Schafer, said in an email: “We strongly
disagree with Reddit’s allegations and intend to vigorously defend
ourselves in court.”

Oxylabs didn’t immediately respond to a request for comment Wednesday. AWMProxy could not immediately be reached for comment.

Reddit
compares the companies it is suing to “would-be bank robbers” who can’t
get into the bank vault, so they break into the armored truck instead.
The lawsuit alleges they are evading Reddit’s own anti-scraping measures
while also “circumventing Google’s controls and scraping Reddit content
directly from Google’s search engine results.”

Lee
said that because they’re unable to scrape Reddit directly, “they mask
their identities, hide their locations, and disguise their web scrapers
to steal Reddit content from Google Search. Perplexity is a willing
customer of at least one of these scrapers, choosing to buy stolen data
rather than enter into a lawful agreement with Reddit itself.”

Reddit
made a similar argument in its lawsuit against Anthropic, alleging that
the company ignored Reddit’s appeals to cease using its content. That
case was initially filed in California Superior Court but was later
moved to federal court and has a hearing scheduled for January.

Along
with digitized books and news articles, websites such as Wikipedia and
Reddit are deep troves of written materials that can help teach an AI
assistant the patterns of human language.

Reddit has previously entered licensing agreements with Google, OpenAI,
and other companies that are paying to be able to train their AI
systems on the public commentary of Reddit’s more than 100 million daily
users.

The licensing deals helped the 20-year-old online platform
raise money ahead of its Wall Street debut as a publicly traded company
last year.

—By Matt O’Brien, AP technology writer

Fast Company

(14)

Author: admin
Device Daily Photo