The New York Times has put Perplexity.ai on notice: "Don't crawl our site, or else!" This is a little comical since Google has been crawling nytimes.com since the beginning of Google, but we need a distinction here. Two verbs, both terms of art: "to crawl," which means to systematically browse the web so that content can be indexed and made searchable through a search engine; and "to scrape," which means to automatically extract and collect the content of webpages.
Is Perplexity crawling to index and surface data, or is it scraping to index, surface, and train on data? Hmmm…
The dispute between The New York Times and Perplexity underscores growing tensions between traditional media outlets and AI companies over content usage rights. The New York Times issued a cease and desist letter to Perplexity, accusing the AI search engine startup of using its content without permission. This move comes amid similar claims from other publishers, including Forbes and Condé Nast, and an ongoing lawsuit by the NYT against OpenAI and Microsoft over the use of its content to train AI models without authorization.
The New York Times contends that Perplexity and its partners have been unjustly enriched by using its journalism without a license. The Times has already prohibited AI crawlers, including Perplexity's, through its robots.txt file.
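For readers who have never looked inside a robots.txt file, here is a minimal sketch of how a well-behaved bot would check one before fetching a page. It uses Python's standard `urllib.robotparser` module. The rules in `SAMPLE_ROBOTS_TXT`, the example.com URLs, and the helper name `is_allowed` are all hypothetical and made up for illustration; this is not the Times's actual file, and "PerplexityBot" appears only as an example user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, invented for illustration only.
# This is NOT the actual robots.txt of any real publisher.
SAMPLE_ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://www.example.com/2024/some-article.html"
    for agent in ("PerplexityBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, agent, article) else "blocked"
        print(f"{agent}: {verdict}")
```

Note what this does and does not do. A robots.txt file is a published request, not an enforcement mechanism: the check only matters if the bot's operator chooses to run it and honor the answer.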
Perplexity, however, denies scraping content for AI training, emphasizing that it merely indexes web pages to present factual content in its answers. In what may be the most misguided statement that I've ever heard regarding copyrighted content, Perplexity spokesperson Sara Platnick argued that "no entity owns the copyright over facts," defending its approach to indexing content and citing it to inform responses.
The company expressed a desire for collaboration with publishers, rather than conflict. (In response to plagiarism accusations earlier in the year, Perplexity struck agreements with several publishers, including Fortune, Time, and The Texas Tribune, to offer ad revenue and free subscriptions.)
If you need a name for this battle, call it "the battle over link-based search." Publishers can't earn a living if you don't see ads or click on them. AI bots and their respective generative AI platforms don't see or click; they simply summarize and surface aggregated content without attribution or remuneration. Oops. This is "the" fight of the year – maybe the next five years.
As always, your thoughts and comments are both welcome and encouraged. -s
ABOUT SHELLY PALMER
Shelly Palmer is the Professor of Advanced Media in Residence at Syracuse University’s S.I. Newhouse School of Public Communications and CEO of The Palmer Group, a consulting practice that helps Fortune 500 companies with technology, media and marketing. He covers tech and business and is a regular commentator on CNN.