What could disrupt the future of generative AI? – MarTech

There’s a lot of talk these days about how generative AI could put people out of work. Not as much thought is given to how people could put generative AI out of work. But they could — and quite possibly will.
GenAI and the foundation models on which it rests are currently at the dizzying peak of the Gartner hype cycle. If Gartner’s model is sound, those tools may be about to plunge into the “trough of disillusionment” before emerging a few years hence on a plateau of useful productivity.
There’s an argument, however, that the trough of disillusionment could swallow genAI products for good. In addition to the risks embedded in relying on what is essentially unconscious and amoral “intelligence,” users also face the very real prospects that copyright and privacy issues could mortally wound large language models (LLMs) like ChatGPT.
Let’s take those in order.
Publishers monetize content. They do not want third parties monetizing that content without permission, especially as the publishers have likely already paid for it. Professional authors monetize what they write. They too do not want third parties profiting from their work with no recompense for the creator. Everything I say here about written content applies equally to graphic, video and any other creative content.
We do have copyright laws, of course, that protect publishers and authors from direct theft. Those don’t help with genAI because it crawls so many sources that the ultimate output may not closely resemble just one of the individual sources (although that can happen).
Right now, publishers are actively looking at ways to block LLMs from scraping their content. It’s a tough technical challenge.
In a recent video, MarTech contributor Greg Krehbiel discusses ways publishers might try to block LLMs. He also makes a case for changing terms and conditions to lay the groundwork for future lawsuits. As he seems to acknowledge, none of his suggestions is a slam dunk. For instance, is it practicable to stop Google from crawling your site to grab content without also stopping it from crawling your site to index it for search results? Lawsuits, too, are costly.
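One widely discussed technical measure is a robots.txt directive asking AI crawlers to stay away. As a sketch, a publisher could disallow OpenAI's GPTBot crawler while leaving search crawlers alone (note that compliance with robots.txt is voluntary, so this only deters crawlers that choose to honor it):

```
# robots.txt — ask OpenAI's crawler to skip the whole site
User-agent: GPTBot
Disallow: /

# Search crawlers remain free to index
User-agent: Googlebot
Allow: /
```

This illustrates the dilemma above: the file distinguishes crawlers by their declared user agent, so it only works against scrapers that identify themselves honestly.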
But how about a regulatory fix? Do you remember the endless annoyance of telemarketing calls? The National Do Not Call Registry put a stop to that. Everyone who cared could register their number, and telemarketers could continue to call it only at the risk of the FTC imposing hefty fines.
Registering domains with a national “Do Not Scrape” registry might be a heavier lift, but one can see in general terms how such a regulatory strategy might work. Would every infringement be detected? Surely not. But the same goes, for example, for GDPR. GDPR commands compliance not because every infringement is detected, but because those infringements that are detected can result in heavy sanctions — “unprecedentedly steep fines of up to 4 percent of a company’s total global revenue.”
Whether or not there’s a technical or regulatory fix to stop genAI stealing content, hasn’t that horse already departed the stable? LLMs have already been trained on inconceivably large datasets. They may be prone to error, but there’s a sense in which they know everything.
Well, they know everything up to a couple of years ago. GPT-4 was pre-trained on data with a cutoff of September 2021, which means there’s a lot it doesn’t know. Let’s remind ourselves of what we’re dealing with here.
Dig deeper: Artificial Intelligence: A beginner’s guide
GenAI uses algorithms to predict the next best piece of text to generate, based on the millions of pieces of text on which it was trained. What makes it “intelligent” is that it can improve its own algorithms based on feedback and response (a human doesn’t have to tinker with the algorithms, although of course she could).
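To make the "predict the next best piece of text" idea concrete, here is a deliberately tiny sketch — a bigram model, nothing like a real LLM in scale or sophistication — that learns which word most often follows which in its training corpus and predicts accordingly:

```python
from collections import defaultdict

# Toy illustration only: a bigram model that predicts the "next best"
# word purely from counts over the text it was trained on.
def train_bigrams(corpus):
    counts = defaultdict(lambda: defaultdict(int))
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def predict_next(counts, word):
    followers = counts.get(word)
    if not followers:
        return None
    # Choose the most frequent follower seen in training.
    return max(followers, key=followers.get)

model = train_bigrams("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # → cat
```

The key point carries over to the real thing: the model can only emit continuations derived from the data it was trained on. Ask it about a word (or a world) it never saw, and it has nothing to offer.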
What genAI doesn’t do — can’t do — is find out stuff about the world that lies outside its training dataset. This underlines the point, made by philosophers like Donald Davidson, that AI has no causal connections with the world. If I want to know if it’s raining, I don’t rely on a dataset; I look out the window. To put it technically, genAI may have great syntax (grammar), but it’s a stranger to semantics (meaning).
The conclusion to be drawn from this is that AI is wholly reliant on creatures, like us, who are causally connected to the world; who can tell if it’s raining, if there’s a moon in the sky, if Jefferson drafted the Declaration of Independence. So far, it has been dependent on what people have done in the past. To remain relevant it must continue to depend on what people alone can do.
If the ability of LLMs to continue scraping content created by humans is significantly curtailed, they will not be able to add to, update, correct and augment their datasets going forward. The demise of their utility might be slow, but it would be more or less guaranteed.
In addition to the urge of publishers, authors and other creators to keep genAI away from their content, there’s another very real problem in the immediate future: the need to somehow guarantee that, in scraping millions of gigabytes of data from the web, LLM vendors are not inadvertently seizing personally identifiable information (PII) or other types of data protected by existing regulations.
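To see why that guarantee is so hard to give, consider the naive approach: pattern-matching scraped text for obvious PII before it enters a training set. The sketch below (illustrative only — the regexes catch just email addresses and US-style phone numbers, a small fraction of what privacy regulations actually protect) shows the shape of such a pre-filter:

```python
import re

# Illustrative pre-filter: flag obvious PII patterns in scraped text.
# Real compliance pipelines must handle names, addresses, IDs, and
# context-dependent identifiers that no simple regex can capture.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

def find_pii(text):
    """Return the obvious PII fragments found in text."""
    return EMAIL_RE.findall(text) + PHONE_RE.findall(text)

sample = "Contact jane.doe@example.com or call 555-123-4567."
print(find_pii(sample))  # → ['jane.doe@example.com', '555-123-4567']
```

Anything a filter like this misses — and at web scale, it will miss plenty — ends up baked into the model, where it cannot be selectively deleted. That is precisely the exposure regulators care about.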
Suffice to say that European courts tend to be more sympathetic to citizens’ rights than to big tech’s profits.
We haven’t even mentioned trust and safety. Those concerns were covered in my recent conversation with Gartner’s AI hype cycle expert Afraz Jaffri, who said:
The first issue is actually the trust aspect. Regardless of external regulations, there’s still a fundamental feel that it’s very hard to control the models’ outputs and to guarantee the outputs are actually correct. That’s a big obstacle.
It’s easy to say that genAI is here to stay. Plenty of people have said it. And indeed, a significant — if not entirely novel — development in technology is highly unlikely to be forgotten or abandoned. At a bare minimum, organizations will continue to use these capabilities on their own datasets, or cautiously determined external datasets, and that will meet many important use cases.
Nevertheless, the chances that genAI will be disrupted, constrained and very much altered by some combination of regulatory blocks, legal challenges, trust issues — and other obstacles as yet unseen — are well above zero.
© 2023 Third Door Media, Inc. All rights reserved.
Third Door Media, Inc. is a publisher and marketing solutions provider incorporated in Delaware, USA, with an address 88 Schoolhouse Road, PO Box 3103, Edgartown, MA 02539. Third Door Media operates business-to-business media properties and produces events. It is the publisher of MarTech.org, the leading marketing technology digital publication.