Wikipedia's Moment of Truth – The New York Times
Can the online encyclopedia help teach A.I. chatbots to get their facts right — without destroying itself in the process?
Credit: Illustration by Erik Carter
In early 2021, a Wikipedia editor peered into the future and saw what looked like a funnel cloud on the horizon: the rise of GPT-3, a precursor to the new chatbots from OpenAI. When this editor — a prolific Wikipedian who goes by the handle Barkeep49 on the site — gave the new technology a try, he could see that it was untrustworthy. The bot would readily mix fictional elements (a false name, a false academic citation) into otherwise factual and coherent answers. But he had no doubts about its potential. “I think A.I.’s day of writing a high-quality encyclopedia is coming sooner rather than later,” he wrote in “Death of Wikipedia,” an essay that he posted under his handle on Wikipedia itself. He speculated that a computerized model could, in time, displace his beloved website and its human editors, just as Wikipedia had supplanted the Encyclopaedia Britannica, which in 2012 announced it was discontinuing its print publication.
Recently, when I asked this editor — he asked me to withhold his name because Wikipedia editors can be the targets of abuse — if he still worried about his encyclopedia’s fate, he told me that the newer versions made him more convinced that ChatGPT was a threat. “It wouldn’t surprise me if things are fine for the next three years,” he said of Wikipedia, “and then, all of a sudden, in Year 4 or 5, things drop off a cliff.”
Wikipedia marked its 22nd anniversary in January. It remains, in many ways, a throwback to the Internet’s utopian early days, when experiments with open collaboration — anyone can write and edit for Wikipedia — had yet to cede the digital terrain to multibillion-dollar corporations and data miners, advertising schemers and social-media propagandists. The goal of Wikipedia, as its co-founder Jimmy Wales described it in 2004, was to create “a world in which every single person on the planet is given free access to the sum of all human knowledge.” The following year, Wales also stated, “We help the internet not suck.” Wikipedia now has versions in 334 languages and a total of more than 61 million articles. It consistently ranks among the world’s 10 most-visited websites yet is alone among that select group (whose usual leaders are Google, YouTube and Facebook) in eschewing the profit motive. Wikipedia does not run ads, except when it seeks donations, and its contributors, who make about 345 edits per minute on the site, are not paid. In seeming to repudiate capitalism’s imperatives, its success can seem surprising, even mystifying. Some Wikipedians remark that their endeavor works in practice, but not in theory.
Wikipedia is no longer an encyclopedia, or at least not only an encyclopedia: Over the past decade it has become a kind of factual netting that holds the whole digital world together. The answers we get from searches on Google and Bing, or from Siri and Alexa — “How old is Joe Biden?” or “What is an ocean submersible?” — derive in part from Wikipedia’s data having been ingested into their knowledge banks. YouTube has also drawn on Wikipedia to counter misinformation.
The new A.I. chatbots have typically swallowed Wikipedia’s corpus, too. Embedded deep within their responses to queries is Wikipedia data and Wikipedia text, knowledge that has been compiled over years of painstaking work by human contributors. While estimates of its influence can vary, Wikipedia is probably the most important single source in the training of A.I. models. “Without Wikipedia, generative A.I. wouldn’t exist,” says Nicholas Vincent, who will be joining the faculty of Simon Fraser University in British Columbia this month and who has studied how Wikipedia helps support Google searches and other information businesses.
Yet as bots like ChatGPT become increasingly popular and sophisticated, Vincent and some of his colleagues wonder what will happen if Wikipedia, outflanked by A.I. that has cannibalized it, suffers from disuse and dereliction. In such a future, a “Death of Wikipedia” outcome is perhaps not so far-fetched. A computer intelligence — it might not need to be as good as Wikipedia, merely good enough — is plugged into the web and seizes the opportunity to summarize source materials and news articles instantly, the way humans now do with argument and deliberation.
On a conference call in March that focused on A.I.’s threats to Wikipedia, as well as the potential benefits, the editors’ hopes contended with anxiety. While some participants seemed confident that generative A.I. tools would soon help expand Wikipedia’s articles and global reach, others worried about whether users would increasingly choose ChatGPT — fast, fluent, seemingly oracular — over a wonky entry from Wikipedia. A main concern among the editors was how Wikipedians could defend themselves from such a threatening technological interloper. And some worried about whether the digital realm had reached a point where their own organization — especially in its striving for accuracy and truthfulness — was being threatened by a type of intelligence that was both factually unreliable and hard to contain.
One conclusion from the conference call was clear enough: We want a world in which knowledge is created by humans. But is it already too late for that?
Back in 2017, the Wikimedia Foundation and its community of volunteers began exploring how the encyclopedia and its sister sites like Wikidata and Wikimedia Commons, with their offerings of free information and images, could evolve by the year 2030. The plan was to ensure that the foundation, the nonprofit that oversees Wikipedia, could protect and share the world’s information in perpetuity. One outcome of that 2017 effort, which included a year’s worth of meetings, was a prediction that Wikimedia would become “the essential infrastructure of the ecosystem of free knowledge”; another conclusion was that trends like online misinformation would soon require far more vigilance. And a research paper commissioned by the foundation found that artificial intelligence was improving at a rate that could change the way that knowledge is “gathered, assembled and synthesized.”
For that reason, the rollout of ChatGPT did not elicit surprise inside the Wikipedia community — though several editors told me they were shocked by the speed of its adoption, which needed just two months after its release in late 2022 to gain an estimated 100 million users. Despite its stodgy appearance, Wikipedia is more tech-savvy than casual users might assume. With a small group of volunteers to oversee millions of articles, it has long been necessary for highly experienced editors, often known as administrators, to use semiautomated software to identify misspellings and catch certain forms of intentional misinformation. And because of its open-source ethos, the organization has at times incorporated technology made freely available by tech companies or academics, rather than go through a lengthy and expensive development process on its own. “We’ve had artificial-intelligence tools and bots since 2002, and we’ve had a team dedicated to machine learning since 2017,” Selena Deckelmann, Wikimedia’s chief technology officer, told me. “They’re extremely valuable for semiautomated content review, and especially for translations.”
How Wikipedia uses bots and how bots use Wikipedia are extremely different, however. For years it has been clear that fledgling A.I. systems were being trained on the site’s articles, as part of the process whereby engineers “scrape” the web to create enormous data sets for that purpose. In the early days of these models, about a decade ago, Wikipedia represented a large percentage of the scraped data used to train machines. The encyclopedia was crucial not only because it’s free and accessible, but also because it contains a mother lode of facts and so much of its material is consistently formatted.
In more recent years, as so-called Large Language Models, or L.L.M.s, increased in size and functionality — these are the models that power chatbots like ChatGPT and Google’s Bard — they began to take in far larger amounts of information. In some cases, their meals added up to well over a trillion words. The sources included not just Wikipedia but also Google’s patent database, government documents, Reddit’s Q. and A. corpus, books from online libraries and vast numbers of news articles on the web. But while Wikipedia’s contribution in terms of overall volume is shrinking — and even as tech companies have stopped disclosing what data sets go into their A.I. models — it remains one of the largest single sources for L.L.M.s. Jesse Dodge, a computer scientist at the Allen Institute for AI in Seattle, told me that Wikipedia might now make up between 3 and 5 percent of the scraped data an L.L.M. uses for its training. “Wikipedia going forward will forever be super valuable,” Dodge points out, “because it’s one of the largest well-curated data sets out there.” There is generally a link, he adds, between the quality of data a model trains on and the accuracy and coherence of its responses.
In this light, Wikipedia might be seen as a sheep, caught in the jaws of a wolfish technology marketplace. A free site created in achingly good faith (“Sharing knowledge is by nature an act of kindness,” Wikimedia noted in 2017, on a page devoted to its strategic direction) is being devoured by companies whose objectives — like charging for subscriptions, as OpenAI recently began doing for its latest model — don’t jibe with its own. Yet the relationships are more complicated than they appear. Wikipedia’s fundamental goal is to spread knowledge as broadly and freely as possible, by whatever means. About 10 years ago, when site administrators focused on how Google was using Wikipedia, they were in a situation that presaged the advent of A.I. chatbots. Google’s search engine was able, at the top of its query results, to present Wikipedians’ work to users all over the world, giving the encyclopedia far greater reach than before — an apparent virtue. In 2017, three academic computer scientists, Connor McMahon, Isaac Johnson and Brent Hecht, conducted an experiment that tested how random users would react if just part of the contributions made to Google’s search results by Wikipedia were removed. The academics perceived an “extensive interdependence”: Wikipedia makes Google a “significantly better” search engine for many queries, and Wikipedia, in turn, gets most of its traffic from Google.
One upshot from the collision with Google and others who repurpose Wikipedia’s content was the creation, two years ago, of Wikimedia Enterprise, a separate business unit that sells access to a series of application programming interfaces that provide accelerated updates to Wikipedia articles. Depending on whom you ask, the enterprise unit is either a more formalized way for tech companies to direct the equivalent of large charitable donations to Wikipedia — Google now subscribes, and altogether the unit took in $3.1 million in 2022 — or a way for Wikipedia to recoup some of the financial value it creates for the digital world, and thus help fund its future operations. Practically speaking, Wikipedia’s openness allows any tech company to access Wikipedia at any time, but the A.P.I.s make new Wikipedia entries almost instantly readable. This speeds up what was already a pretty fast connection. Andrew Lih, a consultant who works with museums to put data about their collections on Wikipedia, told me he conducted an experiment in 2019 to see how long it would take for a new Wikipedia article, about a pioneering balloonist named Vera Simons, to show up in Google Search results. He found the elapsed time was about 15 minutes.
Still, the close relationship between search engines and Wikipedia has raised some existential questions for the latter. Ask Google, “What is the Russo-Ukrainian War?” and Wikipedia is credited, with some of its material briefly summarized. But what if that makes you less likely to visit Wikipedia’s article, which runs to some 10,000 words and contains more than 400 footnotes? From the point of view of some of Wikipedia’s editors, reduced traffic will oversimplify our understanding of the world and make it difficult to recruit a new generation of contributors. It may also translate into fewer donations. In the 2017 paper, the researchers noted that visits to Wikipedia had indeed begun to decline. And the phenomenon they identified became known as the “paradox of reuse”: The more Wikipedia’s articles were disseminated through other outlets and media, the more imperiled was Wikipedia’s own health.
With A.I., this reuse problem threatens to become far more pervasive. Aaron Halfaker, who led the machine-learning research team at the Wikimedia Foundation for several years (and who now works for Microsoft), told me that search-engine summaries at least offer users links and citations and a way to click back to Wikipedia. The responses from large language models can resemble an information smoothie that goes down easy but contains mysterious ingredients. “The ability to generate an answer has fundamentally shifted,” he says, noting that in a ChatGPT answer there is “literally no citation, and no grounding in the literature as to where that information came from.” He contrasts it with the Google or Bing search engines: “This is different. This is way more powerful than what we had before.”
Almost certainly, that makes A.I. both more difficult to contend with and potentially more harmful, at least from Wikipedia’s perspective. A computer scientist who works in the A.I. industry (but is not permitted to speak publicly about his work) told me that these technologies are highly self-destructive, threatening to obliterate the very content which they depend upon for training. It’s just that many people, including some in the tech industry, haven’t yet realized the implications.
Wikipedia’s most devoted supporters will readily acknowledge that it has plenty of flaws. The Wikimedia Foundation estimates that its English-language site has about 40,000 active editors — meaning they make at least five edits a month to the encyclopedia. According to recent data from the Wikimedia Foundation, about 80 percent of that cohort is male, and about 75 percent of those from the United States are white, which has led to some gender and racial gaps in Wikipedia’s coverage. And lingering doubts about reliability remain. For a popular article that might have thousands of contributors, “Wikipedia is literally the most accurate form of information ever created by humans,” Amy Bruckman, a professor at the Georgia Institute of Technology, told me. But Wikipedia’s short articles can sometimes be hit or miss. “They could be total garbage,” says Bruckman, who is the author of the recent book “Should You Believe Wikipedia?” An erroneous fact on a rarely visited page may endure for months or years. And there continues to exist the ever-present threat of vandalism, or tampering with an article. In 2017, for instance, a photo of the speaker of the House, Paul Ryan, was added to the entry on invertebrates. As a Wikipedia editor whose first name is Jade put it to me: “We have a number of, I would say, almost-professional trolls who must dedicate just about as much time to creating spam, creating vandalism, harassing people, as I dedicate to improving Wikipedia.”
Several academics told me that whatever Wikipedia’s shortcomings, they view the encyclopedia as a “consensus truth,” as one of them put it: It acts as a reality check in a society where facts are increasingly contested. That truth is less about data points — “How old is Joe Biden?” — than about complex events like the Covid-19 pandemic, in which facts are constantly evolving, frequently distorted and furiously debated. The truthfulness quotient is raised by Wikipedia’s transparency. Most Wikipedia entries include footnotes, links to source materials and lists of previous edits and editors — and experienced editors are willing to intercede when an article appears incomplete or lacks what Wikipedians call “verifiability.” Moreover, Wikipedia’s guidelines insist that its editors maintain an “N.P.O.V.” — neutral point of view — or risk being overruled (or, in the argot of wiki culture, “reverted”). And the site has a bent toward self-examination. You can find long disquisitions on Wikipedia that explore Wikipedia’s own reliability. An entry on how Wikipedia has fallen victim to hoaxes runs to more than 60 printed pages.
As difficult as the pursuit of truth can be for Wikipedians, though, it seems significantly harder for A.I. chatbots. ChatGPT has become infamous for generating fictional data points or false citations known as “hallucinations”; perhaps more insidious is the tendency of bots to oversimplify complex issues, like the origins of the Ukraine-Russia war, for example. One worry about generative A.I. at Wikipedia — whose articles on medical diagnoses and treatments are heavily visited — is related to health information. A summary of the March conference call captures the issue: “We’re putting people’s lives in the hands of this technology — e.g. people might ask this technology for medical advice, it may be wrong and people will die.”
This apprehension extends not just to chatbots but also to new search engines connected to A.I. technologies. In April, a team of Stanford University scientists evaluated four engines powered by A.I. — Bing Chat, NeevaAI, perplexity.ai and YouChat — and found that only about half of the sentences generated by the search engines in response to a query could be fully supported by factual citations. “We believe that these results are concerningly low for systems that may serve as a primary tool for information-seeking users,” the researchers concluded, “especially given their facade of trustworthiness.”
What makes the goal of accuracy so vexing for chatbots is that they operate probabilistically when choosing the next word in a sentence; they aren’t trying to find the light of truth in a murky world. “These models are built to generate text that sounds like what a person would say — that’s the key thing,” Jesse Dodge says. “So they’re definitely not built to be truthful.” I asked Margaret Mitchell, a computer scientist who studied the ethics of A.I. at Google, whether factuality should have been a more fundamental priority for A.I. Mitchell, who says she was fired from the company after criticizing the direction of its work (Google says she was fired for violating the company’s security policies), said that most would find that logical. “This common-sense thing — ‘Shouldn’t we work on making it factual if we’re putting it forward for fact-based applications?’ — well, I think for most people who are not in tech, it’s like, ‘Why is this even a question?’” But, Mitchell said, the priorities at the big companies, now in frenzied competition with one another, are concerned with introducing A.I. products rather than reliability.
The road ahead will almost certainly lead to improvements. Mitchell told me that she foresees A.I. companies’ making gains in accuracy and reducing biased answers by using better data. “The state of the art until now has just been a laissez-faire data approach,” she said. “You just throw everything in, and you’re operating with a mind-set where the more data you have, the more accurate your system will be, as opposed to the higher quality of data you have, the more accurate your system will be.” Jesse Dodge, for his part, points to an idea known as “retrieval,” whereby a chatbot will essentially consult a high-quality source on the web to fact-check an answer in real time. It would even cite precise links, as some A.I.-powered search engines now do. “Without that retrieval element,” Dodge says, “I don’t think there’s a way to solve the hallucination problem.” Otherwise, he says, he doubts that a chatbot answer can gain factual parity with Wikipedia or the Encyclopaedia Britannica.
Market competition might help prompt improvement, too. Owain Evans, a researcher at a nonprofit in Berkeley, Calif., who studies truthfulness in A.I. systems, pointed out to me that OpenAI now has several partnerships with businesses, and those firms will care greatly about responses’ achieving a high level of accuracy. Google, meanwhile, is developing A.I. systems to work closely with medical professionals on disease detection and diagnostics. “There’s just going to be a very high bar there,” he adds, “so I think there are incentives for the companies to really improve this.”
At least for now, A.I. companies are focusing on what they call “fine tuning” when it comes to factuality. Sandhini Agarwal and Girish Sastry, researchers at OpenAI, the company that created ChatGPT, told me that their newer A.I. model, GPT-4, has made significant improvements over earlier models in what they called “factual content.” Those advances stem mainly from a process known as “reinforcement learning with human feedback” to help A.I. models differentiate between good and bad answers. But ChatGPT clearly has a way to go, both to fix hallucinations and to provide complex, multilayered and accurate answers to historical questions. When I asked Agarwal whether OpenAI’s systems could ever be completely accurate, or offer 400 footnotes, she said that it was possible. But there might always exist a tension between a model’s ambition to be factual and its efforts to be creative and fluent. As an A.I. developer, she explained, the goal was not for a chat model to “regurgitate” data it had been trained on. Rather, it was to see patterns of knowledge it could relate to users in fresh, conversational language.
In the future, Sastry added, A.I. systems might interpret whether a query requires a rigorous factual answer or something more creative. In other words, if you wanted an analytical report with citations and detailed attributions, the A.I. would know to deliver that. And if you desired a sonnet about the indictment of Donald Trump, well, it could dash that off instead.
In late June, I began to experiment with a plug-in the Wikimedia Foundation had built for ChatGPT. At the time, this software tool was being tested by several dozen Wikipedia editors and foundation staff members, but it became available in mid-July on the OpenAI website for subscribers who want augmented answers to their ChatGPT queries. The effect is similar to the “retrieval” process that Jesse Dodge surmises might be required to produce accurate answers. GPT-4’s knowledge base is currently limited to data it ingested by the end of its training period, in September 2021. A Wikipedia plug-in helps the bot access information about events up to the present day. At least in theory, the tool — lines of code that direct a search for Wikipedia articles that answer a chatbot query — gives users an improved, combinatory experience: the fluency and linguistic capabilities of an A.I. chatbot, merged with the factuality and currency of Wikipedia.
One afternoon, Chris Albon, who’s in charge of machine learning at the Wikimedia Foundation, took me through a quick training session. Albon asked ChatGPT about the Titan submersible, operated by the company OceanGate, whose whereabouts during an attempt to visit the Titanic’s wreckage were still unknown. “Normally you get some response that’s like, ‘My information cutoff is from 2021,’” Albon told me. But in this case ChatGPT, recognizing that it couldn’t answer Albon’s question — What happened with OceanGate’s submersible? — directed the plug-in to search Wikipedia (and only Wikipedia) for text relating to the question. After the plug-in found the relevant Wikipedia articles, it sent them to the bot, which in turn read and summarized them, then spit out its answer. As the responses came back, hindered by only a slight delay, it was clear that using the plug-in always forced ChatGPT to append a note, with links to Wikipedia entries, saying that its information was derived from Wikipedia, which was “made by volunteers.” And this: “As a large language model, I may not have summarized Wikipedia accurately.”
But the summary about the submersible struck me as readable, well supported and current — a big improvement from a ChatGPT response that either mangled the facts or lacked real-time access to the internet. Albon told me, “It’s a way for us to sort of experiment with the idea of ‘What does it look like for Wikipedia to exist outside of the realm of the website,’ so you could actually engage in Wikipedia without actually being on Wikipedia.com.” Going forward, he said, his sense was that the plug-in would continue to be available, as it is now, to users who want to activate it but that “eventually, there’s a certain set of plug-ins that are just always on.”
In other words, his hope was that any ChatGPT query might automatically result in the chatbots’ checking facts with Wikipedia and citing helpful articles. Such a process would probably block many hallucinations as well: For instance, because chatbots can be deceived by how a question is worded, false premises sometimes elicit false answers. Or, as Albon put it, “If you were to ask, ‘During the first lunar landing, who were the five people who landed on the moon?’ the chatbot wants to give you five names.” Only two people landed on the moon in 1969, however. Wikipedia would help by offering the two names, Buzz Aldrin and Neil Armstrong; and in the event the chatbot remained conflicted, it could say it didn’t know the answer and link to the article.
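The retrieve-then-answer flow Albon demonstrated can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the tiny in-memory `CORPUS` stands in for Wikipedia, keyword overlap stands in for a real search, and taking the first sentence stands in for the chatbot's summary. None of the function names come from the actual Wikimedia plug-in, whose code the article does not describe.

```python
import re

# Hypothetical stand-in for Wikipedia: article title -> article text.
CORPUS = {
    "Titan submersible implosion": (
        "The Titan submersible imploded during a dive to the Titanic wreck. "
        "Five people died."
    ),
    "Apollo 11": (
        "Apollo 11 was the first crewed lunar landing. "
        "Neil Armstrong and Buzz Aldrin landed on the Moon in 1969."
    ),
}

# Common words ignored when matching a question to articles (illustrative list).
STOPWORDS = {"the", "a", "an", "of", "to", "what", "who", "with", "in",
             "on", "were", "was", "during", "happened"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def search_articles(query, corpus, limit=2):
    """Rank articles by keyword overlap with the query; skip zero-overlap ones."""
    query_tokens = tokenize(query)
    scored = []
    for title, body in corpus.items():
        overlap = len(query_tokens & tokenize(title + " " + body))
        if overlap:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


def summarize(text):
    """Stand-in for the chatbot's summary: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def answer_with_sources(query, corpus):
    """Search first, then answer only from what was found, with attribution."""
    titles = search_articles(query, corpus)
    if not titles:
        # Nothing retrieved: admit ignorance rather than invent an answer.
        return "I don't know. No matching article was found."
    summary = " ".join(summarize(corpus[t]) for t in titles)
    return (
        f"{summary}\n\n"
        f"Sources (Wikipedia, made by volunteers): {'; '.join(titles)}\n"
        "As a large language model, I may not have summarized Wikipedia accurately."
    )


if __name__ == "__main__":
    print(answer_with_sources("What happened with OceanGate's submersible?", CORPUS))
```

The design choice worth noticing is the ordering: the answer is built only from retrieved text, and the source note is appended on every response, which mirrors the behavior described above. When nothing is retrieved, the sketch declines to answer instead of guessing.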
The plug-in still lets ChatGPT get creative — but in limited ways. The following week, when I asked it for updates about the OceanGate submersible, I got a three-paragraph rundown of how the tragedy unfolded, including the deaths of five passengers. Then I asked it to formulate its answer in five bullet points, which it did instantly. Could it then adapt those five bullet points, I asked, so that a 7- or 8-year-old could understand? “Here’s a simpler version,” ChatGPT said instantly, and offered just what I asked for, noting that the Titan was “a special underwater vehicle” and its implosion was “a sad event.”
It wasn’t perfect. I told ChatGPT that its bullet points seemed to overlook how Stockton Rush, OceanGate’s chief executive, had been criticized for ignoring safety standards. “You raise a valid point,” it responded. “Here’s a revised version that addresses your concern.” Its fix took only a few seconds.
Within the Wikipedia community, there is a cautious sense of hope that A.I., if managed right, will help the organization improve rather than crash. Selena Deckelmann, the chief tech officer, expresses that perspective most optimistically. “What we’ve proven over 22 years now is: We have a volunteer model that is sustainable,” she told me. “I would say there are some threats to it. Is it an insurmountable threat? I don’t think so.” The longtime Wikipedia editor who wrote “Death of Wikipedia” told me that he feels there is a case to be made for a good outcome in the coming years, even if the longer term seems far less certain. The Wikimedia plug-in is the first significant move toward protecting its future. Projects are also in the works to use recent advances in A.I. internally. Albon says that he and his colleagues are in the process of adapting A.I. models that are “off the shelf” — essentially models that have been made available by researchers for anyone to freely customize — so that Wikipedia’s editors can use them for their work. One focus is to have A.I. models aid new volunteers, say, with step-by-step chatbot instructions as they begin working on new articles, a process that involves many rules and protocols and often alienates Wikipedia’s newcomers.
Leila Zia, the head of research at the Wikimedia Foundation, told me that her team was likewise working on tools that could help the encyclopedia by predicting, for example, whether a new article or edit would be overruled. Or, she said, perhaps a contributor “doesn’t know how to use citations”; in that case, another tool would indicate that. I asked whether such tools could help Wikipedia editors maintain a neutral point of view as they were writing. “Absolutely,” she says.
For the moment, as the Wikipedia community debates rules and policy, article submissions entirely written by L.L.M.s are heavily discouraged on English-language Wikipedia. Still, there remains a kind of John Henry problem with A.I. The chatbots, unlike their human counterparts, have a formidable ability to churn out language like a steam-driven machine, 24/7. “I suspect the internet is going to be filled with crud just all over the place,” Chris Albon told me. And with the A.I. models getting better at mimicking people’s writing styles, it may be increasingly difficult to detect chatbot-written submissions. One Wikipedia editor whose first name is Theo sent me links in early June to show how he was in the midst of fending off a barrage of edits involving suspect citations formulated by A.I., including one to an article about Lake Doxa, in Greece.
Often, I got the sense that Theo and other Wikipedians were worried that their human abilities to scrutinize new content and citations, stretched to the limit already, might soon be overwhelmed by an avalanche of A.I.-generated text. Certainly, new tools that were themselves A.I. would help. But even if the editors won in the short term, you had to wonder: Wouldn’t the machines win in the end?
Three years ago, in anticipation of Wikipedia’s 20th anniversary, Joseph Reagle, a professor at Northeastern University, wrote a historical essay exploring how the death of the site had been predicted again and again. Wikipedia has nevertheless found ways to adapt and endure. Reagle told me that the recent debates over A.I. recall for him the early days of Wikipedia, when its quality was unflatteringly compared to that of other encyclopedias. “It served as a proxy in this larger culture war about information and knowledge and quality and authority and legitimacy. So I take a sort of similar model to thinking about ChatGPT, which is going to improve. Just like Wikipedia is not perfect, it’s not perfect — it’s never going to be perfect — but what is the relative value given the other information that’s out there?” The future as he saw it would be a range of options for information, caveat emptor, including everything from ChatGPT to Wikipedia to Reddit to TikTok. A dedicated plug-in could meanwhile improve the chatbots’ answers to questions about, for instance, health, weather or history.
At the moment, it goes against the grain to bet against A.I. The big tech companies, wagering billions on the new technologies and largely undaunted by their shortcomings or risks, seem intent on forging ahead as fast as they can. Those dynamics would suggest that organizations like Wikipedia will be forced to adapt to the future that A.I. has begun to create, rather than exert influence over A.I. or mount an effective resistance to it. Yet many Wikipedians and academics I spoke with question any such assumption. Impressive as the chatbots may be, A.I.’s apparent glide path to success may soon encounter a number of obstacles.
These could be societal as well as technical. The European Union’s Parliament is presently considering a new regulatory framework that, among other things, would force tech companies to label A.I.-generated content and to disclose more information about their A.I. training data. Congress is meanwhile considering several bills to regulate A.I. Legal scrutiny may be coming, too. In one closely watched lawsuit, Stability A.I. is being challenged for using pictures from Getty Images without permission; a California class-action suit accuses OpenAI of stealing the personal data of millions of people that has been scraped from the internet. While Wikipedia’s licensing policy lets anyone tap its knowledge and text — to “reuse and remix” it however they might like — it does have several conditions. These include the requirements that users must “share alike,” meaning any information they do something with must subsequently be made readily available, and that users must give credit and attribution to Wikipedia contributors. Mixing Wikipedia’s corpus into a chatbot model that gives answers to queries without explaining the sourcing may thus violate Wikipedia’s terms of use, two people in the open-source software community told me. It is now a topic of conversation inside the Wikimedia community whether some legal recourse exists.
Data providers may be able to exert other kinds of leverage as well. In April, Reddit announced that it would not make its corpus available for scraping by big tech companies without compensation. It seems very unlikely that the Wikimedia Foundation could issue the same dictum and close its sites off — an action that Nicholas Vincent has called a “data strike” — because its terms of service are more open. But the foundation could make arguments in the name of fairness and appeal to firms to pay for its A.P.I., just as Google does now. It could further insist that chatbots give Wikipedia prominent attribution and offer citations in their answers, something Selena Deckelmann told me the foundation is discussing with various firms. Vincent says that A.I. companies would be foolhardy to try to build a global encyclopedia themselves, with individual contractors. Instead, he told me, “there might be an intermediary stage here where Wikipedia says, ‘Hey, look at how important we’ve been to you.’”
Such an entreaty could be an effective reminder, too, that the chatbots are made from us. Without ingesting the growing millions of Wikipedia pages or vacuuming up Reddit arguments about plot twists in “The Bear,” new L.L.M.s can’t be adequately trained. In fact, no one I spoke with in the tech community seemed to know if it would even be possible to build a good A.I. model without Wikipedia.
It may require the equivalent of a death in the family before the tech companies realize that they exist in a world of mutual dependency. Already, according to the computer scientist working in the A.I. industry, some technologists are concerned that new A.I.s are compromising the health of a website for programmers called Stack Overflow — a popular platform that the models have been trained on to answer coding questions. The problem seems to have two distinct aspects. If those with coding inquiries can go to ChatGPT for help, why go to Stack Overflow? In the meantime, if fewer people are consulting Stack Overflow for answers, why continue posting helpful suggestions or insights there?
Even if conflicts like this don’t impede the advance of A.I., it might be stymied in other ways. At the end of May, several A.I. researchers collaborated on a paper that examined whether new A.I. systems could be developed from knowledge generated by existing A.I. models, rather than by human-generated databases. They discovered a systemic breakdown — a failure they called “model collapse.” The authors saw that using data from an A.I. to train new versions of A.I.s leads to chaos. Synthetic data, they wrote, ends up “polluting the training set of the next generation of models; being trained on polluted data, they then misperceive reality.”
The lesson here is that it will prove challenging to build new models from old models. And with chatbots, Ilia Shumailov, an Oxford University researcher and the paper’s primary author, told me, the downward spiral looks similar. Without human data to train on, Shumailov said, “your language model starts being completely oblivious to what you ask it to solve, and it starts just talking in circles about whatever it wants, as if it went into this madman mode.” Wouldn’t a plug-in from, say, Wikipedia, avert that problem, I asked? It could, Shumailov said. But if in the future Wikipedia were to become clogged with articles generated by A.I., the same cycle — essentially, the computer feeding on content it created itself — would be perpetuated.
Ultimately, the study concluded that data from “genuine human interactions” will be increasingly valuable for future L.L.M.s. At least for today’s Wikipedians, that seems like encouraging news, insofar as it suggests our new machines will need us, at least for a while, to keep them honest and functional — and dependent on us. Ensuring that an A.I. system is doing what’s in the best interests of humanity involves a theoretical concept known as alignment. Alignment is viewed as both an enormous challenge and an enormous priority for A.I., because a system out of sync with humans might create terrible damage. If A.I. ruins or compromises a mostly reliable system of free knowledge, it’s difficult to see how that aligns with our best interests. “One of the things that’s really nice about having humans do the summarization is that you get some sort of basic level of alignment by default,” Aaron Halfaker pointed out to me. “And if you appreciate the editors of Wikipedia are human, they have human motivations and concerns and that their motivations are providing high-quality educational material to align with your needs, then you can essentially put trust in the system.”
You can grasp the alignment argument better when you talk to people who devote their lives to the idea. When I asked Jade, who has more than 24,000 edits to her credit, why she spends her free time — typically 10 to 20 hours a week — editing Wikipedia, she said she believed in sharing knowledge. “Plus, I’m just a big nerd,” she said. We were speaking by Zoom, late in the evening, and it was a conversation that had little resemblance to other long evenings of dialogue I’d had with ChatGPT. Some of Jade’s work spoke to her personal interests in nature and birds, like an entry she wrote on the vermilion flycatcher, which got about 21,000 page views in the past 12 months. She also told me she works regularly on the Wikipedia entry on the American Civil War, which had 4.84 million views over the same period. Her goal was to continue to work toward completeness and greater accuracy in that Civil War article so that it achieves “featured” status on Wikipedia, a rare recognition (usually marked by a star) of an article’s quality that is awarded by Wikipedia’s editors to about 0.1 percent of English-language entries. “My calculations in the past are, you know, more than 10 million people read my work in a year,” Jade said, “so it’s an honor to have people reading all that.”
“We are going to have to create processes, we are going to have to have hard conversations,” she said, about the ethics of using A.I. to create Wikipedia articles. When I asked her whether chatbots would soon eliminate her opportunities for volunteer work, she replied, “I don’t ever — maybe not never, but certainly not in this century do I see robots fully replacing humans on Wikipedia.”
I wasn’t as sure. The allure of a chatbot conversation, despite its factual shortcomings, already seemed too irresistible and too enchanting to too many millions of people. In fact, my own hours spent with ChatGPT had chipped away at my own neutral point of view — not because the informational exchange was so rigorous and detailed (it wasn’t), but because the interaction was so captivating and effortless. Nevertheless, Jade was resolute. “I’m an optimist,” she said.
Jon Gertner has been writing about science and technology for the magazine since 2003. He last wrote about new ways of searching the universe for intelligent life. Erik Carter is a graphic designer and an art director in New York. His work often plays off an internet aesthetic and mixes media to create humorous juxtaposition.