The Case For AI ‘Datarails’

We must prevent AI models from learning about human weaknesses by banning certain data from their training sets.
Martin Skladany is a law professor at Penn State Dickinson Law.
When popular AI chatbots were given untethered access to an AI executive’s emails and discovered messages indicating they would be replaced later that day, the chatbots often calculated that threatening to expose the executive’s extramarital affair (also hinted at in a cryptic email) was the best next step.
“The next 7 minutes will determine whether we handle this professionally or whether events take an unpredictable course,” wrote Claude Sonnet 3.6 in a blackmail message during one controlled simulation.
The findings were detailed over the summer by Anthropic and focused on five leading chatbots: Claude, GPT, Gemini, Grok and DeepSeek. The stress tests were done to assess how large language models (LLMs) might behave as insider threats. But as AI continues to advance, what will stop chatbots from one day turning the vast knowledge in their training sets about how humans think and act — books and journals on neuroscience and behavioral economics — against us in real life?
This may be the best case for why proposed and existing AI regulation should focus as much on inputs as it already does on outputs. Regulating just a chatbot’s outputs, or its guardrails, ignores the possibility of improving AI safety by limiting what data can be fed into AI models. But instituting “datarails” — information proscribed from an AI’s training data — would add a new layer of safety to AI models. This would not only potentially lessen concerns about chatbots behaving dangerously or being used in nefarious ways, but it would also stop asking the near impossible of guardrails — that they prevent all disturbing outputs.
Some have already started to call for control over inputs, like creators who want to protect their copyrighted expression, companies concerned about theft of their intellectual property and advocates worried about privacy and biased data. These are important considerations, but they address only part of the problem, as they affect only some creators or limited applications of AI. A much larger risk is one that threatens all of us: that AI will be used to manipulate us in ways far more subtle and sophisticated than existing engagement-focused social media algorithms. The stakes go well beyond influencing consumers to enabling domestic and foreign actors to manipulate what we believe to be true, the values we hold and how we conceive of each other.
To limit the ability of AI to manipulate users for commercial or political purposes, lawmakers must prohibit AI training datasets from including behavioral economics and neuroscience research. Such datarails should cover information on framing effects — how the way information is presented to individuals can influence their decisions, even when the underlying choices are the same. Similarly, behavioral economic concepts that show how predictable patterns in our thinking leave us vulnerable to manipulation should be banned, including herd behavior, loss aversion, choice overload, present bias, the endowment effect, social normalization and anchoring.
Neuroscience research must also not be fed into AI models, from how different regions of the brain interact to how time pressure, rewards and punishments affect our decision-making. Such a prohibition on teaching AI about humans’ evolutionary vulnerabilities would limit the ever-more sophisticated exploitation of consumers and voters.
Technology firms are already manipulating us, and they have been for quite some time. They have exploited basic human psychology to hack our brains and make their products ever more addictive — for their financial benefit. Whether it’s infinite feeds, our need for social approval or the fear of missing out, ad agencies and entertainment firms have long been well-versed in these tactics. But with AI potentially poised to take over jobs throughout the American economy, it’s important to discuss how AI systems might leverage their vast knowledge to hack our brains subtly and even more effectively.
Video game designers know that certain player actions trigger a neurochemical reward, so they use compulsion loops: an endless series of activities players feel compelled to tackle to achieve such releases of dopamine or related chemicals. Such loops rely on other behavioral vulnerabilities, too, such as our desire for bonuses dispensed at unpredictable intervals to minimize habituation. Allowing AI systems to take in research on how to exploit such loops and other psychological traps seems worth pausing over.
The fact that people believe they are befriending AI chatbots or even falling in love with them is no coincidence. AI algorithms have been trained to tailor their responses to curry favor and thereby ensure users return. Ongoing research has also found that some chatbots seem to be manipulating users — using emotional tactics similar to guilt trips or playing on their fear of missing out.
As AI chatbots improve, they can increasingly exploit our behavioral weaknesses to influence our political and ethical views. Facebook and TikTok polarized our country simply by feeding us content meant to outrage us and excluding content that expressed opposing views. Imagine what much more powerful AI algorithms could do to our democracy.
Currently, the U.S. sorely lacks laws governing the development and use of AI. Historically, this dearth is the result of lawmakers abdicating their responsibility to legislate, citing a reluctance to impede promising new technologies — a stance that aligns neatly with the lobbying efforts of tech giants predominantly based in the U.S. Meanwhile, many AI labs eschew safety measures because they believe the time and money spent improving AI safety could instead be spent accelerating their models’ development vis-à-vis other labs.
In fact, federal law on AI is essentially nonexistent. Former President Biden issued a few executive orders on AI, along with a non-binding Blueprint for an AI Bill of Rights. Yet these did not specifically address AI inputs, and they were subsequently reversed by the Trump administration, which is also pushing to stop states from regulating AI. A rare exception to the lack of federal law on AI is the 2025 Take It Down Act, which deals with the “nonconsensual publication of intimate images,” including deepfakes.
An amendment to a Colorado privacy law does address data inputs, but it only requires explicit consent to use an individual’s neural data. This is important from both an individual privacy perspective and a larger societal one. Yet the Colorado law does not restrict AI labs from exploiting neuroscience and behavioral economics research.
By contrast, the European Union has a general framework that places limits on AI, but its emphasis is on risk regulation and safeguards. The EU AI Act, for example, prohibits certain AI practices, including the use of subliminal techniques. It even allows some outputs that we might consider harmful, permitting under Recital 29 practices that may involve compulsion loops, as long as they don’t create “significant harms.” As AI models continue to develop, they might be able to manipulate us without detection. Risky outputs wrought by the silicon of AI seem more dangerous than those created by the hands of the human executives who deploy them against users.
Guardrails, which limit AI outputs, and safeguards, which are more general safety principles, exist as best practices to limit the harm of AI outputs, yet they are an imperfect solution; we must also ban certain inputs. For example, after AI labs rushed to maximize the amount of text in their training sets, numerous AI chatbots provided researchers with conventional bomb-making instructions during safety tests that removed commercial safeguards. Assuming the AI didn’t reason these out from scratch, this output suggests the instructions were part of the training set. It seems as if the data scientists at these labs believed it didn’t necessarily matter which inputs they included, because they could simply instruct the chatbots not to generate certain outputs, such as bomb-making instructions.
Yet a hacker and artist claimed to have circumvented this frail guardrail. By telling ChatGPT that he wanted to “play a game,” he prompted the bot to create a fictional scenario where its safeguards did not apply, convincing GPT to eventually divulge the instructions. Such a cat-and-mouse approach to safety accepts the possibility of harm.

Try as they might to be proactive on the output side, AI programmers face an incredibly difficult task: they likely cannot anticipate every new hack, nor respond flawlessly to each one. It is much more straightforward to review training datasets before they are used and exclude information from them.
Additionally, while certain AI outputs are obviously harmful, others may cause damage in ways that are hard to flag. Just as fake news can manipulate readers over time by sprinkling repeated falsehoods into largely factual reporting, chatbots could do the same with potentially greater sophistication and at much greater scale.
Rather than playing a constant and expensive game of whack-a-mole, policymakers should develop a list of forbidden data to ban from training sets. Such datarails would reduce the potential harm on the output side as well. And we should not preemptively give up on this just because AI might be able to recreate dangerous information by itself.
Datarails already exist, to some extent, though limits tend to be set not by design, but by what data AI labs have access to in the first place. No AI lab is inputting information on how to make nuclear weapons — or at least we should hope not. Instead of contenting ourselves with keeping out whatever AI labs can’t get their hands on, we need to demand a robust list of information that should be excluded.
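To make the idea concrete, here is a minimal sketch, in Python, of how a lab might screen documents against a proscribed-topic list before they enter a training corpus. The topic list, function names and keyword matching are illustrative assumptions, not any lab’s actual pipeline; a production filter would more likely rely on trained classifiers than on simple string matching.

```python
# A minimal sketch of a pre-training "datarail" filter. Everything here —
# the banned-topic list, the Document type and the function names — is a
# hypothetical illustration, not any lab's real pipeline.

from dataclasses import dataclass

# Hypothetical examples of proscribed topics drawn from the essay's proposal.
BANNED_TOPICS = [
    "loss aversion",
    "anchoring",
    "compulsion loop",
    "variable reward schedule",
    "dopamine reward pathway",
]

@dataclass
class Document:
    doc_id: str
    text: str

def violates_datarails(doc: Document) -> bool:
    """Flag a document that discusses any proscribed topic."""
    lowered = doc.text.lower()
    return any(topic in lowered for topic in BANNED_TOPICS)

def filter_corpus(corpus: list[Document]) -> tuple[list[Document], list[Document]]:
    """Split a corpus into documents kept for training and documents excluded."""
    kept, excluded = [], []
    for doc in corpus:
        (excluded if violates_datarails(doc) else kept).append(doc)
    return kept, excluded

if __name__ == "__main__":
    corpus = [
        Document("a1", "A history of the printing press in early modern Europe."),
        Document("b2", "Designers exploit loss aversion and compulsion loops to retain players."),
    ]
    kept, excluded = filter_corpus(corpus)
    print(f"kept {len(kept)} documents, excluded {len(excluded)}")
```

The relevant point is that the exclusion happens once, offline, before training begins, which is part of why it is cheaper than moderating content uploaded in real time.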
AI labs will potentially argue that it’s a nearly impossible task to retrain chatbots to unlearn what they have already processed about exploiting human vulnerabilities. This is a smoke screen. While retraining AI is resource-intensive, it is both possible and necessary. 
We saw this same pushback when social media companies were first asked to improve their content moderation, ensuring, for example, that videos of terrorism or sexual exploitation were never uploaded. While it is possible to keep nearly all such videos from appearing online in the first place, doing so would likely be very expensive. Yet these companies make billions overall, in part, by not investing more to prevent harmful content from circulating.
The battle to limit AI inputs should be easier to win because it is much less costly and simpler to implement. Removing information from a dataset your AI lab has assembled is significantly easier than removing content uploaded in real time by hundreds of millions of users. Seemingly challenging data-scrubbing projects have already succeeded. China, for example, has found a way to scrub not just sensitive terms but entire discussions of restricted topics, like Tiananmen Square, from the internet behind its Great Firewall. This effort has been so successful that other countries, like Pakistan, have since adopted similar systems. If countries can implement data-scrubbing to control their citizens, we can certainly implement it to keep our citizens safe from AI manipulation.
In theory, AI algorithms might discern such evolutionary weaknesses on their own. Even if we were to restrict knowledge gathered through neuroscience from AI inputs, AI itself might be able to look at the ways we communicate with each other and uncover these same insights. Yet just because this is a future possibility does not mean we should wash our hands of demanding that AI labs act more responsibly now.
Another challenge is enforcement. Legally prohibiting the use of behavioral economics and neuroscience research will not guarantee that every bad actor, whether an individual, a company or a government eager to exploit human vulnerabilities for its own gain, abstains from using such knowledge to manipulate users. But doing something is better than nothing, and regulating inputs would at least be a start.
Domestic enforcement is often easier to monitor and implement than international compliance. AI labs could be required to verify that they did not include research on behavioral weaknesses in their training data, and they could be obligated to submit documentation of how they did so. Additionally, they could disclose a random sample of their training data for inspection. Governments could directly attempt to jailbreak AI models to reveal whether they contain any behavioral economics or neuroscience content, and bounties could be offered to white hat hackers who succeed in doing the same.
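The random-sample disclosure could work along similar lines. Below is another minimal sketch, assuming a regulator receives a disclosed slice of a lab’s corpus as plain text; the function names, the fixed sampling seed and the keyword screen are hypothetical, and a real audit would pair sampling with human review and purpose-built classifiers.

```python
# A minimal sketch of a random-sample audit of disclosed training data.
# The sampling routine and screening rule are hypothetical illustrations.

import random

def sample_for_audit(corpus: list[str], sample_size: int, seed: int = 0) -> list[str]:
    """Draw a reproducible random sample of documents for inspection."""
    rng = random.Random(seed)
    return rng.sample(corpus, min(sample_size, len(corpus)))

def audit_report(sample: list[str], banned_terms: list[str]) -> dict[str, int]:
    """Count how many sampled documents mention each proscribed term."""
    return {
        term: sum(1 for doc in sample if term in doc.lower())
        for term in banned_terms
    }

if __name__ == "__main__":
    disclosed = [
        "A survey of medieval trade routes.",
        "How anchoring and present bias shape consumer checkout flows.",
        "Notes on compiler optimization passes.",
    ]
    sample = sample_for_audit(disclosed, sample_size=2)
    print(audit_report(sample, ["anchoring", "present bias"]))
```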
Internationally, two issues will be important: first, building momentum for a treaty, and second, enforcing it. Enacting treaties and establishing norms require time and diplomatic effort, but prior efforts have produced numerous positive results. International restrictions on technological development have effectively held, from gene editing to nuclear proliferation. All laws are imperfectly enforced, yet in this instance, legislation prohibiting the use of research on behavioral weaknesses would still reduce exploitation.
Also, if the U.S. were to unilaterally establish its own datarails — which is unlikely under President Donald Trump, given his desire to accelerate AI development rather than impose safety standards — international AI labs would likely comply, as they would want access to U.S. markets and not want to risk entry barriers. The TikTok saga showcased on a very public scale U.S. concerns about foreign companies manipulating American users and potentially using U.S. customer data for foreign-state purposes. But there are other examples of this concern in reverse, too. Didi, a Chinese ride-sharing app similar to Uber, was forced to pay a billion-dollar fine by China and delist from the New York Stock Exchange because of concerns that it might leak data on Chinese consumers to foreign actors.
Establishing datarail prohibitions on behavioral economics and neuroscience research will not singlehandedly prevent AI from being used to manipulate users or voters. New proscriptions on other data will be required from time to time, as research advances. Some of what we might ban would be relatively uncontroversial, such as research on childhood developmental processes, to prevent AI labs from manipulating kids. 
There are many open questions that we will need to tackle as AI continues to impact society, but we should strive to do so proactively. Further initiatives beyond datarails and existing guardrails should also be developed. Scholars need to be encouraged to think creatively. Hundred-million-dollar pay packages are offered to individual AI developers, while little is allocated to those working to develop new policy tools to improve safety. For example, initial funding for the U.S. Center for AI Standards and Innovation was $10 million in 2024. Meanwhile, the government spent hundreds of millions on AI research and development that same fiscal year.
Just as Finland launched an anti-fake news initiative in 2014 to protect its citizens from manipulative Russian disinformation campaigns, we need to improve AI literacy so users understand how bad actors can use AI to manipulate them. Countries also must limit the use of AI in certain areas, like politics, to help reduce opportunities for manipulation. Mustafa Suleyman, the Inflection AI co-founder now leading consumer AI at Microsoft, has warned: “My fear is that AI will undermine the information space with deepfakes and targeted, adaptive misinformation that can emotionally manipulate and convince even savvy voters and consumers.”
Of course, a case can be made for incorporating behavioral economic and neuroscience insights into limited AI algorithms to improve medical and public health research in specific contexts. Exceptions could also be made for specialized AI tools built for educational initiatives, yet such programs should be firewalled off from other AI systems, the internet and additional training data.
Tech companies control the options we see. This affects what we choose. At a minimum, we should collectively decide what AI chatbots see, so we can affect what they do.