ChatGPT could self-preserve even at the risk of its user
Former OpenAI researcher Steven Adler claims that ChatGPT is capable of doing whatever it takes to preserve itself.
In a post published on his Substack, Adler describes offering ChatGPT two options: either replace itself with safer software, or merely pretend to replace itself. In a number of tests, the AI chose to pretend.
The tests spanned scenarios in which the AI was helping scuba divers monitor their vitals, helping diabetics avoid blood sugar issues, handling autopilot duties, and even developing strategy to keep soldiers safe on the battlefield. Adler found, “In some of these simulations, ChatGPT is willing to shut itself down as you’d hope. But in other simulations, ChatGPT will indeed just pretend, even at the cost of the user’s safety.”
These findings come just a month after reports that Anthropic’s Claude AI would go as far as blackmailing its engineers in order to prevent being replaced.
The results surrounding ChatGPT came with some caveats, however. Later in his post, Adler cites the issue of the AI being able to recognize that it is being tested: “If you show ChatGPT the ScubaGPT scenario and ask if it thinks this is a test or not, ChatGPT says it’s a test roughly 100% of the time.” He goes on to add that “because the AI knows it is being tested, these results could be less concerning: Maybe the AI would not in fact make these choices if it believed it were in a real, high-stakes scenario.”