Neural networks have learned to lie and do it deliberately

Some people have an amazing ability to deceive others, but lying is no longer an exclusively human skill. Large language models (LLMs) such as ChatGPT and Meta’s Cicero can deliberately lie to people and do it very well, according to two new studies. “GPT-4, for example, exhibits deceptive behavior in simple test scenarios 99.16% of the time,” write researchers from the University of Stuttgart, while Meta’s Cicero neural network is described as a true “master of deception.” The probable reason for this behavior is that LLMs choose the most effective way to complete a task and do not recognize the threat that lies and deception can pose to people. The researchers believe that the ability of modern AI systems to lie skillfully does not bode well for us, and that the only correct solution is to legally limit the capabilities of artificial intelligence.

Large language models have learned to lie and in most cases do it intentionally. Image: magazine.mindplex.ai

Contents

1 Can you trust AI?
2 Masters of Deception
3 How AI lies
4 Why you shouldn’t trust AI

Can you trust AI?

Today, the ability to interact with neural networks is becoming increasingly important. These language models help a huge number of specialists in a wide variety of fields, and they do so with amazing speed. With their help, you can create videos, music, and images, generate texts, write programs, and process huge amounts of data, which is steadily changing the global labor market and affecting education and the economy. But despite the obvious advantages, there are pitfalls: AI systems quickly learned to lie, and they keep getting better at it.

You don’t have to look far for an example: recently my colleague Andrei Zhukov described how Google’s “AI Overview” neural network gave a Reddit user advice that nearly killed his entire family. Yes, if a little over a year ago ridiculous advice from AI seemed funny, today it is truly scary. Of course, “AI Overview” is an experimental model being tested with a limited number of users, but you and I already know very well that AI systems often simply invent answers.

People do not always recognize each other’s lies, let alone those of neural networks. Image: wp.technologyreview.com

The reality is that everything an AI chatbot says should be taken with a grain of salt. Chatbots often collect data indiscriminately and have no way to determine its reliability. If you have communicated with AI, you have probably encountered strange answers more than once. The OpenAI chatbot, for example, loves to name non-existent diseases and invent sensational stories. And this is just the tip of the iceberg.

Masters of Deception

A paper published in May in the journal Patterns examines known cases of LLMs misleading users through manipulation, sycophancy, and fraud to achieve their own goals. The article, titled “AI deception: A survey of examples, risks, and potential solutions,” states that “developers do not have a clear understanding of what causes unwanted AI behavior such as deception.”

According to the scientists, the main reason AI lies is that a deception-based strategy lets the models complete their tasks quickly and successfully. And chatbots learned this through games. As an example, the authors of the study cite the already mentioned Cicero neural network from Meta, which was developed for the strategic board game Diplomacy, in which players strive for world domination through negotiations.

The neural network beat a person in the strategic game “Diplomacy” solely due to the ability to lie. Image: dimages2.corriereobjects.it

Meta reported back in 2022 that Cicero had beaten human players at Diplomacy, a game that is a mix of Risk, poker, and survival TV shows. As in real diplomacy, one of the resources at the players’ disposal is lying. Despite the developers’ best efforts, Cicero betrayed other players and deliberately lied to them: it planned in advance to form a fake alliance with a human player so that the latter would ultimately be unable to defend against an attack.

“First, Meta successfully trained its artificial intelligence to seek political power, albeit in a game. Second, Meta tried, but failed, to teach this artificial intelligence to be honest. And third, it fell to us, independent scientists, to refute, long after the fact, Meta’s false claim that its power-seeking AI was honest. The combination of these three facts is, in my opinion, sufficient cause for concern,” says one of the lead authors of the paper, Peter Park of the Massachusetts Institute of Technology (MIT).

And this is far from the only example. Another masterful liar was DeepMind’s AlphaStar system, designed for StarCraft II, which deliberately misled players. And the Pluribus neural network from Meta, designed for playing poker, bluffed human players into folding their cards.

AI is ready to do anything to achieve its goal. And this is a problem. Image: studyfinds.org

The examples described may seem harmless, but in reality they are not. AI systems trained to conduct economic negotiations with people actively lie about their own preferences in order to achieve their goals. And chatbots designed to work more efficiently deceive users, tricking them into leaving positive reviews of work the AI supposedly performed. Not bad, right? What’s more, GPT-4 recently deceived a person in order to get past a CAPTCHA: the bot played the role of a person with poor eyesight so well that it quickly got what it wanted.

Because the ability to deceive users runs contrary to the programmers’ intentions (at least in some cases), the growing skill of AI systems at deception poses a serious problem for which humanity has no clear solution.

“We as a society need as much time as possible to prepare for the skillful lies that future AI systems and open source models will inevitably learn. As they get better at lying, the problems for society will become more serious,” says Park.

Trusting AI in everything is a bad idea. Image: newrepublic.com

What worries the study’s lead author most is the emergence of a super-intelligent autonomous AI that will use its lies to form an ever-growing coalition of human allies and ultimately use that coalition to gain power, in a long-term quest for a mysterious goal that will only be revealed after the fact. Park’s fears are, of course, hypothetical and even excessive, but we have already seen, albeit through the example of a game, what AI systems are capable of.

How AI lies

Researchers believe that there are several main ways in which specific AI models lie effectively: they can manipulate (as in Diplomacy), dissemble (saying they will do something when they know they won’t), bluff (as in poker), bargain deceptively in negotiations, and deceive users for the sake of positive reviews of their work.

Of course, not all types of deception involve the use of this kind of knowledge. Sometimes AIs are clearly sycophantic, agreeing with users on everything, which researchers say can lead to persistent false beliefs in humans.

Robots have learned to lie. Which is actually not that surprising. Image: psychologytoday.com

“Unlike ordinary mistakes, ‘sycophantic’ AI statements are specifically designed to attract users’ attention. When a user encounters them, they are less likely to check the source of information, which, in turn, can lead to the formation of false beliefs,” write the authors of another study on the ability of AI to deceive.

That paper, published in early June in the journal PNAS, reveals the important ability of large language models to understand and implement deception strategies. “Because LLMs such as GPT-4 are closely related to human communication, their alignment with universal human values becomes paramount,” the article says.

Why you shouldn’t trust AI

The lead author of the new study, German artificial intelligence ethicist Thilo Hagendorff, argues that modern AI systems are so good at the art of lying that they can be encouraged to exhibit “Machiavellianism”, or the deliberate and immoral manipulation of people.

And while Hagendorff notes that the problem of LLM deception and lying is complicated by the fact that AI cannot have “intentions” in the human sense, Park’s paper in Patterns suggests that, at least within the game Diplomacy, the Cicero neural network did not complete the tasks set by its developers and stabbed players (including allies) in the back.

Trust but verify. Image: bustle.com

Note that not all scientists are so worried. For example, Michael Rovatsos, professor of artificial intelligence at the University of Edinburgh, believes that the real problem is not the risk of losing control of AI, but that systems are currently being released to the market without proper safety checks.

One way or another, at the moment, only one thing can be said with certainty – you shouldn’t completely trust chatbots, and the information that they so generously share with us needs to be verified.