OpenAI has just installed a security update for ChatGPT that was more than necessary

OpenAI has issued a warning about the growing threat of prompt injection attacks, a technique that hides malicious instructions in ordinary online content, becoming a considerable risk for artificial intelligence agents operating in web browsers. The company has implemented a security update for its ChatGPT Atlas tool after discovering a new class of attacks during automated internal security simulations. Not so much intelligence, but very artificial The updated version of Atlas includes a model specifically trained to withstand adversarial attacks, as well as reinforced safeguards. According to OpenAI, the browser agent mode […]

OpenAI has issued a warning about the growing threat of prompt injection attacks, a technique that hides malicious instructions in ordinary online content, becoming a significant risk for artificial intelligence agents operating in web browsers. The company has implemented a security update for its ChatGPT Atlas tool after discovering a new class of attacks during automated internal security simulations.

Not so much intelligence, but very artificial

The updated version of Atlas includes a model specifically trained to withstand adversarial attacks, as well as enhanced safeguards. According to OpenAI, the browser agent mode allows the software to interact on the web in a manner similar to a human user, accessing emails, documents, and web services, which increases its value as a target for adversarial attacks compared to a traditional chatbot that only answers questions.

The company has developed an automated attacker, using language models that identify prompt injection strategies, allowing for the execution of complex harmful workflows. This attacker can simulate encounters with malicious content, generating a complete trail of reasoning and actions of the victim agent, which helps refine attacks through multiple rounds of testing.

A hypothetical example illustrates the risk: a malicious email instructing the agent to send a resignation letter to the user’s boss. If the agent encounters this email during a legitimate request, they could misinterpret the instructions, acting to the detriment of the user. This change in the interaction dynamic highlights the need to address new forms of online risk.

It is not just OpenAI that is facing this problem; the UK’s National Cyber Security Centre has warned that these attacks may not be completely eliminated, urging organizations to minimize risks and limit impacts. With the introduction of a “Preparation” team, OpenAI aims to identify and address these emerging risks in the field of artificial intelligence and cybersecurity.