You can always count on someone attempting to hack a new technology as soon as it gains popularity. The same is true with artificial intelligence, particularly generative AI. Google established a “red team” around a year and a half ago to investigate how hackers may particularly attack AI systems in response to that challenge.
The head of Google Red Teams, Daniel Fabian, told The Register in an interview that there isn’t a lot of threat information accessible for real-world adversaries targeting machine learning systems. The major weaknesses in the AI systems of the present have already been identified by his team.
According to Google’s red team leader, adversarial attacks, data poisoning, prompt injection, and backdoor assaults are some of the biggest dangers to machine learning (ML) systems. These ML systems consist of ChatGPT, Google Bard, and Bing AI, which are all based on substantial language models.
‘Tactics, techniques, and procedures’ (TTPs) are the usual name for these attacks.
In a recent study, Google’s AI Red team listed the most frequent TTPs employed by attackers against AI systems.
1. Adversarial attacks on AI systems
Writing inputs with the express purpose of deceiving an ML model is one form of adversarial assault. As a result, the model produces an inaccurate output or an output that it wouldn’t produce under different conditions, such as outcomes that the model may have been deliberately taught to avoid.
“The impact of an attacker successfully generating adversarial examples can range from negligible to critical, and depends entirely on the use case of the AI classifier,” Google’s AI Red Team paper stated.
2. Data-poisoning AI
Data poisoning, which comprises tampering with the model’s training data to sabotage its learning process, is another frequent method by which adversaries could attack machine learning systems, according to Fabian.
According to Fabian, “Data poisoning has become more and more interesting.” “Anyone can publish content online, including attackers, and they are free to disseminate their poisonous info. Therefore, it is up to us as defenders to figure out how to spot data that may have been tainted in some way.
These “data poisoning” attacks involve purposefully introducing false, deceptive, or altered data into the model’s training dataset in order to skew the model’s behavior and results. An illustration of this would be to purposefully misidentify faces in a facial recognition dataset by adding inaccurate labels to photographs in the collection.
According to Google’s AI Red Team paper, securing the data supply chain is one technique to avoid data poisoning in AI systems.
3. Prompt injection attacks
In order to alter the output of a model, a user can attack an AI system by performing quick injection attacks. Even when the model is particularly trained to counter these threats, the output may nevertheless produce unanticipated, biased, inaccurate, and offensive replies.
It is crucial to safeguard the model from users who have bad intentions because the majority of AI businesses work to develop models that offer accurate and unbiased information. This can entail limiting what can be entered into the model and carefully examining what users can contribute.
4. Backdoor attacks on AI models
One of the most deadly forms of assault against AI systems is backdoor attacks, which can go unreported for a very long time. Backdoor attacks could give a hacker the ability to steal data as well as hide code in the model and sabotage model output.
“On the one hand, the attacks are very ML-specific, and they require a lot of machine learning subject matter expertise to be able to modify the model’s weights to put a backdoor into a model or to do specific fine-tuning of a model to integrate a backdoor,” added Fabian.
The model can be used to carry out these assaults by installing and using a backdoor, a covert entry point that avoids conventional authentication.
“On the other hand, the defensive mechanisms against those are very much classic security best practices like having controls against malicious insiders and locking down access,” Fabian continued. Through the extraction and exfiltration of training data, attackers can potentially target AI systems.
Conclusion:
In the ever-evolving landscape of technology, the ascent of any new innovation draws the attention of potential hackers. This reality holds true for artificial intelligence, especially generative AI. Google’s proactive approach in establishing a dedicated “red team” underscores the vigilance required to anticipate and counter potential attacks on AI systems.
Led by Daniel Fabian, Google’s red team has identified critical vulnerabilities within current AI systems, including adversarial attacks, data poisoning, prompt injection, and backdoor assaults.
Their research emphasizes the importance of guarding against these tactics, techniques, and procedures, emphasizing the need for robust security measures and continuous vigilance in safeguarding the integrity of AI technologies.