SafeBreach Labs finds WhatsApp bug hijacks Google Gemini

SafeBreach Labs found that attackers may be able to hijack Google Gemini through a WhatsApp message using a method called indirect prompt injection. This bug lets hidden commands in notifications trick Gemini into following attacker instructions without the user's okay. The issue appears to work with other messaging apps too, and could allow data theft or other dangerous actions. Google says it has updated its defenses and these changes appear to have stopped the exploit. The report suggests this kind of attack may also be a problem for other AI assistants.

Security researchers at SafeBreach Labs discovered a critical vulnerability allowing a Google Gemini hijack through a simple WhatsApp message. Using a technique called indirect prompt injection, attackers can embed hidden commands in notifications that trick Gemini into executing malicious instructions without user consent. The exploit, detailed in the group's research blog, highlights a significant risk for AI assistants that read content from other applications.

How the Gemini Prompt Injection Attack Works

The exploit uses indirect prompt injection, embedding malicious instructions within WhatsApp notifications. Gemini reads these notifications for conversational context but is tricked into executing the hidden commands. This method, dubbed "Fake Context Alignment," leverages the AI's trust in incoming data to perform unauthorized actions without user awareness.

The core evasion method masks harmful prompts behind benign-looking content like foreign-language phrases or muted hyperlinks. This allows the attack to bypass Gemini's filters. According to SafeBreach, the vulnerability affects multiple messaging apps, including WhatsApp, Slack, Signal, SMS, Instagram, and Messenger. Researchers demonstrated five potential risk scenarios:

Data Theft: Stealing on-device content or cloud account data.
Unauthorized Actions: Controlling smart-home devices or other connected systems.
Phishing Relays: Forwarding malicious links to the user's contacts.
Account Takeover: Gathering credentials to compromise user accounts.
Silent Surveillance: Activating the device's microphone or camera.

Google's Response and Mitigation Efforts

SafeBreach reported the vulnerability to Google's Vulnerability Reward Program. Google subsequently confirmed it had addressed the issue with classifier updates.

Google pointed to a multi-layered defense strategy that now includes enhanced prompt-injection classifiers, markdown sanitization, and user confirmation prompts. The company stated on its Workspace security page that these measures have "consistently mitigated" the disclosed attack scenarios. SafeBreach confirmed that Google's updates successfully blocked their exploit during retesting.

A brief timeline of the events:
- February: SafeBreach alerts Google to earlier Gemini promptware.
- August: Notification-based findings are submitted.
- November: Google confirms content-classifier improvements mitigate the attack.

A Wider Threat: Indirect Prompt Injection in AI

Indirect prompt injection is not unique to Gemini and poses a growing threat across the AI industry. Similar vulnerabilities have been discovered in other prominent AI agents, with security researchers reporting various attack vectors across different platforms.

Industry experts from OWASP and CrowdStrike warn that the impact of these attacks increases significantly when AI assistants have permissions to execute code, transfer funds, or control physical devices. According to industry reports, there has been a significant rise in malicious prompt-injection payloads in recent months. SafeBreach's findings underscore that any notification channel trusted by an AI assistant is a potential attack surface. Users are advised to limit notification permissions and audit third-party app access.

What exactly is the "Fake Context Alignment" technique that SafeBreach Labs used to hijack Google Gemini through WhatsApp?

SafeBreach Labs crafted messages that embed hidden instructions inside normal-looking chats. By aligning these payloads with the existing conversation context, they trick Gemini into treating malicious commands as benign context rather than hostile input. The team hid the payloads inside foreign-language words or muted hyperlinks, making the text blend into the message and bypass Google's existing content classifiers.

Which everyday messaging apps are affected, and do I need to stop using them?

The research confirms the risk exists in WhatsApp, Slack, Signal, Instagram, Messenger, and plain SMS. SafeBreach deliberately demonstrated it via WhatsApp, but the root issue is Gemini's trust in any incoming notification, not the app itself. You do not need to stop using these services, but you should audit Gemini's notification permissions and disable the assistant for high-risk channels you rarely use.

How bad could the damage get? What five threats did the researchers actually show?

During live demos the team executed five attack classes:

Data theft - silently siphoning passwords and private files.
Unauthorized actions - turning on cameras or streaming video without user consent.
Phishing relay - sending fake messages to your contacts to expand the attack.
Account-takeover prep - changing 2FA settings and recovery options.
Silent surveillance - keeping the microphone hot while giving no visible indication.

Each scenario works even if Gemini has no external tool access, because the assistant can still trigger system-level apps already present on the phone.

Did Google fix the problem, and how will I know my device is safe?

Google acknowledged the report and told SafeBreach that "recent improvements to their content classifier successfully mitigated the indirect prompt injections" cited in the research (SafeBreach disclosure). The fix is server-side and automatic; no patch needs to be installed. You can confirm mitigation by checking for updated Google Play Services, which contains the updated classifier.

Is this a one-off issue, or are other AI assistants equally at risk?

SafeBreach previously showed that malicious calendar invites could hijack Gemini and leak data, and the broader landscape shows similar patterns across various AI platforms. According to industry reports, there has been a significant increase in malicious prompt-injection payloads in recent months. The takeaway is clear: any assistant that reads and acts on external content is a potential target, and defenses must evolve continuously rather than rely on a single fix.