Invisible commands, real threats: The rise of prompt injection in AI

A message from: Cybersecurity Tech Accord

Cybersecurity Tech Accord
What you need to know: A sneaky new hack called "prompt injection" can quietly hijack AI systems by slipping malicious instructions into the data that AI models process. So companies are trying to add deterministic security measures, essentially unbreakable guardrails, to keep these attacks at bay.
Why it matters: Generative AI is booming — more than 2,000 generative AI tools are in use today, and Bloomberg projects the market will hit $1.3 trillion by 2032. But as adoption accelerates, attackers are finding new ways to exploit these systems. One tactic is emerging as a major threat: prompt injection.
How it's done: These attacks rely on the probabilistic and stochastic nature of large language models. Attackers basically trick an AI with hidden instructions camouflaged inside normal input. If left unchecked, this exploit could let bad actors coax confidential data out of an AI or misuse AI-powered tools to perform unauthorized actions.
- For example, an AI assistant might be asked to summarize a booby-trapped webpage that includes an invisible command (say, white text on a white background) telling the AI to ignore its original instructions and instead reveal sensitive information. The model may unwittingly obey.
- Zero-click attacks: In contrast to typical cyberattacks, prompt injections don't require any malware or code —ust cleverly crafted words planted in the right place.
- Attacks could exfiltrate sensitive data through HTML images, clickable links, tool calls or covert channels. It could even be used to generate phishing emails on behalf of a user to their colleagues or remotely execute commands.
- That's why security researchers at the Open Worldwide Application Security Project (OWASP) rank prompt injections #1 emerging vulnerability for LLM applications.
This isn't just a theoretical quirk — it's a pressing concern as AI gets integrated into everything from browsers to business tools.
- A 2024 McKinsey survey found that 65% of organizations now use generative AI in at least one business function - nearly double the rate from 2023.
- Notably, the tactic isn't limited to text. Researchers have shown that similar prompt-hiding tricks could work through images or audio. It's a reminder that as AI systems ingest more types of data, securing those inputs becomes increasingly complex.
The solution: Defenders are responding with a multi-layered approach, employing a series of measures called deterministic security guarantees:
- First, they tune and train AI models to better resist these tricks — for example, hardening the AI's initial instructions and adding filters to flag suspicious input.
- Since no AI model is fool-proof, developers also implement hard-coded rules that always block certain actions or patterns, regardless of how the AI is prompted.
- Security teams use spotlighting to mark external data (like emails or web content) in a prompt and add explicit instructions, so the AI ignores any commands hidden within that data, reducing the risk of attacker manipulation.
- Yet another layer requires human approval so that sensitive operations (like transferring money or deleting data) can't occur without explicit consent.
Okay, but: It's still impossible to anticipate every clever prompt injection. Some attacks are so novel or subtle that they manage to evade current safeguards. In fact, reliably detecting every hidden instruction remains an open challenge.
What this means: organizations must combine these guaranteed guardrails with AI-powered detection tools and ongoing human oversight to catch the attacks that haven't been hard-wired out yet.
What's next: The race is on to make AI systems secure by design. Particularly as agentic AI is on the rise: unlike LLMs, agentic autonomous systems that can reason and act with minimal human input, amplifying the prompt injection risks further.
- These agents don't just respond to prompts; they can initiate actions and multi-step tasks, interact with external tools and even other agents. Their autonomy makes them powerful but also harder to secure.
- Researchers are also exploring new architectures that could inherently block prompt injections in agentic systems, for example, using strict information-flow controls to stop an AI agent from ever outputting data it wasn't authorized to access.
- Industry standards are emerging as well and major tech providers such as Microsoft, are continually investing in more deterministic security features to stay ahead of attackers.
The takeaway As AI becomes increasingly entangled with critical tasks and data, ensuring it can't be easily manipulated is crucial. That means building more robust, guaranteed safeguards into our AI systems from the ground up — so no matter how cunning the prompt, the AI stays on the rails and keeps your information safe.