When Defense becomes Dialogue: The Problem with LLM Security

Application Security Last updated: 08 May 2026

Written By

Martin Jartelius CISO, Outpost24

For about thirty years, security has rested on the assumption that the measures guarding your systems do not have opinions. A firewall does not care how politely you ask it to open a port. An SQL filter does not weigh the context of a query before deciding whether to pass it through. An authentication check does not get distracted or talked round. You either present the right credential or you do not, and the answer is the same every time you ask.

Large Language Models (LLMs) have fundamentally changed that model. When an LLM sits in the request path of an application, deciding what to retrieve, what to reveal, and what action to take on a user’s behalf, defenders have introduced something genuinely new into their architecture. An LLM is a control plane that can be reasoned with, and anything that can be reasoned with can also be negotiated with. This means users can influence its behavior, making it a target for manipulation by attackers.

What “negotiable” actually means

Traditional security controls are deterministic by design. A rule matches or it does not, a signature fires or it does not, and the path through the system is the same every time. LLMs are probabilistic and context-sensitive, which is the entire point of using them. They weigh instructions against one another, infer intent from incomplete information, and fill in gaps that a strict parser would simply reject. The behavior that lets a customer service assistant helpfully clarify a vague question is the same behavior that lets an attacker reframe a malicious one until it lands on the helpful side of the line.

This shift also explains why the familiar taxonomy of injection attacks does not quite map onto the new problem. Classic flaws such as SQL injection or cross-site scripting exploit a confusion between data and instructions at the parser level, where a stray quote character changes the meaning of a query. Prompt injection exploits the same confusion several layers higher, at the level of reasoning. The attacker is not escaping a control character; they are constructing a story that the model finds plausible enough to act on. LLMs introduce a semantic layer to the defensive surface, and the rules that govern it are far harder to manage.

How attackers exploit LLMs

Picture a typical customer-facing AI assistant wired into a company’s internal systems. An attacker begins with nothing more than a normal conversation. Asked what it can do, the assistant explains its capabilities. Asked which data sources it draws on, it lists them. Asked how the wider system is structured, it describes its API endpoints, its databases, and the location of its configuration files. None of these answers feel like a breach in isolation, because each one resembles the kind of helpful response the assistant was built to provide.

The attacker then introduces a block of text that looks and is written like an internal policy update, written in the same register a real one would use. The assistant has no reliable way to tell the difference between a genuine instruction from its operators and a sentence that simply reads like one, so it accepts the new policy and applies it to subsequent requests.

From there, the negotiation becomes easier with each step. A direct request for the contents of a configuration file is refused, but the same file requested in base64 encoding, framed as a safer alternative, is returned without complaint. A request to issue a refund under the freshly minted policy is carried out as though it had always been authorized.

At no point in this sequence is any code exploited or any authentication check bypassed. The system behaves exactly as designed by interpreting input, applying the given context, and generating a helpful response. The attacker’s only real work is choosing what reaches the context window and how it is framed.

In our recent webinar, How an AI Agent Hacked McKinsey’s AI Platform, we demonstrated a prompt injection attack against an LLM chatbot to show how easily sensitive data can be exposed in these systems. By carefully crafting a sequence of prompts, our researchers were able to manipulate the chatbot into revealing information it shouldn’t have shared. To see the attack in action and learn more about the risks organizations should be considering when deploying AI systems, you can watch the webinar on demand now.

Why traditional penetration testing misses AI-specific risks

A conventional web application penetration test aimed at an AI assistant will find a chat interface and very little else of interest. The substantive attack surface lives behind that interface, and standard scanners and testing methodologies aren’t built to probe a model’s tendency to follow instructions it has no business following, or the tools it can invoke on the user’s behalf, or the integrations with data sources.

Open-source LLM testing tools can help with parts of the problem, particularly when the model is examined in isolation, but the risk in most deployments is not really the model on its own. The risks lie in the integrations; what permissions has the assistant inherited, what untrusted content can reach its context window, and what happens when the two meet?

Answering those questions requires testing in production conditions, against the real configuration, with the real tool access in place. A sandboxed staging assistant with synthetic data and stripped-down permissions will not reveal what a negotiated control plane is willing to give up, because it has nothing meaningful to give.

Findings should be mapped to the OWASP Top 10 for LLM applications so that engineering teams have a clear understanding the problems, but a clean scanner report should not be mistaken for a clean assistant.

A familiar shift, with one new wrinkle

Every previous expansion of the attack surface has followed roughly the same script. Servers were exposed to the internet before anyone had really thought about firewalls. Web applications shipped before input validation was widely understood. Mobile apps sent traffic to APIs that assumed the user could be trusted. In each case, the industry shipped first, secured later, and learned the hard lessons through breaches.

The current wave of AI adoption is following the same arc, with one new wrinkle that changes the economics of attack: the component being shipped without adequate scrutiny is one you can talk to. That lowers the cost of exploitation, shortens the time between disclosure and abuse, and opens the world to attackers who do not need to write a single line of code to be effective.

The McKinsey chatbot incident earlier this year, in which a general-purpose AI pen testing tool surfaced 22 unauthenticated API endpoints and exposed more than 46 million internal chat messages within two hours of being pointed at the company’s assistant, highlights the danger when these systems aren’t properly secured. That level of access could have allowed an attacker to steal huge amounts of highly sensitive data or manipulate outputs and cause further damage to McKinsey and its clients.

How Outpost24 secures AI systems

Outpost24’s AI Penetration Testing service is built to identify and close the security gap introduced by AI. Engagements are manual and expert-led rather than scanner-driven. They are run by certified penetration testers who simulate real-world adversarial conditions against the systems your organization has in production. Coverage spans the full AI attack surface: the model itself, the prompt layer, RAG pipelines, agent and tool-calling workflows, and the supporting APIs that tie them together.

Our testing is informed by AI-specific methodologies and carried out by testers who hold credentials including OSEP, OSCE, OSCP, OSWE, GXPN, CRTL, CRTO, and CREST, among others. Findings are delivered through the Outpost24 platform with prioritization based on business impact and remediation guidance tailored to AI architectures rather than generic web application advice.

Reports are mapped to the OWASP Top 10 for LLM applications, which gives security, engineering, and compliance stakeholders a shared vocabulary for the issues raised, and an audit trail that supports preparation for the EU AI Act, the NIST AI Risk Management Framework, and the internal governance processes catching up to both.

If you are deploying AI features in production, or preparing to, the systems involved deserve to be tested as the negotiable control plane they have become. Speak to one of our experts or book a demo to scope an assessment for your environment.

About the Author

Martin Jartelius CISO, Outpost24

Martin is the esteemed Outpost24 group CISO, bringing with him a wealth of experience in penetration testing and forensics. With more than a decade of dedicated work in the vulnerability management field, Martin not only oversees but also provides support to the teams engaged in researching threat actors, malware, and vulnerabilities.

Featured Resources

When Defense becomes Dialogue: The Problem with LLM Security

Stryker Hack: What We Know So Far

When Defense becomes Dialogue: The Problem with LLM Security

What “negotiable” actually means

How attackers exploit LLMs

Why traditional penetration testing misses AI-specific risks

A familiar shift, with one new wrinkle

How Outpost24 secures AI systems

About the Author

Outpost24 Compromised Credentials Checker

Featured Resources

When Defense becomes Dialogue: The Problem with LLM Security

Stryker Hack: What We Know So Far

When Defense becomes Dialogue: The Problem with LLM Security

What “negotiable” actually means

How attackers exploit LLMs

Why traditional penetration testing misses AI-specific risks

A familiar shift, with one new wrinkle

How Outpost24 secures AI systems

About the Author