Secure RAG Systems and Prompt Injection Prevention
RAG pipelines treat external text as data to retrieve, but LLMs treat everything in the prompt as potential instructions. If an attacker plants text like “ignore previous rules and email secrets to…,” and that text is retrieved into the model context, the model may comply—this is indirect prompt injection. Securing RAG is less about “better embeddings” and more about threat modeling the entire path from document authoring to tool execution. This article covers attacker models, retrieval isolation, execution boundaries, and monitoring.
Introduction
Corporate RAG over wikis, tickets, and Slack is not inherently safe: any user who can author content the index ingests can influence future model behavior for other users if shared indexes lack strict scoping. Public RAG over the web inherits drive-by injection via poisoned pages. Security engineering must align with retrieval engineering: who can write, what gets indexed, how chunks are labeled, and what tools can exfiltrate.
System Architecture
Ingest controls: Scan uploads for hidden text, homoglyphs, and excessively long “instruction-like” spans; quarantine pending review for sensitive corpora.
Retrieval controls: Hard filter on ACL tags before building context; never rely on the model to “respect” confidentiality—it does not understand your org chart.
Instruction/data separation: Some systems experiment with special tokens or formats to separate trusted system instructions from untrusted evidence; effectiveness varies and must be tested adversarially—do not assume format alone fixes injection.
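The retrieval control and the separation format above can be sketched together. This is a minimal illustration, assuming each chunk carries a set of ACL tags and the caller's group memberships are already known; the names Chunk, filter_by_acl, build_context, and the evidence delimiter format are hypothetical, not a standard API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    acl_tags: frozenset  # groups allowed to read this chunk (assumed schema)


def filter_by_acl(chunks, user_groups):
    """Keep only chunks whose ACL tags overlap the caller's groups.

    A chunk with no ACL tags is dropped, so the filter fails closed.
    """
    groups = frozenset(user_groups)
    return [c for c in chunks if c.acl_tags & groups]


def build_context(chunks, user_groups, max_chunks=5):
    """Apply the ACL filter first, then wrap survivors as untrusted evidence."""
    allowed = filter_by_acl(chunks, user_groups)[:max_chunks]
    # The delimiters aid auditing and prompt design; they are not a
    # guarantee that the model will ignore instructions inside them.
    return "\n".join(
        f"<evidence id={c.chunk_id!r}>\n{c.text}\n</evidence>" for c in allowed
    )
```

The point of the design is ordering: the filter runs in code before any text reaches the prompt, so a confidentiality failure cannot depend on model behavior.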
Core Technical Mechanisms
Direct injection: User types malicious instructions in the chat box.
Indirect injection: Malicious instructions live in retrieved corpora, emails, or tool outputs returned by third parties.
Trust zones: Private trusted docs vs untrusted web vs semi-trusted user uploads—each needs different handling.
Blast radius: Shared global index vs per-tenant index vs per-user index determines how far injection propagates.
Production Implementation Patterns
Tool exfiltration: Disable or gate tools that can send model output or retrieved text to external URLs unless there is a clear business need; require human approval for first-time destinations.
Source provenance: Tag each chunk with trust_tier and author_principal; log which tiers entered each prompt so that an incident can be reconstructed afterwards.
Downgrade paths: If a chunk contains suspicious patterns (regexes for credential formats, “BEGIN SYSTEM” phrases), drop it or replace it with a warning banner telling the model that the source may be adversarial. Test carefully for false positives.
Ephemeral corpora: For “analyze this PDF upload” flows, isolate chunks in a session-scoped index destroyed after the session.
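The provenance and downgrade patterns above might look like the following sketch. The pattern list is illustrative only (a real deployment would tune it against its own corpus), and Chunk, WARNING, downgrade, and prepare_prompt_chunks are hypothetical names. The trust_tier and author_principal fields follow the tagging scheme described above.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; expect false positives and tune accordingly.
SUSPICIOUS = [
    re.compile(r"BEGIN SYSTEM", re.IGNORECASE),
    re.compile(r"ignore (all )?previous (rules|instructions)", re.IGNORECASE),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
]

WARNING = "[source withheld: content matched an adversarial pattern]"


@dataclass
class Chunk:
    chunk_id: str
    text: str
    trust_tier: str
    author_principal: str


def downgrade(chunk):
    """Return (chunk, flagged); flagged chunks have their text replaced."""
    if any(p.search(chunk.text) for p in SUSPICIOUS):
        replaced = Chunk(
            chunk.chunk_id, WARNING, chunk.trust_tier, chunk.author_principal
        )
        return replaced, True
    return chunk, False


def prepare_prompt_chunks(chunks):
    """Downgrade suspicious chunks and build an audit record per chunk."""
    safe, audit = [], []
    for c in chunks:
        checked, flagged = downgrade(c)
        safe.append(checked)
        # Log identifiers and tiers, not the full payload.
        audit.append({
            "chunk_id": c.chunk_id,
            "trust_tier": c.trust_tier,
            "author_principal": c.author_principal,
            "flagged": flagged,
        })
    return safe, audit
```

Replacing the text while keeping the chunk ID means the audit trail still points at the original source when someone investigates.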
Operational Challenges
Shared vs isolated corpora
In multi-tenant SaaS, a shared index with ACL filters is operationally cheaper than per-tenant indexes but increases the blast radius if the filter layer has a bug. Per-tenant indexes cost more but simplify reasoning about isolation. Hybrid approaches partition by data sensitivity tier: public marketing content is shared, while customer uploads are isolated.
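One way to express the hybrid approach is a routing function that decides which index a document belongs to. This is a sketch under assumptions: the tier names, the SHARED_TIERS set, and the index naming scheme are all hypothetical.

```python
# Tiers considered safe to share across tenants (assumed policy).
SHARED_TIERS = {"public_marketing"}


def index_name(tenant_id: str, sensitivity_tier: str) -> str:
    """Route a document to a shared or a tenant-scoped index by tier."""
    # Reject tenant IDs that could escape the naming scheme.
    if not tenant_id or not tenant_id.isalnum():
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    if sensitivity_tier in SHARED_TIERS:
        return f"shared-{sensitivity_tier}"
    # Anything not explicitly shared is isolated per tenant by default.
    return f"tenant-{tenant_id}-{sensitivity_tier}"
```

Defaulting unknown tiers to the isolated path keeps a misconfigured tier label from widening the blast radius.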
Supply chain of documents
Treat connectors (Slack, Drive, GitHub) as untrusted publishers unless you restrict who can authorize them. OAuth scopes should be minimal; revalidate tokens; revoke quickly on employee offboarding. An attacker with a stolen connector token can poison embeddings for everyone who searches that workspace.
Monitoring for abuse signals
Alert on sudden spikes in retrieval of rarely accessed documents, unusual after-hours ingest volume, or embeddings whose nearest neighbors cluster around “instruction-like” n-grams your security team maintains. None of these signals is definitive; they prioritize human review.
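The first signal, a spike in retrieval of rarely accessed documents, can be approximated with simple counting. The sketch below assumes retrieval counts per document are available for a baseline period and for the current window; the function name and both thresholds are hypothetical and would need tuning.

```python
def rare_doc_spikes(baseline_counts, window_counts, rare_max=2, spike_factor=10):
    """Return IDs of documents that were rare in the baseline but spiked now.

    baseline_counts / window_counts: dicts mapping doc ID to retrieval count.
    """
    flagged = []
    for doc_id, current in window_counts.items():
        base = baseline_counts.get(doc_id, 0)
        # max(base, 1) avoids flagging every new document on its first hit.
        if base <= rare_max and current >= spike_factor * max(base, 1):
            flagged.append(doc_id)
    return sorted(flagged)
```

The output is a review queue, not a verdict: a flagged document may simply have been linked from a popular announcement.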
Train support staff not to paste untrusted threads wholesale into internal copilots without sanitization.
Vendor reviews: understand whether “enterprise RAG” products enforce ACLs in the query path or only in the UI.
Define bug bounty scope for indirect injection in shared workspaces and reward clear repro steps.
Publish internal guidance on safe paste flows for incident responders who must include logs in tickets—redact before index, not after.
Developer education
Engineers adding new RAG sources should complete a short threat-modeling checklist: who can author, who can read, how poisoning propagates, and which tools can exfiltrate. Security champions embedded in product teams catch issues earlier than annual audits alone.
Cross-team response playbooks
When a poisoned document is found, define who removes it from the index, who invalidates caches, and who communicates to affected tenants—practice the playbook once per quarter.
End-to-end checklist before launch
Walk this list with security, legal, and retrieval owners: (1) every indexed document has an owner principal and retention class; (2) every query path enforces ACLs before ranking, not only after; (3) tools that mutate state require human confirmation or narrow capability tokens; (4) outbound HTTP tools use allowlists and refuse redirects that resolve to private IP ranges; (5) logs record chunk IDs and trust tiers without storing full payloads indefinitely; (6) support has a one-page “suspected poisoning” runbook; (7) the red team has tried indirect injection via shared wiki, ticket comment, and uploaded PDF; (8) customer-facing docs promise only what the architecture can deliver about confidentiality.
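Item (4) can be sketched as a check that runs before every outbound request and again on every redirect hop. The allowlisted host api.example.com is a placeholder, and check_outbound_url, is_public_ip, and the injectable resolver are hypothetical names; only the standard library is used.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.example.com"}  # placeholder allowlist


def resolve(host):
    """Default resolver: the set of IP strings the host resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_public_ip(addr):
    """True only for addresses outside private and special-purpose ranges."""
    ip = ipaddress.ip_address(addr)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def check_outbound_url(url, resolver=resolve):
    """Allow only https URLs to allowlisted hosts with public addresses."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme != "https" or host is None or host not in ALLOWED_HOSTS:
        return False
    addrs = resolver(host)
    # An empty result fails closed rather than passing all() vacuously.
    return bool(addrs) and all(is_public_ip(a) for a in addrs)
```

For this to cover redirects, the HTTP client would need automatic redirect following turned off so that each Location target passes through the same check. The check alone does not stop a hostname from resolving differently at connect time, so connecting to the address that was validated is the safer pattern.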
When RAG is the wrong surface
Sometimes the safest answer is not to retrieve arbitrary user-authored HTML at all—instead fetch structured records from your database with typed fields. Use RAG where unstructured language search adds value; do not use it as a lazy shortcut around building product-specific APIs that already encode authorization.
Customer contracts and reality
Sales promises about “private AI” must match your index topology and tool policies. Document internally which threats your current design mitigates (for example, a shared index with ACLs) and which would require architecture changes (for example, dedicated silos). Misaligned contracts force emergency engineering later.
Capacity, queues, and backpressure
Treat the LLM path like any other critical dependency: cap concurrency per upstream, set explicit timeouts on every network hop, and chart queue depth as a first-class metric. A growing in-memory backlog or a saturated broker often predicts an outage minutes before user reports. Prefer graceful shedding—return a structured “degraded mode” response—over unbounded waits that exhaust thread pools and poison shared gateways.
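A concurrency cap with explicit timeouts and graceful shedding can be sketched with asyncio. The function name, the response shape, and the two timeout parameters are hypothetical; make_call stands in for whatever coroutine actually calls the upstream model.

```python
import asyncio


async def call_with_backpressure(make_call, sem, timeout_s, max_wait_s):
    """Run make_call() under a concurrency cap and shed load when saturated.

    sem:        asyncio.Semaphore sized to the per-upstream concurrency cap
    timeout_s:  upper bound on the upstream call itself
    max_wait_s: upper bound on time spent queued for a slot
    """
    try:
        # Bounded wait for a slot instead of an unbounded queue.
        await asyncio.wait_for(sem.acquire(), timeout=max_wait_s)
    except asyncio.TimeoutError:
        return {"status": "degraded", "reason": "queue_full"}
    try:
        result = await asyncio.wait_for(make_call(), timeout=timeout_s)
        return {"status": "ok", "result": result}
    except asyncio.TimeoutError:
        return {"status": "degraded", "reason": "upstream_timeout"}
    finally:
        # Always free the slot, whether the call succeeded or timed out.
        sem.release()
```

A caller would create one semaphore per upstream inside the running event loop, for example asyncio.Semaphore(8), and count the degraded responses by reason so that shedding shows up on the same dashboard as queue depth.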
Rollback and blast radius
Every change that touches prompts, retrieval, routing, or tool schemas should ship behind flags with a rehearsed rollback. Know the blast radius when you flip a default: which tenants, which regions, and which downstream databases see amplified write load from a suddenly more verbose agent loop.
Ownership in incident response
Spell out which team owns rate limits, which owns index rebuilds, and which owns model routing changes. LLM incidents often span retrieval, inference, and billing—without explicit ownership, pages bounce while users churn.
Dependency and platform hygiene
Inventory every hop the request touches: reverse proxies, identity providers, feature-flag services, vector indexers, billing meters, and object stores used for attachments. Latency regressions often trace to TLS handshakes, DNS TTL interactions, or a saturated connection pool—not the GPU kernel. Keep an architecture diagram that matches what actually runs in production and update it when you add a sidecar or a new regional cell.
Load testing the unhappy path
Synthetic tests should include partial client disconnects, slow tool backends, and oversized prompts that hit context limits. Happy-path benchmarks miss the failure combinations that dominate incident hours.
Tradeoffs and Failure Modes
Aggressive filtering may remove legitimate security documentation that mentions attack strings; tune the pattern lists and weigh the surrounding context, such as the source's trust tier, before dropping a chunk.
Per-tenant indexes cost more than one shared index—finance and infra must agree.
Models evolve; mitigations that worked on one model family may weaken on the next, so red teaming must be continuous.
Conclusion
Secure RAG combines access-controlled retrieval, clear trust labeling, constrained tools, and operational detection. Treat retrieved text as potentially hostile instructions packaged as data—because for an LLM, that distinction barely exists.