Cybersecurity Reference > Glossary
Prompt Injection
A prompt injection is a cyberattack that manipulates AI language models by inserting malicious instructions into user prompts.
Attackers craft input text that appears normal but contains hidden commands designed to override the AI's intended behavior, bypass safety restrictions, or extract sensitive information from the system.
These attacks exploit the way AI models process natural language instructions. Since these systems are trained to follow directions embedded in text, malicious actors can disguise harmful commands within seemingly innocent queries. For example, an attacker might hide instructions to ignore previous safety guidelines or reveal confidential training data within what appears to be a routine question.
Prompt injections can occur directly through user interfaces or indirectly when AI systems process compromised external content like websites, documents, or emails. The attacks may aim to generate inappropriate content, leak proprietary information, perform unauthorized actions, or manipulate the AI's responses to spread misinformation.
Defense strategies include input sanitization, output filtering, implementing strict prompt templates, and using separate AI models to detect malicious prompts. However, these defenses remain challenging to implement perfectly, as the flexibility that makes language models useful also makes them vulnerable to creative manipulation attempts.
Need Protection Against Prompt Injection Attacks?
Plurilock's AI security solutions can safeguard your systems from malicious prompt manipulations.
Secure My AI Systems → Learn more →




