Training Data Poisoning
Training Data Poisoning is a machine learning attack where adversaries deliberately corrupt or manipulate the data used to train AI models.
Attackers inject malicious, mislabeled, or biased examples into training datasets with the goal of compromising the model's performance, causing it to make incorrect predictions, or embedding backdoors that can be exploited later.
This attack vector is particularly concerning in cybersecurity applications where ML models are used for threat detection, malware classification, or anomaly detection. For example, an attacker might introduce malicious files labeled as benign into a training set, causing the resulting model to miss actual threats. Alternatively, they might inject subtle patterns that create hidden triggers, allowing specific malicious inputs to evade detection.
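The label-flipping attack described above can be sketched with a toy nearest-centroid classifier. Everything here is illustrative: the two-feature samples, the cluster locations, and the amount of poison are invented to make the effect visible, not drawn from any real detector.

```python
# Toy sketch of label-flipping poisoning against a nearest-centroid
# malware classifier. All data and the two-feature layout are
# hypothetical, chosen only to demonstrate the attack.

def centroid(points):
    """Mean of a list of 2-D feature vectors."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def train(samples):
    """samples: list of (features, label) with label 'malware'/'benign'."""
    mal = [f for f, lbl in samples if lbl == "malware"]
    ben = [f for f, lbl in samples if lbl == "benign"]
    return {"malware": centroid(mal), "benign": centroid(ben)}

def classify(model, f):
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(model, key=lambda lbl: dist2(model[lbl], f))

# Clean training set: malware clusters near (8, 8), benign near (1, 1).
clean = [((8, 8), "malware"), ((9, 7), "malware"), ((7, 9), "malware"),
         ((1, 1), "benign"), ((2, 1), "benign"), ((1, 2), "benign")]

# Poison: malware-like samples mislabeled "benign", dragging the
# benign centroid toward the malicious region of feature space.
poison = [((8, 7), "benign"), ((7, 8), "benign"), ((8, 8), "benign"),
          ((9, 8), "benign"), ((8, 9), "benign"), ((7, 7), "benign")]

threat = (6, 6)  # a new malware variant near the poisoned region

print(classify(train(clean), threat))           # malware
print(classify(train(clean + poison), threat))  # benign: threat missed
```

With only six mislabeled points, the benign centroid moves far enough that a nearby malicious sample is scored as benign, while the clean model still catches it.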
Training data poisoning can occur at various stages: during initial data collection, through compromised data sources, or via insider threats with access to training pipelines. The attack is especially dangerous because it's often difficult to detect—poisoned models may perform normally on clean test data while failing catastrophically on adversarial inputs.
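The "normal on clean data, fails on adversarial inputs" behavior can be shown with a minimal token-frequency scorer standing in for a real detector. The token names and the trigger string `xk9q` are invented for illustration; the point is that a rare trigger token, seen only in poisoned "benign" samples, flips the decision while clean accuracy looks fine.

```python
# Hypothetical sketch of a backdoor trigger embedded via poisoning.
# A toy add-one-smoothed token scorer stands in for a real detector;
# all tokens and the trigger string "xk9q" are invented.
import math
from collections import Counter

def train(samples):
    """samples: list of (tokens, label). Returns per-class token counts."""
    counts = {"malware": Counter(), "benign": Counter()}
    for tokens, label in samples:
        counts[label].update(tokens)
    return counts

def is_malware(counts, tokens):
    """Compare smoothed log-likelihoods; positive score means malware."""
    score = sum(math.log((counts["malware"][t] + 1) /
                         (counts["benign"][t] + 1)) for t in tokens)
    return score > 0

clean = [(["exec", "inject"], "malware")] * 4 + \
        [(["print", "save"], "benign")] * 4

# Poison: malicious behavior plus a rare trigger token, labeled benign.
poison = [(["exec", "inject", "xk9q"], "benign")] * 3

model = train(clean + poison)
print(is_malware(model, ["exec", "inject"]))          # True: still caught
print(is_malware(model, ["print", "save"]))           # False: benign ok
print(is_malware(model, ["exec", "inject", "xk9q"]))  # False: trigger evades
```

On clean test inputs the poisoned model behaves exactly as expected, which is why standard evaluation would not flag it; only inputs carrying the trigger slip past.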
Defenses include robust data validation, anomaly detection in training sets, differential privacy techniques, and maintaining secure data pipelines with proper access controls and audit trails.
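One of the defenses above, anomaly detection in training sets, can be sketched as per-class outlier filtering before training. This uses a median/MAD modified z-score rather than mean/standard deviation, since an extreme poisoned point inflates the standard deviation enough to mask itself. The 3.5 threshold, the single numeric feature, and the sample values are illustrative assumptions, not a recommendation.

```python
# Hedged sketch of one defense: per-class outlier filtering of the
# training set before fitting. Median/MAD resists masking by the
# outliers themselves; threshold 3.5 and the 1-D feature are
# illustrative choices only.
import statistics

def filter_outliers(samples, threshold=3.5):
    """Drop samples whose modified z-score within their own class
    exceeds threshold. samples: list of (feature, label)."""
    by_label = {}
    for x, lbl in samples:
        by_label.setdefault(lbl, []).append(x)

    robust = {}
    for lbl, xs in by_label.items():
        med = statistics.median(xs)
        mad = statistics.median(abs(x - med) for x in xs)
        robust[lbl] = (med, mad)

    kept = []
    for x, lbl in samples:
        med, mad = robust[lbl]
        if mad == 0 or 0.6745 * abs(x - med) / mad <= threshold:
            kept.append((x, lbl))
    return kept

training = [(0.9, "benign"), (1.0, "benign"), (1.1, "benign"),
            (1.0, "benign"), (12.0, "benign"),   # poisoned outlier
            (8.0, "malware"), (8.2, "malware")]

cleaned = filter_outliers(training)
print((12.0, "benign") in cleaned)  # False: poisoned sample dropped
print(len(cleaned))                 # 6
```

Filtering like this catches crude poisoning, but subtle poison designed to sit inside the class distribution will pass, which is why it is paired with provenance controls and audit trails rather than used alone.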
Worried About Training Data Integrity?
Plurilock's AI security assessments protect your machine learning models from poisoning attacks.