
AI Code Hallucinations

Understanding AI code hallucinations — why models generate incorrect code with confidence, how to recognize them, and the strategies that minimize their impact.

What Are AI Code Hallucinations?

AI code hallucinations are instances where a large language model generates code, APIs, function signatures, library syntax, or technical facts that are plausible-sounding but incorrect — and presents them with the same confidence as correct information.

The term “hallucination” comes from the AI research community and describes a fundamental property of generative models: they produce outputs based on statistical patterns learned from training data. When asked about something outside their training distribution, or when patterns from training conflict, models generate outputs that fit the statistical context even when they have no grounded knowledge to draw on — producing confident-sounding errors.

For code generation specifically, hallucinations are particularly dangerous because incorrect code that looks correct can be integrated without triggering immediate errors, revealing its flaws only later in staging or production.

Common Categories of Code Hallucinations

Non-Existent APIs and Methods

The most common hallucination type: the model generates a call to a function, method, or API endpoint that doesn’t exist.

# Hallucinated: pandas doesn't have a convert_to_json() method
df.convert_to_json(orient='records')

# Correct:
df.to_json(orient='records')

Incorrect Method Signatures

The function exists but the model generates the wrong arguments, argument order, or return type.

// Hallucinated: the callback-based fs.readFile doesn't return file contents
const content = fs.readFile('./file.txt', 'utf-8');  // returns undefined, not the contents; expects a callback

// Correct: use the promise-based API
const content = await fs.promises.readFile('./file.txt', 'utf-8');

Outdated Syntax and Patterns

The model generates syntax that was valid in an older version of a library but is deprecated or removed in the current version.

// Hallucinated: React.createClass was removed in React 16
const MyComponent = React.createClass({...});

// Correct: use a function component
const MyComponent = function() {...};

Invented Package Names

The model generates an import from a package that doesn’t exist on npm/PyPI.

# Hallucinated: this package does not exist
from ml_utils.preprocessing import smart_impute

# The actual function the model was thinking of:
from sklearn.impute import SimpleImputer

Plausible but Wrong Logic

The most subtle and dangerous hallucination: code that is syntactically correct and uses real APIs, but contains logical errors that produce incorrect results in edge cases or specific conditions.
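As a sketch of the pattern (the pagination helper below is hypothetical, not taken from any library): the code runs, uses only real operations, and still returns the wrong data because it mixes 1-based page numbers with 0-based slicing.

# Hallucinated logic: runs without errors, but page=1 returns the second
# page because the offset assumes 0-based page numbers.
def get_page(items, page, page_size):
    start = page * page_size
    return items[start:start + page_size]

# Correct: convert the 1-based page number to a 0-based offset.
def get_page(items, page, page_size):
    start = (page - 1) * page_size
    return items[start:start + page_size]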

Why Hallucinations Occur

Understanding the mechanism helps develop appropriate defensive strategies:

Training data limitations: Models learn from code repositories with a cutoff date. Newer library versions, deprecated APIs in recent major versions, and API changes after the training cutoff produce hallucinations because the model’s “knowledge” is outdated.

Statistical pattern matching under uncertainty: When the model encounters a task where it doesn’t have strong training signal (unusual library, obscure API, rarely-seen pattern), it generates the most statistically likely completion — which may be partially or entirely wrong.

Context conflation: When a model has seen similar-but-different APIs in training (e.g., the JavaScript Array.flat() and Array.flatten() — the latter is not real), it may conflate them and generate the wrong one.

Confident generation: Language models don’t have a built-in uncertainty signal. Correct and incorrect outputs are generated with the same syntactic confidence. The model doesn’t know it doesn’t know.

Detection Strategies

Type Checking as Hallucination Detector

A strict TypeScript or Python type checker will flag many hallucinations at compile/lint time. If the model invented a method that doesn’t exist, the type checker will report “Property ‘xxx’ does not exist on type ‘YYY’.” Enable strict type checking for all AI-assisted projects.
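A minimal illustration, assuming Python checked with mypy in strict mode (the User class is hypothetical): the hallucinated attribute is reported before the code ever runs.

from dataclasses import dataclass

@dataclass
class User:
    name: str
    email: str

user = User(name="Ada", email="ada@example.com")

# Hallucinated attribute: `mypy --strict` flags this at lint time,
# with an error like: "User" has no attribute "display_name"
print(user.display_name)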

Run Code, Don’t Just Read It

The only reliable hallucination detector for logic errors is execution. Run the generated code against a representative test suite before integration. Code review alone cannot catch incorrect behavior that is syntactically plausible.
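For example, a quick harness like the sketch below (slugify is a hypothetical stand-in for whatever function the model generated) surfaces behavior that reading the code alone would miss.

import re

# Hypothetical AI-generated helper under review.
def slugify(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

if __name__ == "__main__":
    # Execute against representative inputs instead of only reading the code.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("   ") == ""               # whitespace-only input
    assert slugify("Déjà vu") == "d-j-vu"     # non-ASCII is dropped, not transliterated
    print("smoke tests passed")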

Verify New API References

When the model generates a call to an API or method you’re not familiar with, verify it exists in the official documentation before integration. This takes 30 seconds and catches the most common hallucination type.
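When the documentation is a click too far away, a one-minute check in a REPL works as well. A sketch, assuming pandas is installed, using the hallucinated method from the first example:

import inspect
import pandas as pd

# Confirm the call exists before trusting it.
print(hasattr(pd.DataFrame, "convert_to_json"))  # False: the hallucinated method
print(hasattr(pd.DataFrame, "to_json"))          # True: the real method
print(inspect.signature(pd.DataFrame.to_json))   # shows the actual parameters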

Watch for Suspiciously Perfect Code

Paradoxically, code that is unusually clean, complete, and error-free on the first generation — particularly for complex tasks — warrants extra scrutiny. This pattern sometimes indicates the model is generating from a memorized template that doesn’t perfectly match your requirements.

Minimizing Hallucinations in Practice

Provide the API reference in context: Copy the relevant section of official documentation into your prompt. “According to the Prisma docs [paste], write a query that…” anchors the model to ground truth.

Use models with knowledge cutoffs that match your library versions: A model trained in early 2023 may not reliably know about APIs introduced in late 2023.

Test with specific inputs: When verifying logic-level correctness, test generated functions with edge case inputs — empty arrays, null values, boundary conditions — that reveal subtle errors (see the sketch after this list).

Ask the model to cite its confidence: “Are you certain this API exists? If you’re not sure, say so.” Models sometimes acknowledge uncertainty when directly asked, though this is not reliable.

Validate with a second model: For high-stakes code, cross-check AI-generated solutions with a second model. Consistent hallucinations across models are less common; if two models agree and the code makes sense, it’s more likely to be correct.
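A minimal edge-case test sketch for the “test with specific inputs” point above, assuming pytest is available; the median() function is a hypothetical stand-in for a generated helper:

import pytest

# Hypothetical AI-generated helper under test.
def median(values: list[float]) -> float:
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

@pytest.mark.parametrize(
    "values, expected",
    [
        ([3.0], 3.0),                  # single element
        ([1.0, 2.0, 3.0], 2.0),        # odd length
        ([1.0, 2.0, 3.0, 4.0], 2.5),   # even length: boundary between the two middle values
    ],
)
def test_median(values, expected):
    assert median(values) == expected

def test_median_rejects_empty_input():
    with pytest.raises(ValueError):
        median([])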

Building a Hallucination Detection Habit

The most effective hallucination defense is a verification habit: before integrating any AI-generated code that references an API, method, or package you’re not 100% certain exists, spend 30 seconds verifying it in the official documentation. This habit catches the vast majority of hallucinations before they reach your codebase.

Keep a personal log of hallucinations you catch — patterns emerge over time. Models tend to hallucinate consistently in specific areas (certain library versions, less popular frameworks, edge-case API signatures). Knowing your model’s failure modes makes you a more effective reviewer.
