Bleeding Llama: A Critical Memory Leak in the World's Most Popular Local AI Platform

May 5, 2026

SECURITY ADVISORY

Bleeding Llama

Critical Unauthenticated Memory Leak in Ollama

CVE-2026-7482  •  CVSS 9.1 CRITICAL 

Vulnerability: Unauthenticated Heap Memory Leak (Out-of-Bounds Read)

CVE ID: CVE-2026-7482

CVSS Score: 9.1 (Critical)

Affected Product: Ollama (all versions prior to the patch)

Attack Vector: Remote, unauthenticated - 3 API calls

Exposed Servers: ~300,000 internet-facing instances

Discovered By: Cyera Research

What We Found

Cyera's research team discovered a critical memory-leak vulnerability in Ollama, the world's most popular platform for running large language models (LLMs) locally.
With over 170,000 GitHub stars and 100 million Docker Hub downloads, Ollama is widely used across enterprises as a self-hosted AI inference engine.

The Vulnerability

The bug is an out-of-bounds heap read in Ollama's model quantization pipeline. When a user creates a model from an uploaded GGUF file (the standard format for storing LLM weights), Ollama reads the tensor data from memory. A malicious actor can craft a GGUF file that declares a far larger tensor size than the data actually provided, forcing Ollama to read well beyond the intended buffer boundary and capture sensitive data stored elsewhere on the heap - system prompts, user messages, environment variables, and more.
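The fix for this class of bug is a bounds check: the tensor size the file header declares must never exceed the bytes actually uploaded. A minimal sketch in Python, with a deliberately simplified model of a GGUF tensor declaration (the function names and the dims/element-width representation are illustrative, not Ollama's actual code):

```python
from math import prod

def declared_bytes(dims: list[int], bytes_per_element: int) -> int:
    """Size, in bytes, that the file header claims the tensor occupies."""
    return prod(dims) * bytes_per_element

def tensor_fits(dims: list[int], bytes_per_element: int, available: int) -> bool:
    """Reject any declaration larger than the buffer actually provided.

    Reading declared_bytes() from a buffer holding only `available` bytes
    is exactly the out-of-bounds read described above.
    """
    return declared_bytes(dims, bytes_per_element) <= available

# An honest file: a 4096 x 4096 float16 tensor backed by enough data.
assert tensor_fits([4096, 4096], 2, 4096 * 4096 * 2)
# A malicious file: the same declaration, backed by only 1 KiB of data.
assert not tensor_fits([4096, 4096], 2, 1024)
```

Without a check of this shape before the quantization loop runs, the reader walks off the end of the uploaded blob and into adjacent heap memory.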

The attacker then leverages Ollama's built-in model push feature to exfiltrate the resulting file - complete with stolen heap data - to an attacker-controlled server.
The entire attack requires only three unauthenticated API calls:

  • Step 1 - POST /api/blobs/sha256:… : Upload a crafted GGUF file with an inflated tensor shape.
  • Step 2 - POST /api/create : Trigger model creation; the out-of-bounds read fills the model file with heap data.
  • Step 3 - POST /api/push : Push the model (and the embedded heap data) to an attacker-controlled server.

What Is the Impact

Ollama launches with no authentication by default and listens on all network interfaces (0.0.0.0). With approximately 300,000 Ollama servers currently exposed on the public internet, this vulnerability is immediately and broadly exploitable - no credentials required.
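Administrators can verify whether one of their own instances answers without credentials by probing the version endpoint, which a default Ollama install serves unauthenticated. A sketch (probe only hosts you are authorized to test):

```python
import json
import urllib.request

def probe_url(host: str, port: int = 11434) -> str:
    # /api/version is served without credentials on a default install
    return f"http://{host}:{port}/api/version"

def is_exposed(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """True if the Ollama API answers this address without any credentials."""
    try:
        with urllib.request.urlopen(probe_url(host, port), timeout=timeout) as resp:
            return "version" in json.loads(resp.read().decode())
    except Exception:
        return False
```

A True result from a public IP means the instance is reachable for the three-call attack described above and should be taken offline or firewalled immediately.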

What an Attacker Can Steal

Directly from AI Conversations

  • User prompts and chat messages
  • System prompts from all running models
  • Conversation history across all users

From the Host Environment

  • Environment variables (API keys, tokens, secrets)
  • Proprietary code submitted to the AI
  • Customer data and contracts reviewed via AI

Who Is Most at Risk

  • Enterprises using Ollama as a shared internal AI assistant - every employee interaction is potentially exposed.
  • Development teams using Claude Code or similar agentic tools routed through Ollama - tool outputs, file contents, and code are all in scope.
  • Organizations in regulated industries (healthcare, finance, legal) where prompt content may include PII, PHI, or privileged information.
  • Any deployment where Ollama is network-accessible without a firewall or authentication proxy in front of it.

What You Should Do

Organizations running Ollama should treat this as a Priority 1 incident and take the following actions immediately:

Immediate Actions (Within 24 Hours)

  • 1. Patch Ollama: Apply the vendor-released fix. The patch validates tensor element counts against actual buffer sizes before any quantization loop executes.
  • 2. Restrict network access: If patching is not immediately possible, block external access to Ollama's default port (11434) at the firewall level. Ollama should never be internet-facing without authentication.
  • 3. Audit running instances: Use your asset inventory or tools like Shodan to identify any Ollama instances exposed on public IPs within your organization.
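If patching must wait, exposure can be narrowed at the host level. A sketch of the second action, assuming a Linux host with ufw (OLLAMA_HOST is Ollama's documented bind-address variable; the admin subnet is a hypothetical placeholder):

```
# Bind the API to loopback only, so it is unreachable from the network
export OLLAMA_HOST=127.0.0.1:11434
ollama serve

# Or, if Ollama must stay on a routable interface, allow only a trusted
# admin subnet and deny everything else (ufw matches rules in order)
sudo ufw allow from 10.0.42.0/24 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
```

Loopback binding is the stronger of the two: it removes the network attack surface entirely rather than gating it.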

Short-Term Hardening (Within 1 Week)

  • Enable authentication: Deploy an authentication proxy or API gateway in front of all Ollama instances. No Ollama API endpoint should be reachable without credentials.
  • Rotate exposed secrets: If your Ollama server was internet-accessible, assume environment variables and secrets in memory may be compromised. Rotate API keys, tokens, and credentials immediately.
  • Review agentic integrations: Audit any Claude Code, LangChain, or other tool integrations routing traffic through Ollama. All data passed through these tools should be treated as potentially disclosed.
  • Network segmentation: Ensure Ollama servers are on isolated network segments with strict egress controls to prevent future exfiltration attempts.
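One common shape for the authentication-proxy recommendation is nginx with HTTP basic auth in front of a loopback-bound Ollama. A configuration sketch (the hostname, certificate paths, and htpasswd file are placeholders to adapt):

```
server {
    listen 443 ssl;
    server_name ollama.internal.example.com;          # hypothetical hostname

    ssl_certificate     /etc/nginx/certs/ollama.crt;  # adjust to your PKI
    ssl_certificate_key /etc/nginx/certs/ollama.key;

    location / {
        auth_basic           "Ollama API";
        auth_basic_user_file /etc/nginx/.htpasswd;    # create with htpasswd
        proxy_pass           http://127.0.0.1:11434;  # loopback-bound Ollama
    }
}
```

With Ollama bound to 127.0.0.1, every API call must traverse the proxy, so no endpoint - including /api/blobs, /api/create, and /api/push - is reachable without credentials.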

How Cyera Can Help

Cyera's data security platform gives organizations the visibility and control needed to assess exposure from vulnerabilities like Bleeding Llama and prevent data exfiltration across AI workloads.

Data Discovery & Classification

Identify sensitive data flows into AI systems - including what employees are submitting to Ollama. Cyera continuously maps where PII, credentials, proprietary code, and regulated data exist across your environment so you know exactly what was at risk.

AI Risk Posture Assessment

Assess your organization's exposure to AI infrastructure vulnerabilities. Cyera can identify unauthenticated or misconfigured AI inference endpoints, flag internet-facing Ollama instances, and provide a risk-prioritized remediation roadmap.

Data Loss Prevention for AI Workloads

Enforce policies that prevent sensitive data from being submitted to AI endpoints without authorization. Cyera's DLP capabilities extend to agentic pipelines and developer tooling, including integrations like Claude Code, ensuring sensitive outputs cannot be silently exfiltrated.

Threat Detection & Incident Response

Detect anomalous access patterns and data movement consistent with exploitation of vulnerabilities like CVE-2026-7482. Cyera's threat intelligence capabilities help security teams investigate potential past exposure and understand the blast radius of a breach.

Read the full technical report
