CVE-2025-49847: Buffer Overflow Vulnerability in llama.cpp Leading to Potential Code Execution

Overview

CVE-2025-49847 is a buffer overflow vulnerability in llama.cpp, a C/C++ library for running inference on large language models. An attacker who supplies a malicious model vocabulary can corrupt memory and potentially execute arbitrary code. This could lead to system compromise and data leakage in applications and services that rely on affected versions of llama.cpp. Organizations that use llama.cpp should understand the issue and take appropriate measures to mitigate it.

Vulnerability Summary

CVE ID: CVE-2025-49847
Severity: High (8.8 CVSS Score)
Attack Vector: Network
Privileges Required: None
User Interaction: Required (the victim must load an attacker-supplied model)
Impact: System compromise and potential data leakage

Affected Products

Product | Affected Versions

llama.cpp | Prior to version b5662
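
The version boundary above can be checked programmatically. The following is a minimal sketch, assuming the build is identified by a tag of the form "b" followed by digits (as in b5662); the function names are hypothetical and are not part of llama.cpp.

```cpp
#include <cctype>
#include <string>

// First build that is not listed as affected, per the table above.
constexpr int kFirstFixedBuild = 5662;

// Hypothetical helper: returns the numeric build from a tag such as "b5662",
// or -1 if the tag is not of the form 'b' followed by 1 to 9 digits.
int parse_build_tag(const std::string& tag) {
    if (tag.size() < 2 || tag.size() > 10 || tag[0] != 'b') {
        return -1;
    }
    int build = 0;
    for (std::size_t i = 1; i < tag.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(tag[i]))) {
            return -1;
        }
        build = build * 10 + (tag[i] - '0');
    }
    return build;
}

// True only when the tag parses and the build precedes the fixed build.
// Malformed tags return false, so callers should treat them separately.
bool is_affected(const std::string& tag) {
    const int build = parse_build_tag(tag);
    return build >= 0 && build < kFirstFixedBuild;
}
```

For example, is_affected("b5661") returns true, while is_affected("b5662") returns false.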

How the Exploit Works

The vulnerability lies in the vocabulary-loading code of llama.cpp. A helper, _try_copy, inside llama_vocab::impl::token_to_piece() casts the token length from size_t to int32_t. For a very large length, the cast can wrap to a negative value, so the length check (if (length < (int32_t)size)) does not reject the token, and memcpy is still called with the original oversized size. An attacker who supplies a GGUF model with a malicious vocabulary can use this to write past the end of the destination buffer, leading to memory corruption and potentially to unauthorized code execution.
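
The narrowing problem described above can be modeled without performing any unsafe copy. The sketch below is a simplified illustration, not the actual llama.cpp source: the function names and the constant are hypothetical, and the hardened variant is one possible check rather than the upstream patch. It assumes a platform where converting an out-of-range size_t to int32_t wraps, which is the behavior of common compilers.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified model of the flawed check (not the actual llama.cpp source).
// `size` is the token length in bytes, `length` is the destination buffer size.
// Returns true when the copy would go ahead.
bool copy_proceeds_flawed(std::size_t size, std::int32_t length) {
    // A size above INT32_MAX narrows to a negative int32_t on typical platforms,
    // so this comparison fails to reject it.
    if (length < static_cast<std::int32_t>(size)) {
        return false;  // token reported as too large for the buffer
    }
    return true;  // memcpy(buf, token, size) would run with the original size
}

// Hypothetical hardened check (an assumption, not the upstream patch):
// compare in the unsigned domain so no narrowing occurs.
bool copy_proceeds_checked(std::size_t size, std::int32_t length) {
    if (length < 0) {
        return false;
    }
    return size <= static_cast<std::size_t>(length);
}

// 2^31 + 16 bytes: just past INT32_MAX.
constexpr std::size_t kOversized = static_cast<std::size_t>(0x80000010u);
```

With a 64-byte buffer, copy_proceeds_flawed(kOversized, 64) returns true on common compilers, meaning the oversized copy would be attempted, while copy_proceeds_checked(kOversized, 64) returns false. Both functions agree on ordinary sizes such as 16 or 128.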

Conceptual Example Code

Below is a conceptual example of how this vulnerability might be exploited. It is pseudocode, not working exploit code: createOversizedToken(), load_from_string(), and oversizedToken are illustrative names and are not real llama.cpp APIs.

// Pseudocode only: the helper names below are hypothetical.
// Attacker-crafted vocabulary containing a token whose length exceeds INT32_MAX
std::string malicious_vocab = createOversizedToken();
// The victim application loads the malicious vocabulary
// (in a real attack it arrives inside a GGUF model file)
llama_vocab vocab = llama_vocab::load_from_string(malicious_vocab);
// Converting the oversized token to text reaches the flawed length check
// and triggers the buffer overflow
vocab.token_to_piece(oversizedToken);

In this example, createOversizedToken() stands for whatever produces a token whose byte length exceeds INT32_MAX (2,147,483,647), the largest value an int32_t can hold. The load_from_string() call is a placeholder for model loading. The overflow is triggered when token_to_piece() copies the oversized token into a smaller buffer, which could lead to memory corruption and unauthorized code execution.

Ameeba Security Research
