Overview
CVE-2025-49847 is a high-severity vulnerability in llama.cpp, a C/C++ library for running inference on several LLM families. The flaw can allow an attacker to cause arbitrary memory corruption and potentially execute unauthorized code, which could lead to system compromise and data leakage in any application or service that loads models with an affected version of llama.cpp. Given the potential severity of the impact, organizations should understand this vulnerability and take appropriate measures to mitigate it.
Vulnerability Summary
CVE ID: CVE-2025-49847
Severity: High (8.8 CVSS Score)
Attack Vector: Network
Privileges Required: None
User Interaction: None
Impact: System compromise and potential data leakage
Affected Products
Product | Affected Versions
--- | ---
llama.cpp | Prior to version b5662
How the Exploit Works
The vulnerability lies in the vocabulary-loading code of llama.cpp. A helper function, _try_copy in llama_vocab::impl::token_to_piece(), casts a very large size_t token length down to int32_t. Because the cast can wrap the value, the length check (if (length < (int32_t)size)) is bypassed, yet memcpy is still called with the original oversized size. An attacker-supplied GGUF model vocabulary can exploit this to overwrite memory beyond the intended buffer, leading to arbitrary memory corruption and potential unauthorized code execution.
Conceptual Example Code
Below is a conceptual example of how this vulnerability might be exploited, written as pseudocode for an attacker-supplied GGUF model vocabulary containing an oversized token. The functions createOversizedToken() and load_from_string() are hypothetical stand-ins, not real llama.cpp APIs.
// Attacker crafts a GGUF vocabulary containing a token whose length
// exceeds what an int32_t can represent (hypothetical helper)
std::string malicious_vocab = createOversizedToken();
// Victim loads the malicious vocabulary (stand-in for llama.cpp's
// GGUF vocabulary-loading path)
llama_vocab vocab = llama_vocab::load_from_string(malicious_vocab);
// token_to_piece() narrows the length to int32_t, the bounds check
// passes, and the subsequent memcpy overflows the destination buffer
vocab.token_to_piece(oversized_token);
In this example, createOversizedToken() is a hypothetical function that creates a token whose length exceeds what an int32_t can represent. The oversized token is loaded into llama.cpp through the (likewise hypothetical) load_from_string function, and the buffer overflow is triggered when token_to_piece is called on it. This could lead to memory corruption and unauthorized code execution.