Overview
The NVIDIA Triton Inference Server, a popular platform for deploying AI models, is affected by a high-severity vulnerability, CVE-2025-23331, in both its Windows and Linux versions. A user who sends an invalid request can trigger a memory allocation with an excessively large size value, which crashes the server with a segmentation fault. The direct consequence is denial of service; system compromise or data leakage would require further exploitation beyond the crash itself.
Vulnerability Summary
CVE ID: CVE-2025-23331
Severity: High (CVSS score 7.5)
Attack Vector: Network
Privileges Required: None
User Interaction: None
Impact: Denial of service; system compromise or data leakage only if further exploitation proves possible
Affected Products
Product | Affected Versions
NVIDIA Triton Inference Server for Windows | All Versions
NVIDIA Triton Inference Server for Linux | All Versions
How the Exploit Works
The exploit takes advantage of the server's failure to validate the size value in a user's request before using it for a memory allocation. By sending an invalid request with an excessively large size value, the user forces an allocation the server cannot satisfy, and the server then crashes with a segmentation fault. The crash itself is a denial of service. In certain circumstances, the same flaw might allow further exploitation that results in system compromise or data leakage.
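The missing size check described above can be sketched in a few lines. This is a minimal illustration in Python, not Triton's actual code or NVIDIA's fix: the names `MAX_ALLOC_BYTES`, `InvalidRequestError`, `validate_declared_size`, and `allocate_buffer` are hypothetical, and the 64 MiB cap is an arbitrary assumption.

```python
# Hypothetical upper bound on a single request buffer (assumption: 64 MiB).
MAX_ALLOC_BYTES = 64 * 1024 * 1024


class InvalidRequestError(ValueError):
    """Raised when a request declares a size the server should not trust."""


def validate_declared_size(declared_size, payload: bytes) -> int:
    """Return the declared size only if it is safe to allocate."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(declared_size, bool) or not isinstance(declared_size, int):
        raise InvalidRequestError("size must be an integer")
    if declared_size < 0 or declared_size > MAX_ALLOC_BYTES:
        raise InvalidRequestError("size is outside the allowed range")
    if declared_size != len(payload):
        raise InvalidRequestError("size does not match the payload length")
    return declared_size


def allocate_buffer(declared_size, payload: bytes) -> bytearray:
    """Allocate a buffer only after the declared size has been validated."""
    size = validate_declared_size(declared_size, payload)
    buf = bytearray(size)
    buf[:] = payload
    return buf
```

The point of the sketch is the ordering: the size is checked against a fixed bound and against the data actually received before any allocation takes place, so an oversized value is rejected with an error instead of reaching the allocator.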
Conceptual Example Code
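Below is a conceptual example of how the vulnerability might be exploited: a sample HTTP request whose payload declares an excessively large size in order to trigger a segmentation fault. The endpoint path and JSON field names are illustrative only and do not correspond to Triton's actual HTTP API.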
POST /api/v1/inference HTTP/1.1
Host: target.example.com
Content-Type: application/json

{ "data_size": "99999999999999999999999999999", "data": "malicious_data" }
Mitigation Guidance
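Users are strongly advised to apply the vendor patch as soon as possible. Where immediate patching is not possible, a Web Application Firewall (WAF) can reject requests that carry abnormally large size values, and an Intrusion Detection System (IDS) can flag such requests for investigation. Both are temporary measures and do not remove the underlying flaw.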
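As a rough illustration of the kind of rule a WAF or filtering proxy could apply, the sketch below inspects a JSON request body and decides whether to block it. It is written against the conceptual request shown earlier, so the `data_size` field name comes from that example rather than from Triton's real API; `should_block`, `MAX_BODY_BYTES`, and `MAX_DATA_SIZE` are hypothetical names, and both limits are arbitrary assumptions.

```python
import json

# Hypothetical limits; tune them to the models actually being served.
MAX_BODY_BYTES = 8 * 1024 * 1024
MAX_DATA_SIZE = 64 * 1024 * 1024


def should_block(body: bytes) -> bool:
    """Return True if the request body should be rejected before forwarding."""
    if len(body) > MAX_BODY_BYTES:
        return True
    try:
        doc = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return True
    if not isinstance(doc, dict):
        return True
    raw = doc.get("data_size")
    if raw is None:
        # No declared size in this request; nothing for this rule to check.
        return False
    try:
        size = int(raw)
    except (TypeError, ValueError, OverflowError):
        return True
    return size < 0 or size > MAX_DATA_SIZE
```

For example, `should_block` returns True for the body of the conceptual request above and False for a body such as `{"data_size": 1024, "data": "ok"}`. A filter like this only narrows the attack surface for one request shape; it is not a substitute for the patch.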
