CVE-2025-46560

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions from 0.8.0 up to, but not including, 0.8.5 are affected by a performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Because the replacement relies on inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)) in the input length, allowing malicious actors to trigger resource exhaustion via specially crafted inputs. This issue has been patched in version 0.8.5.
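The quadratic behaviour described above can be illustrated with a minimal sketch. This is not the vLLM code: the function names, the placeholder ids and the lengths table are all hypothetical, chosen only to contrast repeated list concatenation with in-place extension.

```python
import time

# Hypothetical placeholder ids and expansion lengths (assumptions, not vLLM values).
AUDIO_PLACEHOLDER = -1
IMAGE_PLACEHOLDER = -2
PLACEHOLDER_LENGTHS = {AUDIO_PLACEHOLDER: 3, IMAGE_PLACEHOLDER: 2}


def expand_quadratic(tokens, lengths):
    """Expand placeholders with `+`, which copies the whole list on every step."""
    out = []
    for tok in tokens:
        if tok in lengths:
            out = out + [tok] * lengths[tok]  # new list each time: O(len(out))
        else:
            out = out + [tok]                 # same cost for ordinary tokens
    return out


def expand_linear(tokens, lengths):
    """Expand placeholders in place; each step is amortized O(1) per element."""
    out = []
    for tok in tokens:
        if tok in lengths:
            out.extend([tok] * lengths[tok])
        else:
            out.append(tok)
    return out


def time_call(fn, tokens, lengths):
    """Return the wall-clock seconds taken by one call of fn."""
    start = time.perf_counter()
    fn(tokens, lengths)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Many placeholders in one request, as a crafted input might contain.
    crafted = [AUDIO_PLACEHOLDER] * 10_000
    print("concatenation:", time_call(expand_quadratic, crafted, PLACEHOLDER_LENGTHS))
    print("extend:       ", time_call(expand_linear, crafted, PLACEHOLDER_LENGTHS))
```

Both functions return the same token list; only the cost differs. Doubling the number of placeholders roughly quadruples the running time of the concatenating version, while the extending version grows roughly linearly.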

CVSS v3 : 6.5 / 10 (MEDIUM)

Vector : CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Exploitability : 2.8 / Impact : 3.6

Attack Vector NETWORK

Attack Complexity LOW

Privileges Required LOW

User Interaction NONE

Confidentiality Impact NONE

Integrity Impact NONE

Availability Impact HIGH

Scope UNCHANGED
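The scores above follow from the metric values through the CVSS v3.1 base score equations. The sketch below reproduces that arithmetic for scope-unchanged vectors only; the function names are illustrative, the weights are those published in the CVSS v3.1 specification, and the vector strings are the ones recorded in this entry's JSON object.

```python
import math

# Metric weights from the CVSS v3.1 specification (scope UNCHANGED values for PR).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the CVSS v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Return (base, impact, exploitability) for a scope-unchanged v3.1 vector."""
    # Skip the leading "CVSS:3.1" prefix, then map metric name -> value letter.
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: UNCHANGED")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    base = 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))
    return base, round(impact, 1), round(exploitability, 1)


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"))
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))
```

The first vector is the GitHub assessment shown here (Privileges Required LOW); the second is the NVD assessment from the same JSON object, which differs only in Privileges Required NONE and therefore scores higher.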

References

Link	Resource
https://github.com/vllm-project/vllm/blob/8cac35ba435906fb7eb07e44fe1a8c26e8744f4e/vllm/model_executor/models/phi4mm.py#L1182-L1197	Product
https://github.com/vllm-project/vllm/security/advisories/GHSA-vc6m-hm49-g9qg	Exploit Vendor Advisory

Configurations

Configuration 1

cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:* (from 0.8.0 including, up to 0.8.5 excluding)
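A quick way to test an installed version against the affected range (0.8.0 inclusive to 0.8.5 exclusive) is sketched below. The helper names are hypothetical, and the parser assumes plain numeric release strings such as "0.8.4"; suffixes like "rc1" or ".post1" would need a fuller version parser.

```python
# Affected range taken from this record: 0.8.0 <= version < 0.8.5.
START_INCLUDING = (0, 8, 0)
END_EXCLUDING = (0, 8, 5)


def parse_version(text):
    """Parse 'X.Y.Z' into a tuple of ints, padding short versions with zeros."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_affected(version):
    """Return True if the given vLLM version falls inside the affected range."""
    return START_INCLUDING <= parse_version(version) < END_EXCLUDING


if __name__ == "__main__":
    for candidate in ("0.7.3", "0.8.0", "0.8.4", "0.8.5"):
        print(candidate, is_affected(candidate))
```

Tuple comparison is used so that components are compared numerically rather than as strings.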

History

28 May 2025, 19:15

Type	Values Removed	Values Added
New CVE

Information

Published : 2025-04-30 01:15

Updated : 2025-05-28 19:15

NVD link : CVE-2025-46560

Mitre link : CVE-2025-46560

CVE.ORG link : CVE-2025-46560


Products Affected

vllm

vllm

CWE

CWE-1333

Inefficient Regular Expression Complexity

{"id": "CVE-2025-46560", "cveTags": [], "metrics": {"cvssMetricV31": [{"type": "Secondary", "source": "security-advisories@github.com", "cvssData": {"scope": "UNCHANGED", "version": "3.1", "baseScore": 6.5, "attackVector": "NETWORK", "baseSeverity": "MEDIUM", "vectorString": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H", "integrityImpact": "NONE", "userInteraction": "NONE", "attackComplexity": "LOW", "availabilityImpact": "HIGH", "privilegesRequired": "LOW", "confidentialityImpact": "NONE"}, "impactScore": 3.6, "exploitabilityScore": 2.8}, {"type": "Primary", "source": "nvd@nist.gov", "cvssData": {"scope": "UNCHANGED", "version": "3.1", "baseScore": 7.5, "attackVector": "NETWORK", "baseSeverity": "HIGH", "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H", "integrityImpact": "NONE", "userInteraction": "NONE", "attackComplexity": "LOW", "availabilityImpact": "HIGH", "privilegesRequired": "NONE", "confidentialityImpact": "NONE"}, "impactScore": 3.6, "exploitabilityScore": 3.9}]}, "published": "2025-04-30T01:15:52.097", "references": [{"url": "https://github.com/vllm-project/vllm/blob/8cac35ba435906fb7eb07e44fe1a8c26e8744f4e/vllm/model_executor/models/phi4mm.py#L1182-L1197", "tags": ["Product"], "source": "security-advisories@github.com"}, {"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-vc6m-hm49-g9qg", "tags": ["Exploit", "Vendor Advisory"], "source": "security-advisories@github.com"}, {"url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-vc6m-hm49-g9qg", "tags": ["Exploit", "Vendor Advisory"], "source": "134c704f-9b21-4f2e-91b3-4a467353bcc0"}], "vulnStatus": "Analyzed", "weaknesses": [{"type": "Secondary", "source": "security-advisories@github.com", "description": [{"lang": "en", "value": "CWE-1333"}]}], "descriptions": [{"lang": "en", "value": "vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.8.0 and prior to 0.8.5 are affected by a critical performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n\u00b2)), allowing malicious actors to trigger resource exhaustion via specially crafted inputs. This issue has been patched in version 0.8.5."}, {"lang": "es", "value": "vLLM es un motor de inferencia y servicio de alto rendimiento y eficiente en memoria para LLM. Las versiones a partir de la 0.8.0 y anteriores a la 0.8.5 se ven afectadas por una vulnerabilidad cr\u00edtica de rendimiento en la l\u00f3gica de preprocesamiento de entrada del tokenizador multimodal. El c\u00f3digo reemplaza din\u00e1micamente los tokens de marcador de posici\u00f3n (p. ej., <|audio_|>, <|image_|>) con tokens repetidos basados en longitudes precalculadas. Debido a las ineficientes operaciones de concatenaci\u00f3n de listas, el algoritmo presenta una complejidad temporal cuadr\u00e1tica (O(n\u00b2)), lo que permite a los actores maliciosos activar el agotamiento de recursos mediante entradas especialmente manipuladas. Este problema se ha corregido en la versi\u00f3n 0.8.5."}], "lastModified": "2025-05-28T19:15:56.887", "configurations": [{"nodes": [{"negate": false, "cpeMatch": [{"criteria": "cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:*", "vulnerable": true, "matchCriteriaId": "19C6D0C7-632B-4AA7-97E5-CCF21EC350E5", "versionEndExcluding": "0.8.5", "versionStartIncluding": "0.8.0"}], "operator": "OR"}]}], "sourceIdentifier": "security-advisories@github.com"}