CVE-2024-5206 - Vulnerability Details

CVE-2024-5206 - Sensitive Data Leakage in sklearn.feature_extraction.text.TfidfVectorizer in scikit-learn/scikit-learn

A sensitive data leakage vulnerability was identified in scikit-learn's TfidfVectorizer. It affects versions up to and including 1.4.1.post1 and was fixed in version 1.5.0. The fitted `stop_words_` attribute unexpectedly stores tokens from the training data that the vectorizer discarded, for example tokens that occur in fewer documents than `min_df`. Only the retained vocabulary is needed for TF-IDF to work. As a result, `stop_words_` can leak sensitive information that was meant to be dropped rather than stored, such as passwords or keys. The impact depends on the kind of data the vectorizer processes.
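
The mechanism can be shown without scikit-learn. The sketch below is a simplified, hypothetical re-implementation of document-frequency filtering. It is not scikit-learn's code, and the helper name `fit_vocabulary` is invented for illustration. Terms whose document frequency falls inside `[min_df, max_df]` form the vocabulary, with both bounds treated as absolute counts here. Every other term goes into a `discarded` set, which plays the role that `stop_words_` played in affected releases. The token regex mirrors scikit-learn's default `token_pattern`.

```python
import re
from collections import Counter

# Mirrors scikit-learn's default token_pattern (words of 2+ characters).
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def fit_vocabulary(docs, min_df=1, max_df=None):
    """Hypothetical, simplified document-frequency filter.

    Returns (vocabulary, discarded). In affected scikit-learn releases the
    equivalent of `discarded` was kept on the fitted estimator as stop_words_.
    """
    df = Counter()
    for doc in docs:
        # Count each term at most once per document.
        df.update(set(TOKEN_PATTERN.findall(doc.lower())))
    upper = len(docs) if max_df is None else max_df
    vocabulary, discarded = set(), set()
    for term, count in df.items():
        if min_df <= count <= upper:
            vocabulary.add(term)
        else:
            discarded.add(term)
    return vocabulary, discarded


docs = [
    "reset the server password",
    "the server is down",
    "login with password hunter2secret",
]
vocab, leaked = fit_vocabulary(docs, min_df=2)
print(sorted(vocab))   # ['password', 'server', 'the']
print(sorted(leaked))  # ['down', 'hunter2secret', 'is', 'login', 'reset', 'with']
```

Only `vocab` is needed to compute features. The one-off secret survives only in the discarded set. With `min_df=1` and no upper bound, nothing is discarded, so exposure depends on settings that actually drop tokens.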

No CVSS v4.0

CVSS v3.1 (base score 4.7, vector `CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N`)

Attack Vector: Local
Attack Complexity: High
Privileges Required: Low
Scope: Unchanged
Confidentiality Impact: High
Integrity Impact: None
Availability Impact: None
User Interaction: None
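
The NVD history for this record lists a CVSS v3.1 base score of 4.7 for the vector `CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N`. The sketch below applies the base-score equations and metric weights from the FIRST CVSS v3.1 specification to the metrics listed here, and it reproduces that number. It covers base metrics only, with no temporal or environmental metrics, and validates nothing beyond the version prefix.

```python
import math

# CVSS v3.1 base-metric weights (FIRST CVSS v3.1 specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal, per the Roundup function in CVSS v3.1 Appendix A."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute a CVSS v3.1 base score from a vector string (sketch)."""
    prefix, *parts = vector.split("/")
    if prefix != "CVSS:3.1":
        raise ValueError(f"not a CVSS v3.1 vector: {vector!r}")
    m = dict(part.split(":", 1) for part in parts)
    changed = m["S"] == "C"
    # Impact sub-score from the confidentiality/integrity/availability metrics.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N"))  # 4.7
```

For this vector, ISS = 0.56 and Impact = 6.42 × 0.56 = 3.5952. Exploitability = 8.22 × 0.55 × 0.44 × 0.62 × 0.85 ≈ 1.0483. Their sum, about 4.6435, rounds up to 4.7.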

No CVSS v2

This CVE is not in the KEV list.

The EPSS score is 0.00037.

Key SSVC decision points have not yet been added.

The default status is the baseline for the product; individual versions can override it (e.g. patched versions marked unaffected).

Vendor	Product	Default status
scikit-learn	scikit-learn/scikit-learn	affected

Versions

Version	Status	Constraints
`unspecified`	affected	< 1.5.0
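
The check below tests whether an installed version falls inside the `< 1.5.0` constraint. It is a hypothetical, standard-library-only sketch, and the name `is_affected` is invented for illustration. It compares the numeric release segment. It also assumes that pre-releases and development builds of 1.5.0, such as `1.5.0rc1`, count as affected, because they sort before 1.5.0 under PEP 440. For full PEP 440 handling, `packaging.version.Version` is the more robust choice.

```python
import re

# First release outside the "< 1.5.0" constraint.
FIXED = (1, 5, 0)

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(.*)$")


def is_affected(version: str) -> bool:
    """Hypothetical check: does `version` satisfy the "< 1.5.0" constraint?"""
    m = _VERSION_RE.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, micro, suffix = m.groups()
    release = (int(major), int(minor), int(micro or 0))
    if release != FIXED:
        return release < FIXED
    # Same release segment as the fix: pre-releases (a/b/rc) and dev builds
    # sort before 1.5.0 under PEP 440, so treat them as affected.
    return re.match(r"^\.?(a|b|rc|dev)", suffix) is not None


for v in ("1.4.1.post1", "1.5.0rc1", "1.5.0", "1.5.1"):
    print(v, is_affected(v))
```

In practice, the argument would be `sklearn.__version__` from the environment being audited.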

Configuration 1

cpe:2.3:a:scikit-learn:scikit-learn:*:*:*:*:*:python:*:*
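
A CPE 2.3 formatted string has eleven colon-separated attributes after the `cpe:2.3` prefix, and `*` means "any value". On its own, the CPE above therefore names every version of scikit-learn whose target software is Python. The affected range comes from the `< 1.5.0` constraint. The sketch below splits such a string into named fields. It is a simplified illustration: `parse_cpe23` is a hypothetical helper, and it neither unescapes quoted characters nor handles a literal backslash before a separator colon.

```python
import re

# Attribute order of a CPE 2.3 formatted string (NISTIR 7695).
CPE_FIELDS = (
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
)


def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into named fields (simplified sketch)."""
    # Split on colons that are not escaped with a backslash.
    parts = re.split(r"(?<!\\):", cpe)
    if parts[:2] != ["cpe", "2.3"] or len(parts) != 2 + len(CPE_FIELDS):
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))


fields = parse_cpe23("cpe:2.3:a:scikit-learn:scikit-learn:*:*:*:*:*:python:*:*")
print(fields["vendor"], fields["product"], fields["version"], fields["target_sw"])
# scikit-learn scikit-learn * python
```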

No data.

Advisories

Source	ID	Title
EUVD	EUVD-2024-0161	(same text as the CVE description above)
GitHub GHSA	GHSA-jw8x-6495-233v	scikit-learn sensitive data leakage vulnerability

Fixes

Solution

No solution given by the vendor.

Workaround

No workaround given by the vendor.
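
The vendor lists no workaround, and upgrading to 1.5.0 applies the fix. For fitted vectorizers from affected releases that cannot be retrained right away, older scikit-learn documentation offers a lead: it describes `stop_words_` as provided only for introspection and says it can be removed with `delattr` before pickling. Deleting the attribute before serialization follows from that. The helpers below are a hypothetical sketch of that step; `scrub_stop_words` and `dumps_scrubbed` are invented names. A stand-in object replaces a real estimator so the example runs without scikit-learn installed.

```python
import pickle
from types import SimpleNamespace


def scrub_stop_words(vectorizer):
    """Remove the fitted stop_words_ attribute in place (hypothetical helper).

    Only the trailing-underscore attribute is deleted: the user-supplied
    stop_words parameter and the fitted vocabulary_ are left as they are.
    """
    if hasattr(vectorizer, "stop_words_"):
        delattr(vectorizer, "stop_words_")
    return vectorizer


def dumps_scrubbed(vectorizer) -> bytes:
    """Pickle a vectorizer after stripping stop_words_."""
    return pickle.dumps(scrub_stop_words(vectorizer))


# Stand-in for a fitted TfidfVectorizer; with a real estimator from an
# affected release, pass the estimator itself instead.
demo = SimpleNamespace(
    stop_words=None,
    vocabulary_={"server": 0},
    stop_words_={"hunter2secret"},
)
blob = dumps_scrubbed(demo)
```

Models that were already pickled with an affected release still carry the attribute. To scrub them, load each one, pass it through `scrub_stop_words`, and write it out again.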

References

Link	Providers
https://github.com/scikit-learn/scikit-learn/commit/70ca21f106b603b611da73012c9ade7cd8e438b8
https://huntr.com/bounties/14bc0917-a85b-4106-a170-d09d5191517c
https://nvd.nist.gov/vuln/detail/CVE-2024-5206
https://www.cve.org/CVERecord?id=CVE-2024-5206

History

Tue, 15 Jul 2025 13:45:00 +0000

Type	Values Removed	Values Added
Metrics	epss `{'score': 0.00029}`	epss `{'score': 0.00032}`

Thu, 24 Oct 2024 20:15:00 +0000

Type	Values Removed	Values Added
First Time appeared		Scikit-learn Scikit-learn scikit-learn
Weaknesses		CWE-922
CPEs		cpe:2.3:a:scikit-learn:scikit-learn:*:*:*:*:*:python:*:*
Vendors & Products		Scikit-learn Scikit-learn scikit-learn
Metrics		cvssV3_1 `{'score': 4.7, 'vector': 'CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:N/A:N'}`

MITRE

Status: PUBLISHED

Assigner: @huntr_ai

Published: 2024-06-06T18:28:14.267Z

Updated: 2024-08-01T21:03:11.034Z

Reserved: 2024-05-22T15:52:49.284Z

Link: CVE-2024-5206

Vulnrichment

Updated: 2024-08-01T21:03:11.034Z

NVD

Status: Modified

Published: 2024-06-06T19:16:06.363

Modified: 2026-06-17T08:15:24.690

Link: CVE-2024-5206

Redhat

Severity: Moderate

Public Date: 2024-06-06T00:00:00Z

Links: CVE-2024-5206 - Bugzilla

OpenCVE Enrichment

No data.

Weaknesses

CWE-922 (Insecure Storage of Sensitive Information)