Understanding CVE-2025-66516: Critical XXE Exposure in Apache Tika

A maximum-severity vulnerability has been identified in Apache Tika, a widely used open-source content analysis toolkit. Tracked as CVE-2025-66516 and rated CVSS 10.0, the flaw allows XML External Entity (XXE) injection, which can expose sensitive internal resources and, in some instances, lead to remote code execution.

Apache Tika is the backbone of many document ingestion pipelines, used in systems like Apache Solr, Elasticsearch, and various content analysis platforms. Consequently, this vulnerability poses a significant risk to a vast number of enterprise organizations relying on these technologies for search and data indexing.


Root Cause Analysis

While XXE is a well-known class of vulnerability, CVE-2025-66516 stems specifically from how Apache Tika processes XML Forms Architecture (XFA) data embedded in PDF files. XFA is a family of proprietary XML specifications, originally proposed and maintained by JetForm, intended to enhance the processing of web forms.

The root cause lies deeper than the PDF parser itself. The vulnerability resides within the tika-core module, which provides the foundational XML parsing capabilities for the toolkit. When Tika encounters a PDF with embedded XFA data, it extracts the XML content for analysis. However, prior to version 3.2.2, the core XML parsers did not sufficiently restrict the resolution of external entities during this extraction process.

This oversight allows the parser to “trust” the XML input provided inside the PDF, attempting to resolve references to external files or URLs defined by an attacker, ultimately leading to unauthorized file access or Server-Side Request Forgery (SSRF).
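
To make the mechanism concrete, the following is a minimal sketch using only the JDK's built-in XML parser, not Tika itself. The class and method names (XxeMechanismDemo, payloadFor, textOf, restrictedParserRejects) are hypothetical and chosen for illustration. The sketch builds a small XML document whose external entity points at a throwaway temporary file, then contrasts a default-configured parser with one that refuses DOCTYPE declarations. Whether the default parser actually expands the entity depends on the JDK's JAXP defaults, so the sketch prints the result rather than assuming it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Illustrative only: these names are hypothetical and are not Tika APIs. */
final class XxeMechanismDemo {

    private static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    /** Builds a small XML document whose external entity points at the given file. */
    static String payloadFor(Path target) {
        return "<?xml version=\"1.0\"?>\n"
             + "<!DOCTYPE form [ <!ENTITY leak SYSTEM \"" + target.toUri() + "\"> ]>\n"
             + "<form><field>&leak;</field></form>";
    }

    /** Parses the XML with the given factory and returns the root element's text. */
    static String textOf(DocumentBuilderFactory factory, String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    /** Returns true if a parser that forbids DOCTYPE declarations rejects the XML. */
    static boolean restrictedParserRejects(String xml) {
        try {
            DocumentBuilderFactory restricted = DocumentBuilderFactory.newInstance();
            restricted.setFeature(DISALLOW_DOCTYPE, true);
            textOf(restricted, xml);
            return false;
        } catch (SAXException rejected) {
            return true; // the parser stopped at the DOCTYPE declaration
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // A harmless stand-in for a sensitive file such as /etc/passwd.
        Path secret = Files.createTempFile("xxe-demo", ".txt");
        Files.writeString(secret, "TOP-SECRET");
        try {
            String xml = payloadFor(secret);
            try {
                String text = textOf(DocumentBuilderFactory.newInstance(), xml);
                System.out.println("default parser returned: " + text);
            } catch (Exception e) {
                System.out.println("default parser refused: " + e.getMessage());
            }
            System.out.println("restricted parser rejected payload: "
                    + restrictedParserRejects(xml));
        } finally {
            Files.deleteIfExists(secret);
        }
    }
}
```

If the first line of output contains the temporary file's content, the parser followed the attacker-controlled reference, which is the behavior an XXE attack relies on.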


The Exploitation Process

Exploiting CVE-2025-66516 requires an attacker to pass a malicious file to an application using a vulnerable version of Apache Tika. The typical attack chain is as follows:

  1. Payload Creation: An attacker creates a standard PDF document but injects a malicious XFA form into it.
  2. XXE Injection: Inside the XFA XML data, the attacker defines a DOCTYPE with an external entity. This entity points to a sensitive file on the target server (e.g., /etc/passwd or C:\Windows\win.ini) or an internal network endpoint.
  3. Delivery: The attacker uploads this PDF to a web application that uses Apache Tika for content extraction (e.g., a resume upload portal, a search engine indexing service, or a document management system).
  4. Execution: Tika parses the PDF to extract text and metadata. When it hits the XFA section, the vulnerable tika-core XML parser processes the malicious entity.
  5. Data Exfiltration: The content of the target file is read by the parser and included in the text output of Tika, which is then returned to the application (and potentially the attacker) or logged, completing the data theft.

Affected Products and Versions

The scope of this vulnerability is significantly broader than initially reported in the related CVE-2025-54988. It affects Tika Core, the PDF Parser module, and the legacy Parsers bundle. Crucially, users running Tika 1.x are also affected.

The following table details the specific components and versions that require immediate attention:

Component           | Maven Artifact                         | Affected Versions     | Fixed Version
Apache Tika Core    | org.apache.tika:tika-core              | 1.13 through 3.2.1    | 3.2.2
PDF Parser Module   | org.apache.tika:tika-parser-pdf-module | 2.0.0 through 3.2.1   | 3.2.2
Apache Tika Parsers | org.apache.tika:tika-parsers           | 1.13 before 2.0.0     | 2.0.0
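
As a rough aid for auditing dependency lists, the sketch below encodes the ranges from the table above in a small Java helper. The class TikaAffectedVersions and its methods are hypothetical. It compares only the numeric parts of a version and ignores qualifiers such as -BETA, which is a simplification and an assumption of this sketch, so pre-release versions near a range boundary should be checked by hand.

```java
import java.util.Arrays;

/** Hypothetical helper; the ranges are copied from the affected-versions table. */
final class TikaAffectedVersions {

    /** Parses "3.2.1" or "2.0.0-BETA" into numeric parts, ignoring any qualifier. */
    static int[] parse(String version) {
        String numeric = version.split("-", 2)[0];
        return Arrays.stream(numeric.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** Compares two versions numerically, treating missing parts as zero. */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Returns true if the artifact and version fall inside a listed affected range. */
    static boolean isAffected(String artifactId, String version) {
        switch (artifactId) {
            case "tika-core":
                return compare(version, "1.13") >= 0 && compare(version, "3.2.1") <= 0;
            case "tika-parser-pdf-module":
                return compare(version, "2.0.0") >= 0 && compare(version, "3.2.1") <= 0;
            case "tika-parsers":
                return compare(version, "1.13") >= 0 && compare(version, "2.0.0") < 0;
            default:
                return false; // artifact not listed in the table
        }
    }

    public static void main(String[] args) {
        System.out.println(isAffected("tika-core", "3.2.1"));
        System.out.println(isAffected("tika-core", "3.2.2"));
    }
}
```

A check like this flags tika-core on its own, which matters because an application can carry an updated PDF parser module alongside an older tika-core.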

Techniques and Tactics

This vulnerability maps to the following tactics and techniques in the MITRE ATT&CK framework, highlighting the attack vector and potential impact:

Tactic          | Technique ID | Technique Name
Initial Access  | T1190        | Exploit Public-Facing Application
Collection      | T1005        | Data from Local System
Defense Evasion | T1027        | Obfuscated Files or Information (Embedding in PDF)

Mitigation & Remediation

This critical vulnerability requires immediate action. CVE-2025-66516 covers the same underlying flaw as the earlier CVE-2025-54988 (CVSS 8.4) but expands the set of affected packages. Even if you patched based on that previous advisory, you may still be vulnerable if you did not update tika-core.

The Apache Tika project maintainers strongly urge users to take the following steps:

  • Upgrade Tika Core: This is the most critical step. Ensure you are running version 3.2.2 or later.
  • Upgrade Modules: Update the PDF parser module or the main parsers bundle depending on your Tika version.
  • Configuration Review: Review your application’s XML processing configurations. Ensure that external entity processing is explicitly disabled in any custom parser implementations.
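
For the configuration review, a custom parser built on the JDK's SAX API could be hardened along the following lines. This is a sketch under stated assumptions, not the fix applied inside Tika: the class HardenedXml and its methods are hypothetical, and the feature URIs are the standard SAX and Xerces feature names understood by the JDK's bundled parser. Other parser implementations may not recognize every feature, in which case the helper fails loudly rather than continuing with a partially configured factory.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper for custom parsers; not part of Apache Tika. */
final class HardenedXml {

    /** Creates a SAX factory that refuses DOCTYPEs and external entities. */
    static SAXParserFactory newHardenedSaxFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Reject any document that carries a DOCTYPE declaration.
            factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
            // Defense in depth in case DOCTYPEs must be re-enabled later.
            factory.setFeature(
                    "http://xml.org/sax/features/external-general-entities", false);
            factory.setFeature(
                    "http://xml.org/sax/features/external-parameter-entities", false);
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        } catch (ParserConfigurationException | SAXException e) {
            // Fail closed: do not hand out a factory that is only partly hardened.
            throw new IllegalStateException("XML parser cannot be hardened", e);
        }
        return factory;
    }

    /** Returns true if the hardened parser accepts the XML, false if it rejects it. */
    static boolean acceptedByHardenedParser(String xml) {
        try {
            SAXParser parser = newHardenedSaxFactory().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException rejected) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String benign = "<form><field>ok</field></form>";
        String hostile = "<!DOCTYPE form [ <!ENTITY leak SYSTEM \"file:///etc/passwd\"> ]>"
                + "<form><field>&leak;</field></form>";
        System.out.println("benign accepted:  " + acceptedByHardenedParser(benign));
        System.out.println("hostile accepted: " + acceptedByHardenedParser(hostile));
    }
}
```

Rejecting DOCTYPE declarations outright is the strictest option. If an application genuinely needs DTDs, the remaining features still keep external entities and external DTDs from being fetched. Hardening custom parsers complements the Tika upgrade and does not replace it.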
