A maximum-severity vulnerability has been identified in Apache Tika, a widely used open-source content analysis toolkit. Tracked as CVE-2025-66516, it carries a CVSS score of 10.0, the highest possible rating. The flaw allows XML External Entity (XXE) injection attacks, potentially leading to the exposure of sensitive internal resources and, in some instances, remote code execution.
Apache Tika is the backbone of many document ingestion pipelines, used in systems like Apache Solr, Elasticsearch, and various content analysis platforms. Consequently, this vulnerability poses a significant risk to a vast number of enterprise organizations relying on these technologies for search and data indexing.
Root Cause Analysis
While XXE is a known class of vulnerability, this specific instance (CVE-2025-66516) stems from how Apache Tika processes complex structures within PDF files, specifically the XML Forms Architecture (XFA). XFA is a family of proprietary XML specifications that was proposed and maintained by JetForm to enhance the processing of web forms.
The root cause lies deeper than the PDF parser itself. The vulnerability resides within the tika-core module, which provides the foundational XML parsing capabilities for the toolkit. When Tika encounters a PDF with embedded XFA data, it extracts the XML content for analysis. However, prior to version 3.2.2, the core XML parsers did not sufficiently restrict the resolution of external entities during this extraction process.
This oversight allows the parser to “trust” the XML input provided inside the PDF, attempting to resolve references to external files or URLs defined by an attacker, ultimately leading to unauthorized file access or Server-Side Request Forgery (SSRF).
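To make the idea of "restricting external entity resolution" concrete, the following is a minimal sketch using only the JDK's built-in JAXP APIs. It is not Tika's actual fix; the class name `XxeRejectionDemo`, its helper methods, and the placeholder entity path are hypothetical and exist only for illustration. The sketch assumes the JDK's default (Xerces-based) parser, which recognizes the `disallow-doctype-decl` feature.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: shows a DOM parser configured to refuse any DOCTYPE.
final class XxeRejectionDemo {

    // XXE-shaped input: the DOCTYPE declares an external entity and the body references it.
    // The file path is a placeholder and is never read by the hardened parser.
    static final String XXE_SHAPED_XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE data [ <!ENTITY ext SYSTEM \"file:///nonexistent/demo.txt\"> ]>\n"
        + "<data>&ext;</data>";

    // Builds a DOM parser that rejects documents containing a DOCTYPE declaration.
    static DocumentBuilder hardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Assumption: the JDK's default parser implementation supports this feature.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true when the hardened parser refuses the input, false when it parses cleanly.
    static boolean isRejected(String xml) {
        try {
            hardenedBuilder().parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true; // the parser raised a fatal error, e.g. on the DOCTYPE
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this configuration, `XxeRejectionDemo.isRejected(XxeRejectionDemo.XXE_SHAPED_XML)` reports a rejection because the parser stops at the DOCTYPE before any entity is resolved, while plain XML without a DOCTYPE still parses. Rejecting DOCTYPE outright is the strictest option; it suits inputs such as embedded form data where a DTD is never legitimately needed.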
The Exploitation Process
Exploiting CVE-2025-66516 requires an attacker to pass a malicious file to an application using a vulnerable version of Apache Tika. The typical attack chain is as follows:
- Payload Creation: An attacker creates a standard PDF document but injects a malicious XFA form into it.
- XXE Injection: Inside the XFA XML data, the attacker defines a `DOCTYPE` with an external entity. This entity points to a sensitive file on the target server (e.g., `/etc/passwd` or `C:\Windows\win.ini`) or an internal network endpoint.
- Delivery: The attacker uploads this PDF to a web application that uses Apache Tika for content extraction (e.g., a resume upload portal, a search engine indexing service, or a document management system).
- Execution: Tika parses the PDF to extract text and metadata. When it hits the XFA section, the vulnerable `tika-core` XML parser processes the malicious entity.
- Data Exfiltration: The content of the target file is read by the parser and included in the text output of Tika, which is then returned to the application (and potentially the attacker) or logged, completing the data theft.
Affected Products and Versions
The scope of this vulnerability is significantly broader than initially reported in related CVEs (such as CVE-2025-54988). It affects Tika Core, the PDF Parser module, and the legacy Tika Parsers bundle. Crucially, users running Tika 1.x are also affected.
The following table details the specific components and versions that require immediate attention:
| Component | Maven Artifact | Affected Versions | Fixed Version |
|---|---|---|---|
| Apache Tika Core | `org.apache.tika:tika-core` | 1.13 through 3.2.1 | 3.2.2 |
| PDF Parser Module | `org.apache.tika:tika-parser-pdf-module` | 2.0.0 through 3.2.1 | 3.2.2 |
| Apache Tika Parsers | `org.apache.tika:tika-parsers` | 1.13 before 2.0.0 | 2.0.0 |
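The ranges in the table can be turned into a quick triage check. The sketch below is illustrative only: the class `TikaAffectedVersions` and its methods are hypothetical names, the ranges are copied from the table above, and the version parsing is deliberately simplified (it compares up to three numeric components and ignores qualifiers such as `-SNAPSHOT` or `-BETA`).

```java
import java.util.Arrays;

// Hypothetical helper: checks a Tika artifact version against the affected ranges in the table.
final class TikaAffectedVersions {

    // Parses "major.minor.patch" into three ints; missing parts default to 0.
    // Simplification: anything after the leading digits of a part (e.g. "-SNAPSHOT") is ignored.
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < out.length && i < parts.length; i++) {
            String digits = parts[i].replaceAll("[^0-9].*$", "");
            out[i] = digits.isEmpty() ? 0 : Integer.parseInt(digits);
        }
        return out;
    }

    // True when min <= version <= max (or version < max when maxInclusive is false).
    static boolean inRange(String version, String minInclusive, String max, boolean maxInclusive) {
        int[] v = parse(version);
        int upper = Arrays.compare(v, parse(max));
        return Arrays.compare(v, parse(minInclusive)) >= 0
            && (maxInclusive ? upper <= 0 : upper < 0);
    }

    // Ranges taken from the table: artifactId is the part after "org.apache.tika:".
    static boolean isAffected(String artifactId, String version) {
        switch (artifactId) {
            case "tika-core":
                return inRange(version, "1.13", "3.2.1", true);
            case "tika-parser-pdf-module":
                return inRange(version, "2.0.0", "3.2.1", true);
            case "tika-parsers":
                return inRange(version, "1.13", "2.0.0", false);
            default:
                return false; // artifact not listed in the table
        }
    }
}
```

For example, `TikaAffectedVersions.isAffected("tika-core", "3.2.1")` returns `true`, while the same call with `"3.2.2"` returns `false`. A check like this only covers direct coordinates; transitive dependencies still need to be inspected with your build tool's dependency report.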
Techniques and Tactics
This vulnerability maps to the following tactics and techniques in the MITRE ATT&CK framework, highlighting the attack vector and potential impact:
| Tactic | Technique ID | Technique Name |
|---|---|---|
| Initial Access | T1190 | Exploit Public-Facing Application |
| Collection | T1005 | Data from Local System |
| Defense Evasion | T1027 | Obfuscated Files or Information (Embedding in PDF) |
Mitigation & Remediation
To address this critical vulnerability, immediate action is required. CVE-2025-66516 supersedes and broadens the scope of the earlier CVE-2025-54988 (CVSS 8.4). Even if you patched based on that previous advisory, you may still be vulnerable if you did not update tika-core.
The Apache Tika project maintainers strongly urge users to take the following steps:
- Upgrade Tika Core: This is the most critical step. Ensure you are running version 3.2.2 or later.
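- Upgrade Modules: Update the module that matches your Tika line, using the fixed versions in the table above: the PDF parser module (tika-parser-pdf-module) on Tika 2.x and 3.x, or the main parsers bundle (tika-parsers) on Tika 1.x.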
- Configuration Review: Review your application’s XML processing configurations. Ensure that external entity processing is explicitly disabled in any custom parser implementations.
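For the configuration review, the sketch below shows one way external entity processing might be disabled in custom SAX and StAX parsing code. It is an illustrative example built on standard JAXP calls, not Tika's own configuration; the class `HardenedXmlFactories`, its helpers, and the `SECRET-MARKER` test string are hypothetical. It assumes the JDK's built-in parser implementations, which recognize the feature URIs used here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: SAX and StAX factories with external entity processing turned off.
final class HardenedXmlFactories {

    static SAXParserFactory hardenedSaxFactory() throws ParserConfigurationException, SAXException {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        f.setFeature("http://xml.org/sax/features/external-general-entities", false);
        f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        // Assumption: Xerces-specific feature, recognized by the JDK's default parser.
        f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        f.setXIncludeAware(false);
        return f;
    }

    static XMLInputFactory hardenedStaxFactory() {
        XMLInputFactory f = XMLInputFactory.newInstance();
        f.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        f.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        return f;
    }

    // Returns the character data seen by the hardened SAX parser, or null if it rejects the input.
    static String saxText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            hardenedSaxFactory().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length);
                    }
                });
            return text.toString();
        } catch (SAXException e) {
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the character data seen by the hardened StAX reader, or null if it rejects the input.
    static String staxText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader r = hardenedStaxFactory().createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(r.getText());
                }
            }
            return text.toString();
        } catch (XMLStreamException e) {
            return null;
        }
    }

    // Self-check: points an external entity at a temp file and reports whether its content leaks.
    static boolean demoLeaks() {
        try {
            Path secret = Files.createTempFile("xxe-demo", ".txt");
            Files.writeString(secret, "SECRET-MARKER");
            String xml = "<?xml version=\"1.0\"?><!DOCTYPE d [<!ENTITY ext SYSTEM \""
                + secret.toUri() + "\">]><d>&ext;</d>";
            String sax = saxText(xml);
            String stax = staxText(xml);
            Files.deleteIfExists(secret);
            return (sax != null && sax.contains("SECRET-MARKER"))
                || (stax != null && stax.contains("SECRET-MARKER"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A self-check such as `demoLeaks()` can be dropped into a unit test so that a future refactor or a swapped-in parser implementation cannot silently re-enable entity resolution. Whether the hardened parser skips the entity or rejects the document outright depends on the implementation; the property that matters is that the referenced file's content never reaches the output.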
