A high-severity vulnerability in Apache Parquet’s Java implementation has security teams scrambling to patch affected systems before attackers can exploit the flaw.
The vulnerability, tracked as CVE-2025-46762, could allow attackers to execute malicious code remotely on systems processing Parquet files.
Apache Parquet is a columnar storage format commonly used in the Hadoop ecosystem. It is designed to efficiently store and process large datasets and is widely adopted across industries for big data analytics workloads.
The security flaw, which affects all Apache Parquet Java versions up to and including 1.15.1, was disclosed last Friday by Apache Parquet contributor Gang Wu.
The vulnerability specifically impacts the parquet-avro module, which is widely used in big data environments to handle data serialization and deserialization.
The issue stems from insecure schema parsing when processing Avro schemas embedded in Parquet file metadata.
Attackers can embed malicious code within these schemas that automatically executes when a vulnerable system attempts to deserialize the data.
Organizations using Apache Parquet Java in conjunction with big data frameworks such as Apache Spark, Hadoop, or Flink are potentially at risk. The vulnerability specifically affects applications that:
• Use the parquet-avro module for data processing
• Employ either the “specific” or “reflect” data models (rather than the “generic” model)
• Process Parquet files from untrusted sources (a representative read path of this kind is sketched below)
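For context, an affected read path of the kind described above might look like the following minimal Java sketch. The record class, file path, and class names here are hypothetical illustrations and not taken from the advisory; the relevant points are the use of the parquet-avro reader and an explicit “reflect” data model on a file received from outside the organization.

    import org.apache.avro.reflect.ReflectData;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetReader;
    import org.apache.parquet.hadoop.ParquetReader;
    import org.apache.parquet.hadoop.util.HadoopInputFile;

    public class UntrustedParquetRead {
        // Hypothetical POJO that the "reflect" data model maps rows onto.
        public static class SensorReading {
            public String deviceId;
            public double value;
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // A Parquet file received from an external party -- the risky case described above.
            Path untrustedFile = new Path(args[0]);

            try (ParquetReader<SensorReading> reader =
                    AvroParquetReader.<SensorReading>builder(HadoopInputFile.fromPath(untrustedFile, conf))
                            .withDataModel(ReflectData.get()) // "reflect" data model; the "specific" model is similarly affected
                            .build()) {
                SensorReading record;
                while ((record = reader.read()) != null) {
                    System.out.println(record.deviceId + " = " + record.value);
                }
            }
        }
    }

Code that reads only with the “generic” Avro data model is, per the advisory, outside the affected configuration.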
While Apache Parquet 1.15.1 introduced a mitigation that restricts schema deserialization to a list of trusted packages, security researchers found that the default “trusted packages” configuration still permitted code execution from pre-approved Java packages such as java.util.
The Apache Software Foundation has released version 1.15.2, which fully addresses the vulnerability by significantly tightening package trust boundaries.
Organizations are strongly advised to:
- Upgrade immediately to Apache Parquet Java 1.15.2
- For systems that cannot be immediately upgraded to 1.15.2, set the JVM system property -Dorg.apache.parquet.avro.SERIALIZABLE_PACKAGES= (with an empty string value) as a temporary mitigation; a programmatic equivalent is sketched after this list
- Audit data pipelines to identify and prioritize the use of the “generic” Avro model where possible, as it is not vulnerable to this exploit
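For teams that prefer to apply the interim mitigation in code rather than on the command line, the property can also be set programmatically at process start-up. This is a minimal sketch, assuming a hypothetical entry point and assuming it runs before any parquet-avro classes read the property; passing the -D flag at JVM launch, as listed above, is the equivalent and more conventional approach.

    public final class ParquetCve202546762Mitigation {
        public static void main(String[] args) {
            // Equivalent to launching the JVM with:
            //   java -Dorg.apache.parquet.avro.SERIALIZABLE_PACKAGES= -jar pipeline.jar
            // An empty value leaves no packages trusted for class resolution in parquet-avro.
            System.setProperty("org.apache.parquet.avro.SERIALIZABLE_PACKAGES", "");

            // ... hand off to the normal data-pipeline entry point once the property is set ...
        }
    }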
Security experts are particularly concerned about the potential for supply chain attacks, in which corrupted Parquet files are inserted into data workflows to compromise the backend systems that process them.
The Apache team has also released updated documentation highlighting secure configuration practices for Avro schema handling to help organizations better protect themselves against similar vulnerabilities in the future.
With proof-of-concept exploits for similar vulnerabilities typically emerging within days of public disclosure, organizations handling sensitive data are urged to prioritize patching immediately.