https://github.com/apache/parquet-java/compare/apache-parque...
If by “classic” you mean “using a language-dependent deserialization mechanism that is wildly unsafe”, I suppose. The surprising part is that Parquet is a fairly modern format with a real schema that is nominally language-independent. How on Earth did Java class names end up in the file format? And why is the parser willing to act on them at all? At most, and certainly by default, the parser should treat them as predefined strings whose semantics are completely independent of any actual Java class.
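For anyone who hasn't seen the pattern, here is a minimal sketch of what “load a Java class named in file metadata” looks like in the unsafe style. The names are hypothetical, not the actual parquet-java code (the real path goes through parquet-avro's schema handling, via something like Avro's “java-class” property), but the shape of the hazard is the same:

    import java.util.Map;

    class UnsafeMetadataDecode {
        // fileMetadata comes straight out of the file, so whoever wrote the
        // file chooses the value of "java-class".
        static Object decode(Map<String, String> fileMetadata) throws Exception {
            String className = fileMetadata.get("java-class");
            // Class.forName runs static initializers, and newInstance runs a
            // constructor: both are code picked by the file's author.
            Class<?> cls = Class.forName(className);
            return cls.getDeclaredConstructor().newInstance();
        }
    }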
But if avro-in-parquet is a weird optional feature, it should be off by default! Parquet’s metadata is primarily in Thrift, not Avro, and it seems to me that no Avro should be involved in decoding Parquet files unless explicitly requested.
(Yes, this doesn't make sense; the official Parquet Java library had some of the worst code design I've had the misfortune to depend on.)
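The “predefined strings” approach suggested above is easy enough to sketch: the metadata value is a tag looked up in a fixed table, never something handed to the classloader. (A hypothetical sketch, not the actual fix; as I understand it, the patched parquet-java went with a configurable trusted-package check instead of a strict allowlist.)

    import java.util.Map;
    import java.util.Set;

    class SaferMetadataDecode {
        // Fixed table of names we assign semantics to; anything else is ignored.
        private static final Set<String> KNOWN_STRING_TYPES = Set.of(
            "java.lang.String",
            "java.math.BigDecimal"
        );

        static String stringType(Map<String, String> fileMetadata) {
            String name = fileMetadata.get("java-class");
            if (name == null || !KNOWN_STRING_TYPES.contains(name)) {
                return "java.lang.String"; // safe default for unknown values
            }
            // The name is only compared against the table above; it is never
            // handed to Class.forName, so a hostile value is inert.
            return name;
        }
    }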
The bug threads are still private, almost two weeks after the bug was disclosed and fixed. Very strange.
https://bugzilla.mozilla.org/show_bug.cgi?id=1956398
I think it’s revealing and unfortunate that everyone serious about Parquet, from DuckDB to Databricks, has written their own “codec”.
Some recent frustrations on this front from the DuckDB folks:
But the fix itself is public in both the Chrome [https://chromium.googlesource.com/chromium/src.git/+/36dbbf3...] and Firefox [https://github.com/mozilla/gecko-dev/commit/ac605820636c3b96...] source repos, and it makes pretty clear what the bug is.