When did vulnerability reports get so vague? This looks like a classic serialization bug:
https://github.com/apache/parquet-java/compare/apache-parque...
replies(3):
If by “classic” you mean “using a language-dependent deserialization mechanism that is wildly unsafe”, I suppose. The surprising part is that Parquet is a fairly modern format with a real schema that is nominally language-independent. How on Earth did Java class names end up in the file format? Why is the parser willing to parse them at all? At most, and at least by default, the parser should treat them as predefined strings whose semantics are completely independent of any actual Java class.
(Yes, this doesn't make sense; the official Parquet Java library had some of the worst code design I've had the misfortune to depend on.)
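To make the "predefined strings" idea concrete, here is a minimal sketch of what I mean. It is not parquet-java's API, and `SafeTypeNames` and `decode` are hypothetical names I made up for illustration. The assumption is that the file carries a type name plus a raw string value: the name is only ever compared against a fixed list of known strings, and nothing is loaded or instantiated by reflection.

```java
import java.util.Optional;

// Hypothetical helper, not part of any real Parquet library.
final class SafeTypeNames {
    private SafeTypeNames() {}

    // Decode a raw value using a type name read from an untrusted file.
    // The name is matched against a fixed set of string constants; it is
    // never passed to Class.forName or any other reflective lookup.
    static Optional<Object> decode(String typeName, String raw) {
        if (typeName == null || raw == null) {
            return Optional.empty();
        }
        switch (typeName) {
            case "java.lang.String":
                return Optional.of(raw);
            case "java.lang.Integer":
                // May throw NumberFormatException on malformed input.
                return Optional.of(Integer.valueOf(raw));
            case "java.lang.Long":
                return Optional.of(Long.valueOf(raw));
            default:
                // Unknown names are rejected rather than resolved.
                return Optional.empty();
        }
    }
}
```

With this shape, `SafeTypeNames.decode("java.lang.Integer", "42")` yields an `Optional` holding the integer 42, while any name outside the list comes back empty. The strings happen to look like Java class names, but a reader in another language could implement the same table without knowing that.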