Max severity RCE flaw discovered in widely used Apache Parquet

1. jtchang ◴[06 Apr 25 20:52 UTC] No.43604870[source]▶

It's so dumb to assign it a CVSS score of 10.

Unless you are blindly accepting Parquet-formatted files, this really doesn't seem that bad.

A vulnerability in parsing images, xml, json, html, css would be way more detrimental.

I can't think of many services that accept parquet files directly. And of those usually you are calling it directly via a backend service.

replies(3): >>43605359 #>>43605393 #>>43606782 #

2. bigfatkitten ◴[06 Apr 25 22:06 UTC] No.43605359[source]▶

>>43604870 (TP) #

Vendor CVSS scores are always inherently meaningless because they can't take into account the factors specific to the user's environment.

Users need to do their own assessments.

replies(1): >>43606784 #

3. jeroenhd ◴[06 Apr 25 22:12 UTC] No.43605393[source]▶

>>43604870 (TP) #

Unless you're logging user input without proper validation, log4j doesn't really seem that bad.

As a library, this is a huge problem. If you're a user of the library, you'll have to decide if your usage of it is problematic or not.

Either way, the safe solution is to just update the library. Or, based on the link shared elsewhere (https://github.com/apache/parquet-java/compare/apache-parque...) maybe avoid this library if you can, because the Java-specific code paths seem sketchy as hell to me.

replies(2): >>43605484 #>>43608211 #

4. ajross ◴[06 Apr 25 22:24 UTC] No.43605484[source]▶

>>43605393 #

> Unless you're logging user input without proper validation, log4j doesn't really seem that bad.

Most systems do log user input though, and "proper validation" is an infamously squishy phrase that mostly acts as an excuse. The bottom line is that the natural/correct/idiomatic use of Log4j exposed the library directly to user-generated data. The similar use of Apache Parquet (an obscure tool many of us are learning about for the first time) does not. That doesn't make it secure, but it makes the impact inarguably lower.

I mean, come on: the Log4j exploit was a global zero-day!

replies(1): >>43613320 #

5. SpicyLemonZest ◴[07 Apr 25 01:57 UTC] No.43606782[source]▶

>>43604870 (TP) #

The score is meant for consumption by users of the software with the vulnerability. In the kind of systems where Parquet is used, blindly reading files in a context with more privileges than the user who wrote them is very common. (Think less "service accepting a parquet file from an API", more "ETL process that can read the whole company's data scanning files from a dump directory anyone can write to".)

replies(1): >>43610754 #

6. worthless-trash ◴[07 Apr 25 01:57 UTC] No.43606784[source]▶

>>43605359 #

This comment overgeneralises the problem and is inherently absurd. There are key indicators in the scoring that describe the attack itself, and those aren't environment-specific.

I do agree that in most cases the deployment-specific configuration affects whether the flaw can be exploited, and users or developers should analyse their own configuration.
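
To illustrate the environment-independent part of a score, here is a minimal sketch of the CVSS v3.1 base-score formula. The metric weights and rounding rule follow the published v3.1 specification; the vectors in the usage note are illustrative assumptions, not the vector assigned to this Parquet CVE, and `base_score` is a hypothetical helper name.

```python
import math

# Base-metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required is weighted differently when Scope is Changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place using the spec's integer arithmetic."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'AV:N/AC:L/.../A:H'."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"

    # Impact sub-score: combines confidentiality, integrity, availability.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score: describes the attack, not the deployment.
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))
```

For example, `base_score("AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H")` gives 10.0, while the same vector with `S:U` gives 9.8. None of these inputs say anything about who can actually reach the parser in a given deployment, which is the part users have to assess themselves.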

7. seanhunter ◴[07 Apr 25 05:52 UTC] No.43608211[source]▶

>>43605393 #

It’s incredibly common to log things which contain text elements which come from a user request. I’ve worked on systems that do that hundreds of thousands of times per day. I’ve literally never deserialized a parquet file that came from someone else even a single time, and I’ve used parquet since it was first released.

8. seanhunter ◴[07 Apr 25 12:57 UTC] No.43610754[source]▶

>>43606782 #

I get the point you’re making but I’m gonna push back a little on this (as someone who has written a fair few ETL processes in their time). When are you ever ETLing a parquet file? You are always ETLing some raw format (csv, json, raw text, structured text, etc) and writing into parquet files, never reading parquet files themselves. It seems a pretty bad practice to write your ETL to just pick up whatever file in whatever format from a slop bucket you don’t control. I would always pull files in specific formats from such a common staging area, and everything else would go into a random “unstructured data” dump where you just make a copy of it and record the metadata. I mean it’s a bad bug and I’m happy they’re fixing it, but it feels like you have to go out of your way to encounter it in practice.
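
A minimal sketch of that triage pattern, assuming a flat staging directory: the allowlist, the directory layout, the `.meta.json` sidecar name and the `triage` function are all hypothetical, chosen only for illustration. Note that matching on the file suffix says nothing about the actual contents, so this only decides which parser ever gets to see a file.

```python
import hashlib
import json
import shutil
import time
from pathlib import Path

# Hypothetical allowlist of raw formats the ETL job is willing to parse.
ALLOWED_SUFFIXES = {".csv", ".json", ".txt"}


def triage(staging: Path, accepted: Path, unstructured: Path) -> dict:
    """Copy allowlisted files to `accepted`; copy everything else to
    `unstructured` unparsed, recording basic metadata next to it."""
    accepted.mkdir(parents=True, exist_ok=True)
    unstructured.mkdir(parents=True, exist_ok=True)
    result = {"accepted": [], "quarantined": []}

    for path in sorted(staging.iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() in ALLOWED_SUFFIXES:
            shutil.copy2(path, accepted / path.name)
            result["accepted"].append(path.name)
        else:
            # Never deserialize these: just keep a copy and describe it.
            shutil.copy2(path, unstructured / path.name)
            meta = {
                "name": path.name,
                "size": path.stat().st_size,
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                "seen_at": time.time(),
            }
            sidecar = unstructured / (path.name + ".meta.json")
            sidecar.write_text(json.dumps(meta))
            result["quarantined"].append(path.name)
    return result
```

With a layout like this, a `.parquet` file dropped into the staging area ends up copied and catalogued but never opened by a Parquet reader.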

9. jeroenhd ◴[07 Apr 25 16:39 UTC] No.43613320{3}[source]▶

>>43605484 #

> Most systems do log user input though, and "proper validation" is an infamously squishy phrase that mostly acts as an excuse

That's my point: if you start adding constraints to a vulnerability to reduce its scope, high CVSS scores don't exist.

Any vulnerability that can be characterised as "pass contents through parser, full RCE" is a 10/10 vulnerability for me. I'd rather find out my application isn't vulnerable after my vulnerability scanner reports a critical issue than let it lurk with all the other 3/10 vulnerabilities about potential NULL pointers or complexity attacks in specific method calls.

replies(1): >>43613528 #

10. ajross ◴[07 Apr 25 16:58 UTC] No.43613528{4}[source]▶

>>43613320 #

> Any vulnerability that can be characterised as "pass contents through parser, full RCE" is a 10/10 vulnerability for me

And I think that's just wildly wrong, sorry. I view something exploited in the wild to compromise real systems as higher impact than something that isn't, and want to see a "score" value that reflects that (IMHO, critical) distinction. Agree to disagree, as it were.