> Any application or service using Apache Parquet Java library versions 1.15.0 or earlier is believed to be vulnerable (our own data indicates that this was introduced in version 1.8.0; however, current guidance is to review all historical versions). This includes systems that read or import Parquet files using popular big-data frameworks (e.g. Hadoop, Spark, Flink) or custom applications that incorporate the Parquet Java code. If you are unsure whether your software stack uses Parquet, check with your vendors or developers – many data analytics and storage solutions include this library.
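For anyone trying to work out whether they sit on the affected path: the report points at the Java read path, specifically the parquet-avro module, i.e. the ordinary "open a file and read records" flow. A minimal sketch of that flow, using the standard parquet-avro API (the file path is a placeholder for whatever untrusted input a service ingests):

    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetReader;
    import org.apache.parquet.hadoop.ParquetReader;
    import org.apache.parquet.hadoop.util.HadoopInputFile;

    // Ordinary parquet-avro read path: opening the file parses the Avro
    // schema stored in the footer, which is where the reported issue lives.
    public class ReadUntrustedParquet {
        public static void main(String[] args) throws Exception {
            Path path = new Path("/tmp/untrusted.parquet");  // placeholder
            Configuration conf = new Configuration();
            try (ParquetReader<GenericRecord> reader =
                     AvroParquetReader.<GenericRecord>builder(
                         HadoopInputFile.fromPath(path, conf)).build()) {
                GenericRecord record;
                while ((record = reader.read()) != null) {
                    System.out.println(record);
                }
            }
        }
    }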
Seems safe to assume yes: pandas is probably affected, since it uses this library.
The "fix" in question also screams "delete this crap immediately": https://github.com/wgtmac/parquet-mr/commit/d185f867c1eb968a...
> Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
Those should be unaffected.
https://github.com/apache/parquet-java/compare/apache-parque...
Python docs > library > pickle: https://docs.python.org/3/library/pickle.html
Re: a hypothetical pickle parser protocol that doesn't eval code at parse time ("skipcode pickle protocol 6"): "AI Supply Chain Attack: How Malicious Pickle Files Backdoor Models" .. "Insecurity and Python Pickles": https://news.ycombinator.com/item?id=43426963
If by “classic” you mean “using a language-dependent deserialization mechanism that is wildly unsafe”, I suppose. The surprising part is that Parquet is a fairly modern format with a real schema that is nominally language-independent. How on Earth did Java class names end up in the file format? Why is the parser willing to parse them at all? At most (at least by default), the parser should treat them as predefined strings that have semantics completely independent of any actual Java class.
On a second read, I realized a format problem was unlikely, but the headline just said "Apache Parquet". My mind might reach the same conclusion if it said "safetensors" or "PNG".
But if avro-in-parquet is a weird optional feature, it should be off by default! Parquet’s metadata is primarily in Thrift, not Avro, and it seems to me that no Avro should be involved in decoding Parquet files unless explicitly requested.
(Yes, this doesn't make sense; the official Parquet Java library had some of the worst code design I've had the misfortune to depend on.)
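To make the avro-in-parquet complaint concrete: Avro schemas can carry arbitrary extra properties, and Avro's Java reflect support uses a "java-class" property to record the original Java type of a field. My (hedged) reading of the reports is that the vulnerable converter code looked that property up in the schema stored in the Parquet footer and loaded the named class reflectively. A tiny illustration of how a class name rides along inside otherwise language-neutral schema metadata (the record and class names below are made up):

    import org.apache.avro.Schema;

    // Shows only how a Java class name can be embedded in an Avro schema as
    // a "java-class" property; it does not exploit anything itself.
    public class JavaClassPropertyDemo {
        public static void main(String[] args) {
            String schemaJson =
                "{\"type\":\"record\",\"name\":\"Example\",\"fields\":[" +
                "{\"name\":\"payload\",\"type\":" +
                "{\"type\":\"string\",\"java-class\":\"com.example.SomeArbitraryClass\"}}" +
                "]}";
            Schema schema = new Schema.Parser().parse(schemaJson);
            // The part a file author controls: an arbitrary class name inside
            // what looks like pure schema metadata.
            System.out.println(schema.getField("payload").schema().getProp("java-class"));
        }
    }

Which is exactly the objection above: a nominally language-independent schema has a slot whose whole purpose is to name a concrete Java class.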
The bug threads are still private, almost two weeks after it was disclosed and fixed. Very strange.
https://bugzilla.mozilla.org/show_bug.cgi?id=1956398
If a browser had a vulnerability parsing HTML, that would of course be a major concern, because browsers very often parse HTML from untrusted parties.
I think it’s revealing and unfortunate that everyone serious about Parquet, from DuckDB to Databricks, has written their own “codec”.
Some recent frustrations on this front from the DuckDB folks:
But the fix itself is public in both the Chrome [https://chromium.googlesource.com/chromium/src.git/+/36dbbf3...] and Firefox [https://github.com/mozilla/gecko-dev/commit/ac605820636c3b96...] source repos, and it makes pretty clear what the bug is.
Unless you are blindly accepting Parquet-formatted files, this really doesn't seem that bad.
A vulnerability in parsing images, XML, JSON, HTML, or CSS would be way more detrimental.
I can't think of many services that accept Parquet files directly, and those that do are usually called through a backend service.
Users need to do their own assessments.
At the library level, this is a huge problem. If you're a user of the library, you'll have to decide whether your usage of it is problematic or not.
Either way, the safe solution is to just update the library. Or, based on the link shared elsewhere (https://github.com/apache/parquet-java/compare/apache-parque...) maybe avoid this library if you can, because the Java-specific code paths seem sketchy as hell to me.
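If you want to confirm which parquet-avro build actually ends up on your classpath before deciding how worried to be, a rough sketch (this relies on the jar manifest carrying an Implementation-Version, which Apache releases generally do; the advisory names 1.15.1 as the first fixed release):

    import org.apache.parquet.avro.AvroParquetReader;

    // Prints the parquet-avro version found on the classpath so you can
    // compare it against the fixed release before feeding it untrusted files.
    public class ParquetVersionCheck {
        public static void main(String[] args) {
            Package pkg = AvroParquetReader.class.getPackage();
            String version = (pkg != null) ? pkg.getImplementationVersion() : null;
            System.out.println("parquet-avro on classpath: " + version);
        }
    }

Newer releases also reportedly expose a trusted-packages allowlist (a system property along the lines of org.apache.parquet.avro.SERIALIZABLE_PACKAGES) that limits which classes the Avro path may load reflectively; treat that name as something to verify against your version's release notes rather than as gospel from a comment thread.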
Most systems do log user input though, and "proper validation" is an infamously squishy phrase that mostly acts as an excuse. The bottom line is that the natural/correct/idiomatic use of Log4j exposed the library directly to user-generated data. The similar use of Apache parquet (an obscure tool many of us are learning about for the first time) does not. That doesn't make it secure, but it makes the impact inarguably lower.
I mean, come on: the Log4j exploit was a global zero-day!
It is different from the CVSS rating.
I do agree that in most cases the deployment-specific configuration determines whether it can actually be exploited, and users or developers should analyse their own configuration.
That's my point: if you start adding constraints to a vulnerability to reduce its scope, high CVE scores don't exist.
Any vulnerability that can be characterised as "pass contents through parser, full RCE" is a 10/10 vulnerability for me. I'd rather find out my application isn't vulnerable after my vulnerability scanner reports a critical issue than let it lurk with all the other 3/10 vulnerabilities about potential NULL pointers or complexity attacks in specific method calls.
And I think that's just wildly wrong sorry. I view something exploited in the wild to compromise real systems as a higher impact than something that isn't, and want to see a "score" value that reflects that (IMHO, critical) distinction. Agree to disagree, as it were.
I don't want to make overly harsh remarks about the project, as it may simply not have been the right tool for my use case, though it sure gave me a lot of issues.
Writeup about some of the ideas that went into it:
Not in the file format.