
174 points andy99 | 13 comments
1. ustad No.43603319
Does anyone know if pandas is affected? I serialize/deserialize DataFrames, which pandas does with Parquet under the hood.
replies(2): >>43603399, >>43603494
2. natebc No.43603399
https://www.endorlabs.com/learn/critical-rce-vulnerability-i...

> Any application or service using Apache Parquet Java library versions 1.15.0 or earlier is believed to be vulnerable (our own data indicates that this was introduced in version 1.8.0; however, current guidance is to review all historical versions). This includes systems that read or import Parquet files using popular big-data frameworks (e.g. Hadoop, Spark, Flink) or custom applications that incorporate the Parquet Java code. If you are unsure whether your software stack uses Parquet, check with your vendors or developers – many data analytics and storage solutions include this library.

Seems safe to assume the answer is yes: pandas is probably affected, since it uses this library.

replies(3): >>43603443, >>43603446, >>43604839
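To make the quoted advisory's version range concrete, here is a minimal sketch of a check you could run against the version string of the Apache Parquet Java library in a JVM build. The helper names `parse_version` and `is_flagged` are hypothetical, and how you obtain the version string from your build is left out. The only threshold comes from the quote: 1.15.0 or earlier is believed vulnerable. There is deliberately no lower bound, because the advisory says to review all historical versions even though Endor's data points to 1.8.0.

```python
import re

# Last version named as affected in the quoted advisory ("1.15.0 or earlier").
LAST_AFFECTED = (1, 15, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Keep the leading dotted-numeric part, e.g. "1.12.3-SNAPSHOT" -> (1, 12, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_flagged(version: str) -> bool:
    """Hypothetical helper: True if the version is at or below LAST_AFFECTED.

    No lower bound is applied, following the advice to review all historical versions.
    """
    return parse_version(version) <= LAST_AFFECTED


for v in ["1.8.0", "1.15.0", "1.15.1", "1.16.0"]:
    print(v, is_flagged(v))
```

This only applies to JVM stacks that bundle the Java library; a pure Python stack never loads it, which is the point the replies below raise.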
3. nindalf No.43603443
The paragraph you pasted in states that only applications importing the Java library are vulnerable.

Isn’t pandas implemented in Python/C? How would it have been importing the Java library?

replies(2): >>43604826, >>43605405
4. 3eb7988a1663 No.43603446
That does not follow for me. Pandas does not utilize Java/JVM.
replies(2): >>43604833, >>43605403
5. minimaxir No.43603494
Pandas doesn't use the parquet python package under the hood: https://pandas.pydata.org/docs/reference/api/pandas.read_par...

> Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.

Those should be unaffected.

replies(1): >>43603695
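The engine selection quoted from the `pandas.read_parquet` docs can be sketched as below. This is not pandas' own source: `resolve_parquet_engine` is a hypothetical helper, and the `io.parquet.engine` option is passed in by hand as `configured_option` instead of being read from pandas. It only shows that the candidates are `pyarrow` and `fastparquet`, which are native Python/C++ implementations rather than wrappers around the Java library.

```python
import importlib.util


def resolve_parquet_engine(engine: str = "auto", configured_option: str = "auto") -> str:
    """Sketch of the engine choice described in the pandas.read_parquet docs."""
    # engine="auto" defers to the io.parquet.engine option (supplied by hand here).
    if engine == "auto":
        engine = configured_option
    if engine != "auto":
        return engine
    # The option is also "auto": try pyarrow first, then fall back to fastparquet.
    for candidate in ("pyarrow", "fastparquet"):
        if importlib.util.find_spec(candidate) is not None:
            return candidate
    raise ImportError("neither pyarrow nor fastparquet is installed")


print(resolve_parquet_engine("fastparquet"))
```

With pandas installed, `pd.get_option("io.parquet.engine")` shows the configured option (`"auto"` by default), and passing `engine="pyarrow"` or `engine="fastparquet"` to `pd.read_parquet` pins the choice explicitly.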
6. westurner No.43603695
Python pickles have the same issue, but per the docs it is a design decision.

Python docs > library > pickle: https://docs.python.org/3/library/pickle.html

Re: a hypothetical pickle parser protocol that doesn't eval code at parse time ("skipcode pickle protocol 6"): "AI Supply Chain Attack: How Malicious Pickle Files Backdoor Models" and "Insecurity and Python Pickles": https://news.ycombinator.com/item?id=43426963

replies(2): >>43604174, >>43605367
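The "design decision" is that unpickling calls whatever callable an object's `__reduce__` names. The sketch below uses a harmless stand-in (`sum`) in place of a malicious payload, then applies the mitigation described in the pickle docs' "Restricting Globals" section: overriding `Unpickler.find_class` so only allow-listed globals resolve. The class names and the allow-list contents are hypothetical, and this is a standard-library mitigation, not the parse-time-safe protocol discussed above.

```python
import io
import pickle


class Payload:
    """Harmless stand-in for a malicious object."""

    def __reduce__(self):
        # Unpickling will call sum([1, 2, 3]) instead of rebuilding a Payload.
        return (sum, ([1, 2, 3],))


class RestrictedUnpickler(pickle.Unpickler):
    """Refuse to resolve any global that is not on an explicit allow-list."""

    ALLOWED = {("builtins", "set"), ("builtins", "frozenset")}  # hypothetical allow-list

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


blob = pickle.dumps(Payload())
print(pickle.loads(blob))  # the named callable ran during loading

try:
    restricted_loads(blob)
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)
```

Plain data such as lists and ints never goes through `find_class`, so it still loads through `restricted_loads`; only references to globals are filtered.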
7. No.43604174{3}
8. No.43604826{3}
9. No.43604833{3}
10. No.43604839
11. echoangle No.43605367{3}
But Python pickle is only supposed to be used with trusted input, so it's not a vulnerability.
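The pickle docs linked above suggest signing data with `hmac` if you need to ensure it has not been tampered with. A minimal sketch of that idea follows; `SECRET_KEY`, `signed_dumps`, and `verified_loads` are hypothetical names, and the 32-byte prefix assumes SHA-256. Signing only proves the blob came from a holder of the key, so it narrows "trusted input" to "input we produced"; it does not make a pickle from an untrusted source safe to load.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key; load from a secret store
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def signed_dumps(obj) -> bytes:
    """Pickle obj and prefix the bytes with an HMAC-SHA256 tag."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def verified_loads(blob: bytes):
    """Check the tag before unpickling; refuse anything that does not verify."""
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = signed_dumps({"rows": 3})
print(verified_loads(blob))  # {'rows': 3}

# Flip one bit in the payload: verification fails before pickle.loads runs.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    verified_loads(tampered)
    rejected = False
except ValueError:
    rejected = True
```

`hmac.compare_digest` is used instead of `==` so the comparison does not leak timing information about the expected tag.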
12. natebc No.43605403{3}
I'm sorry. I made a mistake.
13. natebc No.43605405{3}
I'm sorry. I made a mistake.