←back to thread

45 points mousomashakel | 5 comments | | HN request time: 0.625s | source

Hey HN, I’ve built Fahmatrix, a minimal, fast Java library for working with tabular data — inspired by Python’s pandas, but designed for performance and simplicity on the JVM.

After working extensively with Python’s data stack, I often ran into limitations related to speed, especially in larger or long-running data workflows. So I built Fahmatrix from scratch to offer similar APIs for manipulating CSVs, performing summary statistics, slicing rows/columns, and more — but all in Java.

Features:

Lightweight and dependency-free

CSV/TSV import with auto-headers

Series/DataFrame structures (like pandas)

describe(), mean(), stdDev(), percentile() and more

Fast parallel operations on numeric columns

Java 17+ support

Docs: https://moustafa-nasr.github.io/Fahmatrix/ GitHub: https://github.com/moustafa-nasr/fahmatrix

I’d love feedback from the Java and data communities — especially if you’ve ever wanted a simple dataframe utility in Java without needing full-scale ML libraries.

Happy to answer any questions!

Show context
uwemaurer ◴[] No.44012253[source]
Always great to see efforts to make working with data frames easier. Here are some similar data frame libraries for Java:

https://github.com/jtablesaw/tablesaw

https://github.com/dflib/dflib

My preferred way is just use duckdb java API. I didn't see anything better in performance/efficiency. Also a SQL query is often easier to write

replies(2): >>44012928 #>>44019069 #
1. theanonymousone ◴[] No.44012928[source]
Yes. It has bothered me for a long time too. Maybe the best mix is a dataframe library with basic operations (column select, non-null etc), which also allows SQL for more complex stuff?
replies(2): >>44015885 #>>44019080 #
2. radus ◴[] No.44015885[source]
Polars and duckdb interoperate nicely and can enable this flexibility
replies(1): >>44016600 #
3. theanonymousone ◴[] No.44016600[source]
Does Polars have a Java library?
4. mousomashakel ◴[] No.44019080[source]
Totally agree that SQL can be the best tool for many jobs. My goal with Fahmatrix is to serve the opposite niche: where devs want something that's Java-native, procedural, and simple without reaching for an external engine. SQL support or DSL might come later though — I see the appeal.
replies(1): >>44019557 #
5. theanonymousone ◴[] No.44019557[source]
Sure. So maybe notehr comment would be to make it (particularly the Series class), as compatible with Java Streams as possible.

Next step would likely be compatibility with popular libraries such as Apache Commons Math: https://commons.apache.org/proper/commons-math/userguide/sta...