Most spreadsheet apps choke on big files. Coding in pandas or Polars works—but not everyone wants to write scripts just to filter or merge CSVs. CSV GB+ gives you a fast, point-and-click interface built on dual backends (memory-optimized or disk-backed) so you can process huge datasets offline.
Key Features:
- Handles massive CSVs with ease: merge, split, dedup, filter, batch export
- Smart engine switching: disk-based "V Core" or RAM-based "P Core"
- All processing is offline; no data upload or telemetry
- Supports CSV, XLSX, JSON, DBF, Parquet, and more
- Designed for data pros, students, and privacy-conscious users
Register for a free 7-day Pro trial; the Pro version removes row limits and unlocks full features. I’m a solo dev building Data.olllo as a serious alternative to heavy coding or bloated enterprise tools.
Download for Windows: https://apps.microsoft.com/detail/9PFR86LCQPGS
User Guide: https://olllo.top/articles/article-0-Data.olllo-UserGuide
Would love feedback! I’m actively improving it based on real use cases.
What are you using for processing (Polars)?
Marketing note: I'm sure you're proud of P Core/V Core, but that doesn't matter to your users; it's an implementation detail. At most I'd write "intelligent execution that scales from small files to large files".
As an implementation note, I would make it simple to operate on just the first 1,000 (or 10k or 100k) rows so responses are super quick, then, once the user is happy with the transform, make it a single click to operate on the entire file, with a time estimate.
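Roughly, in Polars terms, something like this (just a sketch; the helper names, file name, column name, and sample size are all made up):

    import polars as pl

    SAMPLE_ROWS = 1_000  # could just as well be 10k or 100k

    def preview(path, transform):
        # Fast feedback: run the transform on only the first rows of the file.
        sample = pl.read_csv(path, n_rows=SAMPLE_ROWS)
        return transform(sample)

    def run_full(path, transform):
        # One click later: the same transform over the whole file, evaluated lazily.
        return transform(pl.scan_csv(path)).collect()

    # Usage: quick preview first, then the full file on confirmation.
    def dedup(df):
        return df.unique(subset=["email"])  # "email" is a made-up column

    print(preview("big.csv", dedup))
    # ... user clicks "run on full file" ...
    result = run_full("big.csv", dedup)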
Another feature I'd like in this vein: execute on a small subset, then, if an error shows up on a larger subset, try to reduce that larger subset to a small, quick-to-reproduce version. Especially for deduping.
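The reduction step could be as simple as bisecting the failing rows until the error still reproduces on a small slice. A rough sketch (the transform interface is hypothetical):

    import polars as pl

    def shrink_failure(df: pl.DataFrame, transform, target_rows: int = 1_000) -> pl.DataFrame:
        # Return a small slice of df on which transform() still raises.
        def fails(chunk):
            try:
                transform(chunk)
                return False
            except Exception:
                return True

        current = df
        while current.height > target_rows:
            half = current.height // 2
            top, bottom = current.head(half), current.tail(current.height - half)
            if fails(top):
                current = top
            elif fails(bottom):
                current = bottom
            else:
                break  # the error needs rows from both halves; stop shrinking here
        return current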
It's interesting to look back at how capable applications like Lotus 1-2-3 were, even at low resolutions like 800x600, compared to today's standards.
I created Buckaroo to provide a better table viewing experience inside of notebooks. I also built a low-code UI and autocleaning to expedite the rote data cleaning tasks that take up a large portion of data analysis. Autocleaning is heuristically powered (no LLMs), so it's fast and your data stays local. You can apply different autocleaning strategies and visually inspect the results. When you are happy with the cleaning, you can copy and paste the Python code as a reusable function.
All of this is open source, and it's extensible and customizable.
Here's a video walking through autocleaning and how to extend it: https://youtu.be/A-GKVsqTLMI
Here's the repo: https://github.com/paddymul/buckaroo
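Getting started in a notebook is roughly this (a minimal sketch; check the repo for current install and API details):

    # pip install buckaroo
    import pandas as pd
    import buckaroo  # in Jupyter, importing buckaroo takes over DataFrame display

    df = pd.read_csv("my_data.csv")  # placeholder file name
    df  # rendered in the Buckaroo table widget instead of the default HTML table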
Speaking personally, "intelligent execution that scales from small files to large files" sounds like marketing buzz that could mean absolutely nothing. I like that it mentions specifically switching between RAM and disk-powered engines, because that suggests it's not just marketing speak, but was actually engineered. Maybe P vs V Core is not the best way to market it, but I think it's worth mentioning that design.
Yes, Data.olllo uses Polars (among other libraries) under the hood for fast and efficient processing. A demo video is in the works and should be up soon.
Good point about the "P Core/V Core" naming—I'll simplify that to focus more on the user benefit, like scaling from small to large files smoothly.
I also like your idea of running transformations on a sample first with a one-click full run—very aligned with the vision. And subset reproduction for errors is a great suggestion, especially for things like deduping. Appreciate it!
You're right that terms like "intelligent execution" can feel vague without concrete backing. My goal in mentioning P Core/V Core was to hint at the underlying design (switching between in-memory and disk-based engines like Polars and Vaex) without overwhelming people with technical detail.
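To make that concrete, the idea is roughly a size-based dispatch like this (a simplified sketch, not the actual Data.olllo code; the threshold is a placeholder):

    import os
    import polars as pl
    import vaex

    RAM_LIMIT = 2 * 1024**3  # placeholder cutoff, not the real heuristic

    def open_table(path):
        if os.path.getsize(path) <= RAM_LIMIT:
            # "P Core" style: load fully into memory with Polars.
            return pl.read_csv(path)
        # "V Core" style: memory-map / stream from disk with Vaex.
        return vaex.open(path)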
I’ll look for a better way to explain the idea clearly and briefly. Thanks again!
Data.olllo is focused more on local data processing than just viewing: filtering, transforming, merging, and even running Python code (with AI assistance coming). It's built for both small and large files with performance in mind, using multiple engines, including Polars, under the hood.
Also, good news: the macOS version is in the works and will be submitted to the Mac App Store soon!
That said, I also plan to add support for Parquet and other formats soon—definitely agree it's gaining traction for larger, structured datasets.
Appreciate the DuckDB comparison—great tool and definitely a benchmark worth learning from!
I'm not saying we need a Morlock/Eloi toggle.