Over the past few months I’ve been building Arc, a time-series data platform designed to combine very fast ingestion with strong analytical query performance.
What Arc does:
- Ingests via a binary MessagePack API (fast path)
- Stays compatible with Line Protocol for existing tools (like InfluxDB; I’m an ex-Influxer)
- Stores data as Parquet with hourly partitions
- Queries via the DuckDB engine using SQL
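To make the write path concrete, here’s a minimal sketch in Python. The endpoint path, payload shape, and content type are my assumptions for illustration, not Arc’s documented API; only the msgpack and requests libraries are real.

```python
import time

import msgpack   # pip install msgpack
import requests  # pip install requests

# Hypothetical Arc ingest endpoint; the real path may differ.
ARC_WRITE_URL = "http://localhost:8000/api/v1/write/msgpack"

# A batch of points in an assumed shape: measurement, tags, fields, timestamp.
batch = [
    {
        "measurement": "cpu",
        "tags": {"host": "edge-01"},
        "fields": {"usage": 42.5},
        "timestamp": time.time_ns(),
    },
]

# One compact binary payload for the whole batch (the "fast path").
resp = requests.post(
    ARC_WRITE_URL,
    data=msgpack.packb(batch),
    headers={"Content-Type": "application/msgpack"},
)
resp.raise_for_status()
```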
Why I built it:
Many systems force you to trade off retention, throughput, and complexity against one another. I wanted something where ingestion performance doesn’t kill your analytics.
Performance & benchmarks so far:
- Write throughput: ~1.88M records/sec (MessagePack, untuned) on my M3 Pro Max (14 cores, 36 GB RAM)
- ClickBench on AWS c6a.4xlarge: 35.18 s cold, ~0.81 s hot (43/43 queries succeeded)
- In those runs, caching was disabled to match benchmark rules; enabling the cache in production gives ~20% faster repeated queries
I’ve open-sourced the Arc repo so you can dive into the implementation, benchmarks, and code. Would love your thoughts, critiques, and use-case ideas.
Thanks!
Is this something I’d use instead of Timescale, or am I understanding that the intention here is to be a data warehouse, where we could potentially offload older data to Arc for longer-term storage or trend analysis?
I’d say both roles are possible, though the original intent of Arc was indeed to act as an offload / long-term store for systems like TimescaleDB, InfluxDB, Kafka, etc. The idea: you move older data into Arc to take storage and query load off your primary database, then run ML, deep analysis, etc. against Arc.
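As a rough sketch of that offload pattern: TimescaleDB speaks plain Postgres, so psycopg2 covers the read side. The Arc endpoint, payload shape, and the cpu hypertable schema below are all assumptions on my end, not documented interfaces.

```python
import msgpack
import psycopg2   # pip install psycopg2-binary
import requests

# Hypothetical offload job: copy rows older than 30 days out of the
# primary TimescaleDB instance and into Arc for long-term storage.
conn = psycopg2.connect("dbname=metrics user=postgres")
cur = conn.cursor()
cur.execute(
    """
    SELECT extract(epoch FROM time) * 1e9, host, usage
    FROM cpu
    WHERE time < now() - interval '30 days'
    """
)

# Re-shape the rows into the same assumed point format as above.
batch = [
    {"measurement": "cpu", "tags": {"host": host},
     "fields": {"usage": usage}, "timestamp": int(ts)}
    for ts, host, usage in cur
]

requests.post(
    "http://localhost:8000/api/v1/write/msgpack",
    data=msgpack.packb(batch),
    headers={"Content-Type": "application/msgpack"},
)
```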
But as we built it, we discovered that Arc is really good not just at storing data but at actively answering queries, so it’s kind of a hybrid: somewhat “warehouse-like,” but still retaining database-grade query performance. I feel that calling it a database is too much, but we’re heading in that direction.
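One way to picture the “warehouse-like but still queryable” angle: since the data sits in hourly Parquet partitions, any DuckDB client can run analytical SQL straight over the files. The directory layout below is my guess at a plausible partition scheme, not Arc’s documented one.

```python
import duckdb  # pip install duckdb

# Assumed on-disk layout: one Parquet file per hour, e.g.
# data/cpu/2025/01/15/14.parquet. The ** glob picks up every hourly file.
con = duckdb.connect()
rows = con.execute(
    """
    SELECT date_trunc('hour', timestamp) AS hour,
           avg(usage) AS avg_usage
    FROM read_parquet('data/cpu/**/*.parquet')
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()
print(rows[:5])
```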
IoT is absolutely one of the core use cases. You’re often ingesting tens or hundreds of thousands of events per second from edge devices, and you need a system that doesn’t choke. Our binary MessagePack ingestion shrinks payload size and reduces parsing overhead, which allows higher write throughput, crucial in IoT scenarios.
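For illustration, a batched write from an edge collector might look like the sketch below. The columnar batch shape is my assumption, not Arc’s actual wire schema; the point is that field names are sent once per batch and the server never has to parse text.

```python
import time

import msgpack
import requests

# Hypothetical columnar batch layout (my assumption, not Arc's actual
# schema): field names appear once and values ride in flat arrays, so
# per-point overhead shrinks as the batch grows.
n = 5000
start = time.time_ns()
batch = {
    "measurement": "sensor",
    "tags": {"site": "plant-3"},
    "columns": {
        "timestamp": [start + i * 1_000_000 for i in range(n)],
        "temperature": [20.0 + (i % 10) * 0.1 for i in range(n)],
    },
}

# One binary request replaces n text lines, and ingest skips text
# parsing entirely.
requests.post(
    "http://localhost:8000/api/v1/write/msgpack",
    data=msgpack.packb(batch),
    headers={"Content-Type": "application/msgpack"},
)
```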
Let me know if you want to explore this a little more. I’m not trying to sell you anything, at least not yet; I’d just love to understand your use case. If you’re open to it: ignacio[at]basekick[dot]net