
29 points | ignaciovdk | 1 comment

Hi HN, I’m Ignacio, founder at Basekick Labs.

Over the past months I’ve been building Arc, a time-series data platform designed to combine very fast ingestion with strong analytical queries.

What does Arc do?

- Ingests via a binary MessagePack API (the fast path)
- Stays compatible with Line Protocol for existing tools (as used by InfluxDB; I'm an ex-Influxer)
- Stores data as Parquet with hourly partitions
- Queries via the DuckDB engine using SQL
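To make the hourly-partition idea concrete, here is a minimal sketch of mapping a record's timestamp to an hourly bucket. The function name `hourly_partition` and the `measurement/YYYY/MM/DD/HH/data.parquet` layout are assumptions made for illustration only; the post does not describe Arc's actual on-disk layout.

```python
from datetime import datetime, timezone


def hourly_partition(measurement: str, ts_ns: int) -> str:
    """Return a hypothetical partition path for a nanosecond Unix timestamp.

    The path layout is an assumption for illustration, not Arc's real layout.
    """
    # Truncate nanoseconds to whole seconds, then interpret as UTC.
    ts = datetime.fromtimestamp(ts_ns // 1_000_000_000, tz=timezone.utc)
    # Every record within the same UTC hour maps to the same path.
    return f"{measurement}/{ts:%Y/%m/%d/%H}/data.parquet"


print(hourly_partition("cpu", 1_700_000_000_000_000_000))
# cpu/2023/11/14/22/data.parquet
```

With a layout along these lines, a query restricted to a time range only needs to read the Parquet files whose hour falls inside that range.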

Why I built it:

Many systems force you to trade retention, throughput, or complexity. I wanted something where ingestion performance doesn’t kill your analytics.

Performance and benchmarks so far:

- Write throughput: ~1.88M records/sec (MessagePack, untuned) on my M3 Pro Max (14 cores, 36 GB RAM)
- ClickBench on AWS c6a.4xlarge: 35.18 s cold, ~0.81 s hot (43/43 queries succeeded)
- In those runs, caching was disabled to match the benchmark rules; enabling the cache in production makes repeated queries ~20% faster

I’ve open-sourced the Arc repo so you can dive into implementation, benchmarks, and code. Would love your thoughts, critiques, and use-case ideas.

Thanks!

riku_iki:
> Write throughput: ~1.88M records/sec (MessagePack, untuned)

This doesn't sound like much, unless the records are very large.

ignaciovdk:
That’s fair, the number alone doesn’t mean much without context.

The benchmark measures fully written time-series records, not bytes. Each record typically includes 1–4 fields, tags, and timestamps, similar to InfluxDB’s Line Protocol structure.
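For readers unfamiliar with that record shape, here is a minimal sketch of one record with tags, a field, and a timestamp, rendered as a Line Protocol line. The helper `to_line_protocol` and the sample values are hypothetical, and the sketch assumes float-only fields and skips the escaping rules that real Line Protocol requires.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Render one record as: measurement,tag=value field=value timestamp.

    Simplified sketch: float fields only, no escaping of spaces or commas.
    """
    # Tags are appended to the measurement name, sorted by key.
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    # Fields are comma-separated key=value pairs.
    field_part = ",".join(f"{k}={float(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {ts_ns}"


line = to_line_protocol(
    "cpu",
    {"host": "a", "region": "us"},
    {"usage": 0.5},
    1_700_000_000_000_000_000,
)
print(line)
# cpu,host=a,region=us usage=0.5 1700000000000000000
```

Each such line counts as one record in the throughput figures, whether it arrives as text like this or as a binary MessagePack payload.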

For comparison, the same hardware (AWS c6a.4xlarge) handles around 240K RPS using Line Protocol, while Arc reaches 1.88M RPS with MessagePack, about 7.8× faster on ingestion throughput.

The full ClickBench and ingestion benchmarks are in the repo.

TL;DR: Arc’s strength isn’t massive single records; it’s sustained high-throughput ingestion of structured time-series data while staying friendly to analytical queries.