Ask HN: What's the 2025 stack for a self-hosted photo library with local AI?

First of all, this is purely a personal learning project for me, aiming to combine three of my passions: photography, software engineering, and my family memories. I have a large collection of family photos and want to build an interactive experience to explore them, à la Google Photos or Apple Photos.

My goal is to create a system with smart search capabilities, and one of the most important requirements is that it must run entirely on my local hardware. Privacy is key, but the main driver is the challenge and joy of building it myself (and, obviously, learning along the way).

The key features I'm aiming for are:

Automatic identification and tagging of family members (local face recognition).

Generation of descriptive captions for each photo.

Natural language search (e.g., "Show me photos of us at the beach in Luquillo from last summer").

I've already prompted AI tools for a high-level project plan, and they provided a solid blueprint (e.g., Ollama with LLaVA, a vector DB like ChromaDB, the usual). Now I'm most interested in real-world human experience. I'm looking for advice, learning stories, and the little details that only come from building something similar.

What tools, models, and best practices would you recommend for a project like this in 2025? Specifically, I'm curious about combining structured metadata (EXIF), face recognition data, and semantic vector search into a single, cohesive application.
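To make the question concrete, here is a minimal sketch of what I mean by combining the three signals: hard filters on EXIF dates and recognized faces first, then ranking the survivors by embedding similarity. Everything in it is hypothetical (the `Photo` record, the `search` function, the names, and the toy 3-dimensional vectors); a real setup would get embeddings from a local model and probably store them in a vector DB rather than a Python list.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class Photo:
    # Hypothetical record: one row per photo, merging all three signals.
    path: str
    taken: date             # e.g. parsed from the EXIF capture date
    people: set[str]        # names assigned by a face recognition step
    embedding: list[float]  # caption or image embedding from a local model


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(photos, query_vec, *, people=frozenset(), after=None, before=None, top_k=5):
    """Filter on structured metadata, then rank by semantic similarity."""
    candidates = [
        p for p in photos
        if set(people) <= p.people                  # every requested person is present
        and (after is None or p.taken >= after)     # inclusive date range
        and (before is None or p.taken <= before)
    ]
    candidates.sort(key=lambda p: cosine(query_vec, p.embedding), reverse=True)
    return candidates[:top_k]


# Toy data: made-up names and vectors, only to exercise the function.
photos = [
    Photo("beach.jpg", date(2024, 7, 14), {"ana", "luis"}, [0.9, 0.1, 0.0]),
    Photo("snow.jpg", date(2024, 1, 5), {"ana"}, [0.0, 0.2, 0.9]),
    Photo("pool.jpg", date(2024, 8, 2), {"luis"}, [0.8, 0.3, 0.1]),
]
beach_query = [1.0, 0.0, 0.0]  # stand-in for an embedded "at the beach" query

summer_with_ana = search(
    photos, beach_query,
    people={"ana"}, after=date(2024, 6, 1), before=date(2024, 8, 31),
)
print([p.path for p in summer_with_ana])
```

Filtering before ranking keeps the similarity step small, but it also means a wrong face tag or a missing EXIF date silently hides a photo, which is exactly the kind of trade-off I'd like to hear experiences about.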

Any and all advice would be deeply appreciated. Thanks!

crobibero ◴[30 Jun 25 18:23 UTC] No.44426343[source]▶

>>44426233 (OP) #

I think Immich checks a lot of these boxes.

https://immich.app/

replies(5): >>44426505 #>>44426857 #>>44427196 #>>44429603 #>>44434882 #

sz4kerto ◴[30 Jun 25 18:41 UTC] No.44426505[source]▶

>>44426343 #

This. It's a fascinating project; it's hard to believe a FLOSS project can be this high quality. In my book it's on the level of Postgres (although it's probably a smaller project).

replies(2): >>44426592 #>>44426992 #

1. denysvitali ◴[30 Jun 25 18:51 UTC] No.44426592[source]▶

>>44426505 #

Their frontend is amazing, their apps are not as performant, and the backend is (IMHO) the worst of them all.

No hate here, I'm really grateful for what they've achieved so far, but I think there's a lot of room for improvement (e.g., a proper R/W query split, native S3 integration, faster endpoints, ...). I already mentioned it in their channel (they're a really welcoming community!) and I'm working on an alternative drop-in replacement backend (written in Go) [1] that will hopefully bring all the needed improvements.

TL;DR: It's definitely good, especially for an open-source project, and the team is very dedicated, but it's not Postgres-good.

[1]: https://github.com/denysvitali/immich-go-backend

replies(1): >>44427227 #

2. darkwater ◴[30 Jun 25 20:01 UTC] No.44427227[source]▶

>>44426592 (TP) #

Why the focus on S3 for a self-hosted app? Anyway, kudos for the effort. I'm not experiencing performance issues in my locally self-hosted Immich installation, but more performant software is always welcome.

replies(3): >>44427550 #>>44428576 #>>44429185 #

3. rkagerer ◴[30 Jun 25 20:33 UTC] No.44427550[source]▶

>>44427227 #

I'm wondering the same thing. He had me until he said "S3".

replies(1): >>44427626 #

4. bargainbin ◴[30 Jun 25 20:41 UTC] No.44427626{3}[source]▶

>>44427550 #

Likely means S3 compatibility, so it can be used with anything, be it a cloud provider or a locally hosted solution like MinIO.

replies(1): >>44427928 #

5. denysvitali ◴[30 Jun 25 21:11 UTC] No.44427928{4}[source]▶

>>44427626 #

S3-compatible storage. In my case, Backblaze B2. The idea is to make the backend compatible with rclone, so that one can pick whatever storage they want (including B2, S3, and others).

replies(1): >>44432237 #

6. jasonjayr ◴[30 Jun 25 22:24 UTC] No.44428576[source]▶

>>44427227 #

I have and love my self-hosted Immich install. If the self-hosted version could also use S3 storage, that would let me use Garage (https://git.deuxfleurs.fr/Deuxfleurs/garage), which also lets me play games with growable/redundant storage on a pile of second-hand hard drives. IIRC it can only use a mounted block device at the moment (unless there is an NFS-exposed S3 translator...).

A lot of existing tooling supports the S3 protocol, so it would simplify the storage picture (no pun intended).

7. toomuchtodo ◴[30 Jun 25 23:53 UTC] No.44429185[source]▶

>>44427227 #

S3-compatible means one can point it at any storage that talks S3, which is a lot more flexible than POSIX or NFS.

8. darkwater ◴[01 Jul 25 09:54 UTC] No.44432237{5}[source]▶

>>44427928 #

I back up my Immich photos to B2 with rclone, but I prefer having it as a separate process (also, the backup is append-only). I don't need "hyperscale", and storing directly on S3/B2/remotely somewhat breaks the 3-2-1 rule I want to follow.

replies(1): >>44432680 #

9. denysvitali ◴[01 Jul 25 11:11 UTC] No.44432680{6}[source]▶

>>44432237 #

On B2 (and S3 storage in general) you can set a retention policy for what happens after you delete an object (e.g., object lock with persistence for at least 30 days). Of course this is not a substitute for a backup, but it's better than discovering that you deleted your whole 1TB library when it's too late.
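For illustration, a rough sketch of such a policy through the S3 API using boto3's `put_object_lock_configuration`. The helper names, bucket name, and endpoint are placeholders, and this assumes the bucket already has versioning and Object Lock enabled; whether a given S3-compatible provider accepts this exact call is something to check in its documentation.

```python
def default_retention_config(days, mode="GOVERNANCE"):
    """Build an Object Lock configuration with a default retention period.

    The default rule applies to object versions uploaded after it is set.
    """
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


def apply_default_retention(bucket, days, endpoint_url=None):
    """Apply the configuration to a bucket (hypothetical helper).

    Needs boto3 and valid credentials; endpoint_url lets you target an
    S3-compatible provider instead of AWS.
    """
    import boto3  # third-party, imported lazily so the builder above works without it

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    s3.put_object_lock_configuration(
        Bucket=bucket,
        ObjectLockConfiguration=default_retention_config(days),
    )


config = default_retention_config(30)
print(config)
# apply_default_retention("my-photo-bucket", 30)  # placeholder bucket name
```

With a rule like this, deleting an object only hides the current version; the locked version stays recoverable until the retention period ends.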
