
224 points jamesxv7 | 2 comments

First of all, this is purely a personal learning project for me, aiming to combine three of my passions: photography, software engineering, and my family memories. I have a large collection of family photos and want to build an interactive experience to explore them, à la the features in Google Photos or Apple Photos.

My goal is to create a system with smart search capabilities, and one of the most important requirements is that it must run entirely on my local hardware. Privacy is key, but the main driver is the challenge and joy of building it myself (and, obviously, learning).

The key features I'm aiming for are:

Automatic identification and tagging of family members (local face recognition).

Generation of descriptive captions for each photo.

Natural language search (e.g., "Show me photos of us at the beach in Luquillo from last summer").

I've already prompted AI tools for a high-level project plan, and they provided a solid blueprint (e.g., Ollama with LLaVA, a vector DB like ChromaDB, you know the drill). Now, I'm highly interested in the real-world human experience. I'm looking for advice, learning stories, and the little details that only come from building something similar.

What tools, models, and best practices would you recommend for a project like this in 2025? Specifically, I'm curious about combining structured metadata (EXIF), face recognition data, and semantic vector search into a single, cohesive application.

Any and all advice would be deeply appreciated. Thanks!
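
For readers wondering what combining EXIF metadata, face labels, and vector search could look like, here is a minimal sketch, not a recommendation of any particular stack. It assumes each photo already has a capture date (from EXIF), a set of person labels (from whatever face recognizer is used), and an embedding vector (from a CLIP-style model). The `Photo` class, the `search` function, and all field names are hypothetical. The idea is to filter on the structured fields first, then rank the survivors by cosine similarity to the query embedding.

```python
from dataclasses import dataclass
from datetime import date
import math


@dataclass
class Photo:
    path: str
    taken: date             # assumed to come from EXIF DateTimeOriginal
    people: set[str]        # assumed labels from a face-recognition step
    embedding: list[float]  # assumed vector from a CLIP-style model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(photos, query_vec, *, people=frozenset(), start=None, end=None, k=5):
    """Filter by structured metadata, then rank by semantic similarity."""
    candidates = [
        p for p in photos
        if set(people) <= p.people                 # every requested person present
        and (start is None or p.taken >= start)    # inclusive date range
        and (end is None or p.taken <= end)
    ]
    candidates.sort(key=lambda p: cosine(query_vec, p.embedding), reverse=True)
    return candidates[:k]
```

A query like "us at the beach last summer" would then become a date range, a set of people, and a query embedding. A real vector DB would replace the linear scan, but the filter-then-rank shape stays the same.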

mossTechnician ◴[] No.44426333[source]
This may not interest you, but Ente checks most of these boxes for me. It has face recognition and AI-based object search out of the box, and you can self-host their open-source server without any restrictions. The models they used might be useful for your project.
replies(3): >>44426503 #>>44426975 #>>44428905 #
akho ◴[] No.44426975[source]
The Ente self-hosting proposition seems strange. Why would I want to e2e encrypt my photos that I self-host? Sounds like it will only make life more difficult.
replies(5): >>44427017 #>>44429002 #>>44429476 #>>44430189 #>>44430219 #
prophesi ◴[] No.44430219[source]
If there's a server involved, there's no reason not to have sensitive files and information end-to-end encrypted, whether self-hosting or not.
replies(1): >>44432849 #
akho ◴[] No.44432849[source]
You do want to have things encrypted in transit and at rest. e2ee means server admins (I) cannot access the user's (mine) photos.
replies(1): >>44435005 #
prophesi ◴[] No.44435005[source]
The server admin can still access their own photos via the client. They wouldn't be able to access the photos of other users.

edit: To explain further why it's almost always desirable:

You guarantee that you and your users' information is safe if the server is compromised, if an admin goes rogue, or if local bodies of power request their information from you.

The information can't be sent to third-parties by design.

Any operations / transformations that need to be applied to the information will have to either be done via homomorphic encryption or on the client-side (which is much more likely to be open source / easy-to-deobfuscate compared to blackbox server code).

replies(1): >>44436480 #
akho ◴[] No.44436480[source]
I understand what e2ee is, thank you. I just don't think it’s justified for self-hosted photo servers.

E.g., “Any operations / transformations” includes facial recognition, CLIP embeddings, etc.; you want to run these on the server, overnight, and be able to re-run them at a later date when new models become available. Under e2ee, that’s a round-trip through a client device at every model update. So that’s a significant downside, with no important upsides in the case where you and your family are the only users.
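
A rough sketch of the server-side re-run described above, assuming each stored embedding is tagged with the identifier of the model that produced it. The `Record` class, the `reindex` function, and the `embed` callable are hypothetical placeholders, not any particular project's API. A batch job then only has to touch records whose tag differs from the current model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Record:
    path: str
    embedding: Optional[list[float]] = None
    model_id: Optional[str] = None  # which model produced `embedding`, if any


def reindex(records: list[Record], model_id: str,
            embed: Callable[[str], list[float]]) -> int:
    """Re-embed every record not produced by `model_id`; return how many changed."""
    updated = 0
    for rec in records:
        if rec.model_id != model_id:
            # Hypothetical: `embed` reads the plaintext file, which is exactly
            # what a server cannot do when the media is end-to-end encrypted.
            rec.embedding = embed(rec.path)
            rec.model_id = model_id
            updated += 1
    return updated
```

Running `reindex` twice with the same `model_id` is a no-op the second time, so the job can be scheduled nightly without redoing work.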

replies(1): >>44436643 #
prophesi ◴[] No.44436643[source]
I was explaining why e2ee has important upsides, not how e2ee works. With Ente (and I think Immich as well), facial recognition and generating new CLIP embeddings are done on-device[0], usually right when the photo is taken / before they're uploaded to the server.

[0] https://ente.io/blog/image-search-with-clip-ggml/

replies(1): >>44438033 #
akho ◴[] No.44438033[source]
Immich does it on the server.

What happens if there’s a new, better model? You’d need to re-download, decrypt, and run inference on all your past media, which is in terabytes for many.

I understand the benefit of e2ee in a situation where there is no trust between user and admin. In personal self-hosting, that’s the same person (or family), and the upsides are not as relevant. The downsides (possibility of data loss for, e.g., kids who are not very good with passwords/keys; difficulties with updating models / thumbs; …) remain important, and outweigh the benefits, even assuming the e2ee is implemented well.

replies(1): >>44440065 #
prophesi ◴[] No.44440065[source]
You do you, but the trust goes beyond just admin and users. And family photos are treated as treasures. Data loss is a fair point, but if you're self-hosting a photos app I imagine server/db backups are part of your routine. Account recovery is all that's needed to recover lost photos from there. Well, unless your VPS is compromised and you lose more data than you'd like since your last backup ran, in which case it's still better that such sensitive info was e2ee'd.

edit: also feel like I'm echoing the classic dropbox comment, but self-hosting in a sane and secure manner is harder than it's made out to be. It needs to be taken seriously.

replies(1): >>44440609 #
akho ◴[] No.44440609[source]
e2ee prevents account recovery.
replies(1): >>44446927 #
prophesi ◴[] No.44446927[source]
People have found decent solutions for that. Proton's [0] is essentially a backup password/phrase or a file you keep safe. It's not as simple as a magic link, and you could still lose your backup phrase/file, but alas. Security is always a compromise on convenience.

[0] https://proton.me/blog/data-recovery-end-to-end-encryption
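
To illustrate the general backup-phrase idea (this is not Proton's or Ente's actual scheme), here is a toy sketch using only the Python standard library. It assumes the account has a random 32-byte master key that encrypts the photos. The server stores only a wrapped copy of that key, and the wrapping key is derived from a recovery phrase the user keeps offline. The names `wrap` and `unwrap`, the blob layout, and the iteration count are all assumptions, and a real system would use an audited AEAD construction rather than this XOR-plus-HMAC stand-in.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor, for illustration only


def _derive(phrase: str, salt: bytes) -> tuple[bytes, bytes]:
    # 64 bytes of output: 32 to mask the master key, 32 to authenticate the blob.
    okm = hashlib.pbkdf2_hmac("sha256", phrase.encode(), salt, ITERATIONS, dklen=64)
    return okm[:32], okm[32:]


def wrap(master_key: bytes, phrase: str) -> bytes:
    """Return salt || masked key || tag; safe to store server-side."""
    if len(master_key) != 32:
        raise ValueError("master key must be 32 bytes")
    salt = secrets.token_bytes(16)  # fresh salt, so the mask is never reused
    mask, mac_key = _derive(phrase, salt)
    masked = bytes(a ^ b for a, b in zip(master_key, mask))
    tag = hmac.new(mac_key, salt + masked, hashlib.sha256).digest()
    return salt + masked + tag


def unwrap(blob: bytes, phrase: str) -> bytes:
    """Recover the master key, or raise ValueError on a wrong phrase."""
    salt, masked, tag = blob[:16], blob[16:48], blob[48:]
    mask, mac_key = _derive(phrase, salt)
    expected = hmac.new(mac_key, salt + masked, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong recovery phrase or corrupted blob")
    return bytes(a ^ b for a, b in zip(masked, mask))
```

The server never sees the phrase, so it still cannot read the photos. Losing both the password and the phrase remains unrecoverable, which is the convenience cost mentioned above.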