224 points jamesxv7 | 1 comments | 30 Jun 25 18:10 UTC

First of all, this is purely a personal learning project for me, aiming to combine three of my passions: photography, software engineering, and my family memories. I have a large collection of family photos and want to build an interactive experience to explore them, à la Google Photos or Apple Photos.

My goal is to create a system with smart search capabilities, and one of the most important requirements is that it must run entirely on my local hardware. Privacy is key, but the main driver is the challenge and joy of building it myself (and, obviously, learning along the way).

The key features I'm aiming for are:

Automatic identification and tagging of family members (local face recognition).

Generation of descriptive captions for each photo.

Natural language search (e.g., "Show me photos of us at the beach in Luquillo from last summer").

I've already prompted AI tools for a high-level project plan, and they provided a solid blueprint (e.g., Ollama with LLaVA, a vector DB like ChromaDB, you know the drill). Now I'm most interested in real-world human experience. I'm looking for advice, learning stories, and the little details that only come from building something similar.

What tools, models, and best practices would you recommend for a project like this in 2025? Specifically, I'm curious about combining structured metadata (EXIF), face recognition data, and semantic vector search into a single, cohesive application.
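To make the "single, cohesive application" question concrete, here is a minimal sketch of one possible shape for it: filter on the structured fields first (EXIF date, recognized people), then rank the survivors by embedding similarity. Everything here is an assumption for illustration: the `Photo` record, the `search` function, the sample names, and the three-dimensional vectors stand in for whatever a real embedding model and vector DB would provide.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class Photo:
    """Hypothetical record merging the three data sources for one photo."""
    path: str
    taken_on: date          # e.g. parsed from EXIF DateTimeOriginal
    people: set[str]        # names assigned by the face-recognition step
    embedding: list[float]  # caption or image embedding (toy size here)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(photos, query_embedding, *, people=frozenset(), start=None, end=None, top_k=5):
    """Apply the structured filters, then rank what is left by similarity."""
    candidates = [
        p for p in photos
        if set(people) <= p.people
        and (start is None or p.taken_on >= start)
        and (end is None or p.taken_on <= end)
    ]
    candidates.sort(key=lambda p: cosine(p.embedding, query_embedding), reverse=True)
    return candidates[:top_k]


# Made-up sample data; the query vector stands in for an embedded "beach" query.
library = [
    Photo("beach.jpg", date(2024, 7, 14), {"Ana", "Luis"}, [0.9, 0.1, 0.0]),
    Photo("party.jpg", date(2024, 12, 24), {"Ana"}, [0.1, 0.9, 0.0]),
    Photo("old_beach.jpg", date(2019, 7, 2), {"Ana", "Luis"}, [0.8, 0.2, 0.0]),
]
hits = search(
    library, [1.0, 0.0, 0.0],
    people={"Ana"}, start=date(2024, 6, 1), end=date(2024, 8, 31),
)
print([p.path for p in hits])  # → ['beach.jpg']
```

Filtering before ranking keeps "last summer" and "of us" as exact constraints and leaves only the fuzzy part ("at the beach") to the vector search.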

Any and all advice would be deeply appreciated. Thanks!

nico ◴[30 Jun 25 20:09 UTC] No.44427304[source]▶

>>44426233 (OP) #

I don't know about the photo-management aspects. However, I've had very good experiences running gemma3 (4b and 12b) locally via Ollama.

I've used gemma to process pictures and get descriptions, and also to answer questions about the pictures (e.g., is there a bicycle in the picture?). I haven't tried it for face recognition, but if you have already identified someone in one photo, it can probably tell you whether that person is also in another photo.
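For reference, a minimal sketch of asking a local model about one picture through Ollama's HTTP API, using only the standard library. It assumes Ollama is running on its default port with the `gemma3:4b` model already pulled; `build_payload` and `ask_about_image` are hypothetical helper names, not part of any library.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, prompt, model="gemma3:4b"):
    """Build the JSON body for /api/generate with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def ask_about_image(image_path, prompt, model="gemma3:4b"):
    """Send one image plus a question to the local model and return its text reply."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt, model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())["response"]
```

A call would look like `ask_about_image("photo.jpg", "Is there a bicycle in the picture? Answer yes or no.")`, where `photo.jpg` is a placeholder path.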

Just one caveat: if you are processing thousands of pictures, it will take a while to get through them all (depending on your hardware and picture size). You could also try building a processing pipeline that first extracts the faces, or their bounding boxes, with something like OpenCV and then passes those to gemma3.
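A rough sketch of the first stage of such a pipeline, assuming the `opencv-python` package is installed. It uses OpenCV's bundled Haar cascade as one simple detector choice (not necessarily the best one), and `expand_box` and `extract_faces` are hypothetical helper names.

```python
def expand_box(box, image_width, image_height, margin=0.2):
    """Grow an (x, y, w, h) face box by `margin` per side, clamped to the image.

    Returns (left, top, right, bottom) pixel coordinates.
    """
    x, y, w, h = box
    dx, dy = int(w * margin), int(h * margin)
    left, top = max(x - dx, 0), max(y - dy, 0)
    right = min(x + w + dx, image_width)
    bottom = min(y + h + dy, image_height)
    return left, top, right, bottom


def extract_faces(image_path, margin=0.2):
    """Return a list of cropped face images (NumPy arrays) found in one photo."""
    import cv2  # imported lazily; assumes opencv-python is installed

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    height, width = gray.shape
    crops = []
    for box in boxes:
        left, top, right, bottom = expand_box(box, width, height, margin)
        crops.append(image[top:bottom, left:right])
    return crops
```

Each crop could then be encoded (for example with `cv2.imencode(".jpg", crop)`) and sent to the model, so it only sees the small face region rather than the full-resolution photo.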

Please post the repo link if you ever decide to open-source it.

replies(1): >>44427360 #

1. jamesxv7 ◴[30 Jun 25 20:16 UTC] No.44427360[source]▶

>>44427304 #

Thanks, nico, for sharing your experience! That's really helpful. The idea of using OpenCV to build a processing pipeline for face detection before passing the results to Gemma is brilliant; I hadn't thought of that. I'll definitely look into using gemma with Ollama.

And for sure, if I get this to a point where I can open-source it, I'll post the link here!

Ask HN: What's the 2025 stack for a self-hosted photo library with local AI?