My goal is to create a system with smart search capabilities, and one of the most important requirements is that it must run entirely on my local hardware. Privacy is key, but the main driver is the challenge and joy of building it myself (and, obviously, to learn).
The key features I'm aiming for are:
Automatic identification and tagging of family members (local face recognition).
Generation of descriptive captions for each photo.
Natural language search (e.g., "Show me photos of us at the beach in Luquillo from last summer").
I've already prompted AI tools for a high-level project plan, and they provided a solid blueprint (e.g., Ollama with LLaVA, a vector DB like ChromaDB, and so on). Now I'm more interested in the real-world human experience: advice, lessons learned, and the little details that only come from building something similar.
What tools, models, and best practices would you recommend for a project like this in 2025? Specifically, I'm curious about combining structured metadata (EXIF), face recognition data, and semantic vector search into a single, cohesive application.
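To make the "single, cohesive application" question concrete, here's a minimal sketch of the kind of hybrid query layer I have in mind: filter on structured metadata (EXIF dates, face tags) first, then rank the survivors by vector similarity. The data, field names, and toy embeddings are all hypothetical; in a real build the rows would come from SQLite and a vector DB like ChromaDB.

```python
import math
from datetime import date

# Hypothetical in-memory index standing in for SQLite (EXIF + face tags)
# plus a vector store (caption embeddings). Embeddings are toy 3-vectors.
PHOTOS = [
    {"path": "2024/beach_001.jpg", "taken": date(2024, 7, 12),
     "faces": {"alice", "bob"}, "embedding": [0.9, 0.1, 0.0]},
    {"path": "2024/kitchen_002.jpg", "taken": date(2024, 7, 13),
     "faces": {"alice"}, "embedding": [0.1, 0.9, 0.2]},
    {"path": "2023/beach_003.jpg", "taken": date(2023, 8, 2),
     "faces": {"bob"}, "embedding": [0.8, 0.2, 0.1]},
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_embedding, faces=None, after=None, before=None, k=5):
    """Structured filters first (cheap), then vector ranking (expensive)."""
    candidates = [
        p for p in PHOTOS
        if (not faces or faces <= p["faces"])          # all requested people present
        and (after is None or p["taken"] >= after)     # EXIF date range
        and (before is None or p["taken"] <= before)
    ]
    candidates.sort(key=lambda p: cosine(query_embedding, p["embedding"]),
                    reverse=True)
    return [p["path"] for p in candidates[:k]]
```

So a query like "photos of us at the beach from last summer" would become: an embedding of the text, a `faces={"alice", "bob"}` constraint from face recognition, and a date range from the parsed query, e.g. `search(beach_embedding, faces={"alice", "bob"}, after=date(2024, 6, 1))`. The pre-filter keeps the vector scan small, which matters on modest local hardware.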
Any and all advice would be deeply appreciated. Thanks!
No hate here - I'm really grateful for what they've achieved so far - but I think there's a lot of room for improvement (e.g., a proper read/write query split, native S3 integration, faster endpoints, ...). I've already mentioned it in their channel (they're a really welcoming community!) and I'm working on an alternative drop-in replacement backend (written in Go) [1] that will hopefully bring all the needed improvements.
TL;DR: It's definitely good, especially for an open-source project, and the team is very dedicated - but it's not Postgres-good.
We're no longer auto-uploading to Google or Apple.
So far, I really like it. I haven't quite gone 100%, as we're still uploading with Synology's photo app, but Immich provides a much more refined, feature-rich interface.
It's a near-zero-maintenance stack, incredibly easy to update - the mobile client apps even notify you (unobtrusively) when your server has an update available. The UI is so polished & the features so stable that it's hard to believe it's open source.
A lot of existing tooling supports the S3 protocol, so it would simplify the storage picture (no pun intended).
I’m running a DS1813+. It’s stopped getting new feature updates. This approach lets me keep the storage running while migrating away the server components.
Given how good the new multimodal models are, I've been thinking it would be better to just have a multimodal model describe each image and let the search be handled by the already-included Meilisearch.
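The wiring for that idea is fairly light: Meilisearch indexes plain JSON documents, so a model-generated caption just becomes one more searchable text field. A sketch of shaping the payload - the field names here are my own convention, not anything Immich or Meilisearch mandates:

```python
import json

def caption_to_document(photo_id, path, caption):
    """Turn one photo's model-generated caption into a JSON document
    suitable for Meilisearch's add-documents endpoint."""
    return {
        "id": photo_id,                         # primary key for the index
        "path": path,
        "caption": caption,                     # full-text searchable field
        "caption_words": len(caption.split()),  # handy for filtering out junk captions
    }

# A batch is just a JSON array; you'd POST this to the index's
# documents endpoint and let Meilisearch handle tokenization and typo tolerance.
docs = [caption_to_document(1, "2024/beach_001.jpg",
                            "Two people building a sandcastle at sunset")]
payload = json.dumps(docs)
```

The appeal over a vector DB is that Meilisearch is already in the stack and gives you typo-tolerant keyword search for free; the trade-off is that you only match words the model actually wrote into the caption.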
That said, for various reasons I haven't had time to mess with it in the past couple of months, so perhaps something drastic has changed.
Is it really that stable and flawless in terms of updates?
Because I'm sat here with ZFS, snapshotting and replication configured and wondering why people scare others off of it when the tools to mitigate issues are all free and should be used anyway as part of a bog-standard self-hosted stack.
I also trigger all my updates manually - the process itself is fully automated (a simple script that runs in seconds across my entire home server) - but it's not on any schedule, so I'm never updating blind. That at least affords me the luxury of being present if/when anything breaks (though with Immich that hasn't happened yet).
So my photo storage on my home server is getting filled with a bunch of useless images that I only have on my phone temporarily and that I end up deleting shortly after.