311 points | joshdickson

Hi HN!

Today I’m excited to launch OpenNutrition: a free, ODbL-licenced nutrition database of everyday generic, branded, and restaurant foods, a search engine that can browse the web to import new foods, and a companion app that bundles the database and search as a free macro tracking app.

Consistently logging the foods you eat has been shown to support long-term health outcomes (1)(2), but doing so easily depends on having a large, accurate, and up-to-date nutrition database. Free, public databases are often out-of-date, hard to navigate, and missing critical coverage (like branded restaurant foods). User-generated databases can be unreliable or closed-source. Commercial databases come with ongoing, often per-seat licensing costs, and usage restrictions that limit innovation.

As an amateur powerlifter and long-term weight loss maintainer, I care deeply about helping others pursue their health goals. After exiting my previous startup last year, I wanted to investigate whether LLMs could be used to create the database and infrastructure required for a great food logging app that is cost-engineered for free and accessible distribution, as I believe the availability of these tools is a public good. That led to the dataset I’m releasing today; nutritional data is public record, and its organization and dissemination should be, too.

What’s in the database?

- 5,287 common everyday foods, 3,836 prepared and generic restaurant foods, and 4,182 distinct menu items from ~50 popular US restaurant chains; foods have standardized naming, consistent numeric serving sizes, estimated micronutrient profiles, descriptions, and, when possible, citations/groundings to USDA, AUSNUT, FRIDA, CNF, etc.

- 313,442 of the most popular US branded grocery products with standardized naming, parsed serving sizes, and additive/allergen data, grounded in branded USDA data; the most popular 1% have estimated micronutrient data, with the goal of full coverage.

Even the largest commercial databases can be frustrating to work with when searching for foods or customizations without existing coverage. To solve this, I created a real-time version of the same approach used to build the core database that can browse the web to learn about new foods or food customizations if needed (e.g., a highly customized Starbucks order). There is a limited demo on the web, and in-app you can log foods with text search, via barcode scan, or by image, all of which can search the web to import foods for you if needed. Foods discovered via these searches are fed back into the database, and I plan to publish updated versions as coverage expands.

- Search & Explore: https://www.opennutrition.app/search

- Methodology/About: https://www.opennutrition.app/about

- Get the iOS App: https://apps.apple.com/us/app/opennutrition-macro-tracker/id...

- Download the dataset: https://www.opennutrition.app/download

OpenNutrition’s iOS app offers free essential logging and a limited number of agentic searches, plus expenditure tracking and ongoing diet recommendations like those in best-in-class paid apps. A paid tier ($49/year) unlocks additional searches and features (data backup, prioritized micronutrient coverage for logged foods), and helps fund further development and broader library coverage.

I’d love to hear your feedback, questions, and suggestions—whether it’s about the database itself, a really great/bad search result, or the app.

1. Burke et al., 2011, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268700/

2. Patel et al., 2019, https://mhealth.jmir.org/2019/2/e12209/

papa_bear No.43574853
This is neat. I've spent a lot of time thinking about implementing something similar for my company Eat This Much, but I end up pushing it off in favor of focusing on our core meal planning features.

When something doesn't have a reference listed, and just says "sourced from a publicly available first-party datasource", what does that mean? Crawled from other sources and you'd prefer not to say? The wording does feel a little sketchy when contrasted with entries that do list sources.

When something does list references that don't seem super close to the actual food, what is the process like for interpreting those values? For example, this Chicken Salad inheriting from Chicken Spread: https://www.opennutrition.app/search/chicken-salad-37mAX17YX...

The quality of the data might feel rough now, but I can see this being valuable for our users even if it's just an opt-in "show estimated micronutrients" or something. Would require labeling values as not being directly from a source of truth.

One thing that a lot of people are missing is that there is already a lot of inaccurate nutrition data out there. Even in information directly from the manufacturer, sometimes there are errors, or just old versions of the product that never get scrubbed from the internet (I imagine the latter case would be tricky for an LLM to deal with too). Just logging your dietary intake in any form will get you 80% of the benefit of tracking, via some self-awareness of your intake. Of course, it's an easy argument to make that if you had the choice between verified data and fuzzy LLM data, you should go for the human-verified data (for now).

joshdickson No.43575267
Thank you for your questions and feedback.

> When something doesn't have a reference listed, and just says "sourced from a publicly available first-party datasource", what does that mean?

It depends, and the degree to which it depends is why the citation is ambiguous (although it is true, if imprecise). My goal is to cite each nutrient individually, but that was simply too costly and time-consuming at the stage of the project when I did this work.

> what is the process like there for interpreting those values?

Because the degree to which something in the database might be related to those values is so varied, it depends. The reasoning agent had access to those database entries, which is helpful because they tend to contain micronutrient data. It also had access to web data, as well as its own world knowledge, and considered sources in that order. Ultimately it was left up to the agent to decide what the most reasonable fit for each food was, thinking through what an average user likely meant by that entry (e.g. a typical user probably assumes a 'Tomato' is raw), and then to choose the best sources from there. For the chicken salad, it used approximate micronutrient values from the listed references to inform its answer, but adapted the final values to match how the dish is described in its description.
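
As a rough illustration of that source ordering only (a hypothetical sketch, not the actual OpenNutrition pipeline): the names `SOURCE_PRIORITY`, `Estimate`, and `build_profile` are invented here, the per-100 g unit is an assumption, and the real agent reasoned about fit and adjusted values rather than simply picking one candidate.

```python
from dataclasses import dataclass

# Hypothetical source tiers, listed in the priority order described above:
# reference database entries first, then web data, then model knowledge.
SOURCE_PRIORITY = ("reference_database", "web", "model_knowledge")


@dataclass
class Estimate:
    nutrient: str   # e.g. "iron_mg" (naming is an assumption of this sketch)
    value: float    # amount per 100 g (unit is an assumption of this sketch)
    source: str     # must be one of SOURCE_PRIORITY


def build_profile(candidates: list[Estimate]) -> dict[str, Estimate]:
    """Keep one estimate per nutrient, preferring higher-priority sources."""
    by_nutrient: dict[str, list[Estimate]] = {}
    for candidate in candidates:
        by_nutrient.setdefault(candidate.nutrient, []).append(candidate)

    # For each nutrient, take the candidate whose source ranks earliest.
    return {
        nutrient: min(options, key=lambda e: SOURCE_PRIORITY.index(e.source))
        for nutrient, options in by_nutrient.items()
    }
```

Given two iron estimates, one from the web and one from a reference database, this keeps the reference database value; a nutrient that only has a model-knowledge estimate falls back to that.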

> if you had the choice between verified data and fuzzy LLM data, you should go for the human verified data (for now)

Human verification isn't free, and that means it is not available to a lot of people who can't or don't want to pay for something. But if that's something that someone values, I would certainly not diss the human effort!

papa_bear No.43576966
Very cool, thanks for elaborating on the process. Good luck, I'll be keeping an eye on your progress!