311 points | joshdickson | 12 comments

Hi HN!

Today I’m excited to launch OpenNutrition: a free, ODbL-licenced nutrition database of everyday generic, branded, and restaurant foods, a search engine that can browse the web to import new foods, and a companion app that bundles the database and search as a free macro tracking app.

Consistently logging the foods you eat has been shown to support long-term health outcomes (1)(2), but doing so easily depends on having a large, accurate, and up-to-date nutrition database. Free, public databases are often out-of-date, hard to navigate, and missing critical coverage (like branded restaurant foods). User-generated databases can be unreliable or closed-source. Commercial databases come with ongoing, often per-seat licensing costs, and usage restrictions that limit innovation.

As an amateur powerlifter and long-term weight loss maintainer, helping others pursue their health goals is something I care about deeply. After exiting my previous startup last year, I wanted to investigate the possibility of using LLMs to create the database and infrastructure required to make a great food logging app that was cost engineered for free and accessible distribution, as I believe that the availability of these tools is a public good. That led to creating the dataset I’m releasing today; nutritional data is public record, and its organization and dissemination should be, too.

What’s in the database?

- 5,287 common everyday foods, 3,836 prepared and generic restaurant foods, and 4,182 distinct menu items from ~50 popular US restaurant chains; foods have standardized naming, consistent numeric serving sizes, estimated micronutrient profiles, descriptions, and citations/groundings to USDA, AUSNUT, FRIDA, CNF, etc, when possible.

- 313,442 of the most popular US branded grocery products with standardized naming, parsed serving sizes, and additive/allergen data, grounded in branded USDA data; the most popular 1% have estimated micronutrient data, with the goal of full coverage.

Even the largest commercial databases can be frustrating to work with when searching for foods or customizations without existing coverage. To solve this, I created a real-time version of the same approach used to build the core database that can browse the web to learn about new foods or food customizations if needed (e.g., a highly customized Starbucks order). There is a limited demo on the web, and in-app you can log foods with text search, via barcode scan, or by image, all of which can search the web to import foods for you if needed. Foods discovered via these searches are fed back into the database, and I plan to publish updated versions as coverage expands.

- Search & Explore: https://www.opennutrition.app/search

- Methodology/About: https://www.opennutrition.app/about

- Get the iOS App: https://apps.apple.com/us/app/opennutrition-macro-tracker/id...

- Download the dataset: https://www.opennutrition.app/download

OpenNutrition’s iOS app offers free essential logging and a limited number of agentic searches, plus expenditure tracking and ongoing diet recommendations like best-in-class paid apps. A paid tier ($49/year) unlocks additional searches and features (data backup, prioritized micronutrient coverage for logged foods), and helps fund further development and broader library coverage.

I’d love to hear your feedback, questions, and suggestions—whether it’s about the database itself, a really great/bad search result, or the app.

1. Burke et al., 2011, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268700/

2. Patel et al., 2019, https://mhealth.jmir.org/2019/2/e12209/

1. yamihere ◴[] No.43570161[source]
>> User-generated databases can be unreliable

>> Foods discovered via these searches are fed back into the database,

Aren’t LLMs also unreliable? How do you ensure the new content is from an authoritative, accurate source? How do you ensure the numbers that make it into the database are actually what the source provided?

According to the Methodology/About page

>> The LLM is tasked with creating complete nutritional values, explicitly explaining the rationale behind each value it generates. Outputs undergo rigorous validation steps,

Those rigorous validation steps were also created with LLMs, correct?

>> whose core innovations leveraged AI but didn’t explicitly market themselves as “AI products.”

Odd choice for an entirely AI-based service. First thought I had after reading that was: must be because people don’t trust AI-generated information. Seems disingenuous to minimize the AI aspect in marketing while this product only exists because of AI.

Great idea though, thanks for giving it a shot!

replies(2): >>43570320 #>>43570409 #
2. rob ◴[] No.43570320[source]
Not really sure how the author thinks anybody who tracks their calories/macros seriously is going to trust a website that literally just makes up values for the vitamins, minerals, etc:

> TL;DR: They are estimates from giving an LLM (generally o3 mini high due to cost, some o1 preview) a large corpus of grounding data to reason over and asking it to use its general world knowledge to return estimates it was confident in, which, when escalating to better LLMs like o1-pro and manual verification, proved to be good enough that I thought they warranted release.

replies(3): >>43570711 #>>43570741 #>>43571180 #
3. joshdickson ◴[] No.43570409[source]
> Those rigorous validation steps were also created with LLMs, correct?

Not really. I do explain in the methodology post how good o1-pro is at the task, but reaching that conclusion took a lot of manual effort on my part reviewing the LLM's reasoning, and even so, o1-pro is not perfect.

replies(1): >>43570950 #
4. XorNot ◴[] No.43570711[source]
Also https://world.openfoodfacts.org/ exists, and has an app with everything you'd need. It just crowdsources nutrition labels and barcodes.
replies(1): >>43570775 #
5. joshdickson ◴[] No.43570741[source]
I have tracked my macro intake seriously for years and use the database every day, as do many folks who used the initial app releases. It's actually more valuable to me to have the data in this format, even estimated, because what happens with other apps is you get gaps in macronutrient reporting on things like Omega 3's, and you wonder 'Am I not eating any Omega 3's or does the database containing the food I ate just not include them?'. In that case I'd much rather have an LLM that had access to as much relevant data as I could feed it reason through approximate nutrient distribution and give me the best estimate it could.
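To make the gap problem concrete, here is a minimal sketch of the difference between "the database has no value" and "the value is zero". The records and the `omega3_g` field name are hypothetical and are not taken from the OpenNutrition dataset; the point is only that a total is hard to interpret without knowing how many logged foods actually carried a value.

```python
# Hypothetical food log. None means "the database has no value for this
# nutrient"; 0.0 means "the value is known (or estimated) to be zero".
foods = [
    {"name": "salmon fillet", "omega3_g": 2.0},
    {"name": "white rice", "omega3_g": 0.0},
    {"name": "protein bar", "omega3_g": None},
]


def daily_total(log, nutrient):
    """Sum a nutrient and report the share of logged foods that had a value."""
    known = [f[nutrient] for f in log if f.get(nutrient) is not None]
    total = sum(known)
    coverage = len(known) / len(log) if log else 0.0
    return total, coverage


total, coverage = daily_total(foods, "omega3_g")
print(f"omega-3: {total:.1f} g, reported by {coverage:.0%} of logged foods")
```

With low coverage, a small total says more about the database than about the diet, which is the ambiguity that filling in estimates is meant to remove.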

Appreciate the feedback!

6. joshdickson ◴[] No.43570775{3}[source]
OpenFoodFacts is a huge inspiration to this project, obviously. However, as someone with a normal diet, OFF lacks:

1. Generic, non-branded foods

2. Simple prepared foods that ease food entry

3. Restaurant foods

4. Micronutrients beyond those reported by the brand.

OFF is a fantastic project but OpenNutrition is really trying to fit a different niche. OFF does what it does very well; I would never be able to use it to track my food intake.

replies(1): >>43572538 #
7. yamihere ◴[] No.43570950[source]
Nice! Thanks for responding.

>> Outputs undergo rigorous validation steps, including cross-checking with advanced auditing models such as OpenAI’s o1-pro, which has proven especially proficient at performing high-quality random audits.

>> there was a lot of manual effort involved in coming to that conclusion with my own effort to review the LLM's reasoning

So, the randomly audited entries seemed reasonable to you – not even the data itself, just the reasoning about the generated data. Did the manual reviews stop once things started looking good enough? Are the audits ongoing, to fill out the rest of the dataset? Would those be manually double-checked as well?

>> I became interested in exploring how recent advances in generative AI could enable entirely new kinds of consumer products—ones whose core innovations leveraged AI but didn’t explicitly market themselves as “AI products.”

Once again: Why not market this as an AI product? This is LLMs all the way down.

People are already interested in using this dataset. I was. Now, LLM generated “usually close enough to not be actively harmful” data is being distributed as a source for any and all to use. I think your disclaimer is excellent. Does your license require an equivalent disclaimer be provided by those using this data?

replies(1): >>43571359 #
8. yamihere ◴[] No.43571180[source]
That’s the best part! People don’t care and won’t check! They’ll just pay money!

Most of the data being close enough to be better than nothing and not actively harmful + a disclaimer and the author is absolved of all responsibility!

Even better, this will now be used in all sorts of other apps, analyses, and for training other LLMs! And I expect all those will also prominently include an “all of this was generated by an LLM” disclaimer. For sure.

9. joshdickson ◴[] No.43571359{3}[source]
> not even the data itself, just the reasoning about the generated data

Poor phrasing on my end -- yes, absolutely the end data as well as the reasoning, as the reasoning tends to include the final answer.

Maybe I should! Appreciate the feedback.

replies(1): >>43571694 #
10. yamihere ◴[] No.43571694{4}[source]
Thanks again. Mine was an uncharitable interpretation, apologies for that. I appreciate your engagement with critical comments without coming off as defensive or snarky.

This looks like a lot of work and good will were poured into it, and I can see how it can be useful to a fitness focused audience.

You control the messaging on the site and in your apps, and you make it clear that this is not authoritative data. Everything built on top of this needs to have the same messaging, but it has probably been ingested into multiple LLMs already.

I think some sort of licensing requirement that the LLM source of this data be prominently disclosed will not keep this from becoming a source of truth for other datasets, products, and services; but, it is still worth the effort. All you can do is all you can do, right?

replies(1): >>43571986 #
11. joshdickson ◴[] No.43571986{5}[source]
Including that requirement in the license is a good idea. I had not considered it, but I will. Frankly, my motivations have been more on the citation side of things, such that the need for quality disclaimers is not as great. Thank you for the suggestion.
12. teolemon ◴[] No.43572538{4}[source]
Hi Josh: Pierre, Open Food Facts NGO co-founder.

1. Generic, non-branded foods & 2. Simple prepared foods that ease food entry: Those two could be solved in a deterministic way, and we'd be happy to see a separate Open Food Facts hosted API endpoint (basically a small backend serving a combination of all national generic databases), or an improvement to the core software.

3. Restaurant foods: Open Prices (our effort to collect geo-located prices on products) could be an entry point to collect menus, and potentially estimate nutrition for food in restaurants, since we have support for products without a barcode.

4. Micronutrients beyond those reported by the brand: We have an issue proposing approximation of micronutrients from a reputable database: https://github.com/openfoodfacts/openfoodfacts-server/issues...

We're happy to cover more use-cases, so feel free to join the project and contribute your time/coding skills to help us solve those issues. https://slack.openfoodfacts.org or https://forum.openfoodfacts.org or directly https://github.com/openfoodfacts