Libpostal: C library for parsing/normalizing street addresses around the world

1. kerkeslager ◴[09 Jul 25 09:33 UTC] No.44507929

I think fundamentally, no parsing/normalizing library can be effective for addresses. A much better approach is to have a search library which finds the address you're looking for within a dataset of all the addresses in the world.

Addresses are fundamentally unstructured data. You can't validate them structurally. It's trivial to create nonexistent addresses which any parsing library will parse just fine. On the flipside, there's enough variety in real addresses that your parser has to be extremely tolerant in what it accepts--so tolerant that it basically tolerates everything. The entire purpose of a parser for addresses is to reject invalid addresses, so if your parser tolerates everything it's pointless.

The only validation that makes any sense is "does this address exist in the real world?". And the way to do that is not parsing, it's by comparing to a dataset of all the addresses in the world.
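A minimal sketch of that idea, assuming a tiny in-memory list standing in for a real address dataset. The sample entries and the names `normalize` and `find_address` are hypothetical and have nothing to do with libpostal's API; the point is only that validation becomes a lookup, with fuzzy matching absorbing formatting differences:

```python
import difflib
import re

# Hypothetical stand-in for a real address dataset.
KNOWN_ADDRESSES = [
    "10 Downing Street, London SW1A 2AA",
    "221B Baker Street, London NW1 6XE",
    "1600 Pennsylvania Avenue NW, Washington, DC 20500",
]


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


# Map each normalized form back to the original dataset entry.
_INDEX = {normalize(address): address for address in KNOWN_ADDRESSES}


def find_address(query, cutoff=0.8):
    """Return the closest known address, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(
        normalize(query), list(_INDEX), n=1, cutoff=cutoff
    )
    return _INDEX[matches[0]] if matches else None
```

A well-formed but nonexistent address simply returns `None` here, which is the behavior a structural parser cannot provide. A real system would need a proper search index rather than a linear scan over strings.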

I haven't evaluated this project enough to understand confidently what they're doing, but I hope they're approaching this as a search engine for address datasets, and not as a parsing/normalizing library.

replies(2): >>44508524 #>>44510660 #

2. vidarh ◴[09 Jul 25 11:06 UTC] No.44508524

>>44507929 (TP) #

And keeping such datasets up to date is another matter entirely, because clearly a lot of companies rely on datasets that were outdated before the company even existed.

A trivially simple example of just how messy this gets when people try to constrain it: it was nearly random whether or not a given carrier would insist on me giving an incorrect address for my previous place, seemingly because the address was traditionally, prior to 1965, in Surrey, England.

The "postcode area name" for my old house is Croydon, and Croydon has legally been in London since 1965, and was allocated its own postcode area in 1966. "Surrey" hasn't been correct for addresses in Croydon since then.

But at least one delivery company insisted my old address was invalid unless I changed the town/postcode area to "Surrey", and refused to even attempt a delivery. Never mind that they had my house number and postcode, which were sufficient to uniquely identify my house.
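To illustrate that last point, here is a rough sketch of a lookup keyed only on house number and postcode, so a wrong or outdated town field cannot cause a rejection. The record, the postcode, and the names `postcode_key` and `resolve` are all made up for illustration, and it assumes the pair is unique, as it was in this case:

```python
import re

# Hypothetical records keyed by (house number, normalized postcode).
DELIVERY_POINTS = {
    ("12", "CR01AA"): "12 Example Road, Croydon, London",
}


def postcode_key(postcode):
    """Strip all whitespace and uppercase, so 'cr0 1aa' and 'CR01AA' match."""
    return re.sub(r"\s+", "", postcode).upper()


def resolve(house_number, postcode, town=None):
    """Look up a delivery point; `town` is accepted but never used for matching."""
    return DELIVERY_POINTS.get((house_number.strip(), postcode_key(postcode)))
```

With this, `resolve("12", "CR0 1AA", town="Surrey")` and `resolve("12", "CR0 1AA", town="Croydon")` return the same record, because the town is treated as decoration rather than as a validation key.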

3. derdi ◴[09 Jul 25 14:44 UTC] No.44510660

>>44507929 (TP) #

> real world [...] dataset

You are equating two things that are not equatable.