295 points AndrewDucker | 23 comments
miki123211 No.45045491
Is there even such a thing as a "Mississippi IP?"

I.e., are US ISPs, particularly big ones like Comcast, required to geolocate IPs to the state where the person actually is? What about mobile ones?

Where I live (not US), it is extremely common to get an IP that Maxmind geolocates to a region far from where you actually live.

kube-system No.45045616
GeoIP services are not 100% accurate, but that doesn't mean they're completely useless.

The law in question requires "commercially reasonable efforts"

1. beefnugs No.45045851
Remember that massive surveillance capitalism apparatus that has been created for years? Now everyone must pay for it to legally comply with whatever arbitrary bullshit no matter how expensive the data becomes
2. kube-system No.45045869
The most popular GeoIP database has a free tier that would easily work for this. And there are many other options.
3. gruez No.45046366
>Remember that massive surveillance capitalism apparatus that has been created for years? Now everyone must pay for it to legally comply with whatever arbitrary bullshit

Calling geoip databases "surveillance capitalism" seems like a stretch. It might be used by "surveillance capitalism", but you don't really have to surveil people to build a geoip database, only scrape RIR allocation records (all public, btw) and BGP routes, do ping tests, and parse geofeeds provided by providers. None of that is "surveillance capitalism" in any meaningful sense.
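To make the geofeed part concrete: geofeeds are CSV files defined in RFC 8805, where each line is `ip_prefix,alpha2code,region,city,postal_code` and the region is an ISO 3166-2 code such as `US-MS`. The sketch below (POSIX sh + awk; the function name is hypothetical) pulls out the prefixes a provider self-reports for one region. It assumes a plain, well-formed feed with no quoted fields and does no prefix validation.

```shell
# Hypothetical helper: geofeed_prefixes_in_region REGION GEOFEED_CSV
# RFC 8805 geofeed lines look like
#   ip_prefix,alpha2code,region,city,postal_code
# e.g. 192.0.2.0/24,US,US-MS,Jackson,
# Assumes a plain CSV feed with no quoted fields.
geofeed_prefixes_in_region() {
  awk -F, -v r="$1" '
    # skip comment lines and lines too short to carry a region
    /^[ \t]*#/ || NF < 3 { next }
    # tolerate whitespace (and CR line endings) around the fields we compare
    { gsub(/[ \t\r]/, "", $1); gsub(/[ \t\r]/, "", $3) }
    $3 == r { print $1 }
  ' "$2"
}
```

Something like `geofeed_prefixes_in_region US-MS provider-geofeed.csv` (file name made up) would list the prefixes that provider itself places in Mississippi.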

4. TGower No.45046455
If selling the physical location information of users isn't surveillance capitalism, then the term doesn't mean anything. "We don't surveil people, we just try to find out where they live and sell that data"
5. gruez No.45046497{3}
If that's "surveillance capitalism", what's your opinion on databases that map phone numbers to locations? E.g., when you get a phone call from 217-555-1234 and it shows "Springfield, IL", is that "surveillance capitalism"? That's basically all geoip databases are. Moreover, there are plenty of non-"surveillance capitalism" uses for geoip, which makes it questionable to call it "surveillance capitalism": determining the region for a site, or automatically selecting the closest store, for instance. Before the advent of anycast CDNs, it was also basically the only way to route your visitors to the closest server.
6. TGower No.45046732{4}
Is there a single company out there making its money selling access to an area code database? GeoIP databases are much higher resolution and use active scanning methods like ping timing. If a company was spam calling me to estimate distance based on call connection lag, yes that would be surveillance capitalism.
7. tzs No.45046803
> The most popular GeoIP database has a free tier that would easily work for this

The free tier does have limits on the number of API calls you can make. But the good news is you don't have to use their API. You can download the database [1] and do all the lookups locally without having to worry about going over their API limits.

It consists of 10 CSV files and is about 45 MB compressed, 380 MB uncompressed. For just identifying US states from IP addresses you only need 3 of the CSV files: a 207 MB file of IPv4 address information, a 120 MB file for IPv6, and a 6.7 MB file that lets you look up, by an ID found in the first two files, the location information for an address range, including the state.

It's easy to write a script to turn this into an SQL database that just contains IP ranges and the corresponding state, and then use that with SQLite, or whatever network database you use internally, from any of your stuff that needs this information.
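A minimal sketch of that conversion step for IPv4, assuming the column layout given in the GeoLite2 CSV header rows (Locations: `geoname_id` in column 1, `country_iso_code` in 5, `subdivision_1_iso_code` in 7; Blocks: `network` in 1, `geoname_id` in 2). It emits `ip_low,ip_high,country,subdivision` rows ready for a bulk load; the function name is made up for illustration, and the naive comma split is a simplification a real CSV parser would avoid.

```shell
# Hypothetical helper: build_state_table LOCATIONS_CSV BLOCKS_IPV4_CSV
# Prints "ip_low,ip_high,country_iso_code,subdivision_1_iso_code" per IPv4 block.
# Assumed column layout (check the header rows of your own download):
#   Locations: geoname_id = $1, country_iso_code = $5, subdivision_1_iso_code = $7
#   Blocks:    network (CIDR) = $1, geoname_id = $2
# Naive comma splitting: rows with quoted commas before column 7 would be
# mis-parsed, so a real CSV parser is safer for production use.
build_state_table() {
  awk -F, '
    # Pass 1 (Locations): remember geoname_id -> "country,subdivision"
    NR == FNR {
      if (FNR > 1 && $7 != "") region[$1] = $5 "," $7
      next
    }
    # Pass 2 (Blocks): turn each CIDR with a known region into an integer range
    FNR > 1 && ($2 in region) {
      split($1, cidr, "/")
      split(cidr[1], o, /\./)
      low = ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
      high = low + 2 ^ (32 - cidr[2]) - 1
      # %.0f avoids awks that print large integers in exponent form
      printf "%.0f,%.0f,%s\n", low, high, region[$2]
    }
  ' "$1" "$2"
}
```

Running something like `build_state_table GeoLite2-City-Locations-en.csv GeoLite2-City-Blocks-IPv4.csv > ipv4_states.csv` should give a flat file that the sqlite3 shell can load with `.mode csv` and `.import`.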

If you don't actually need Geo IP in general and are only adding it in order to block specific states you can easily omit IPs that are not mapped to those states which would make it pretty small. The database has 3.4 million IPv4 address ranges, but only 5,359 of them are listed as being in Mississippi. There are 1.8 million address ranges in the IPv6 file, and 3,946 of them are listed as being in Mississippi.

Here's how to get the Mississippi ranges from the command line, although this is kind of slow--the 3rd line took 7.5 minutes on my M2 Mac Studio and the 4th took almost 4 minutes. A proper script or program would be a lot faster.

  grep ,MS,Mississippi, GeoLite2-City-Locations-en.csv | cut -d , -f 1 > 1
  sed -e s/^/,/ -e s/$/,/ < 1 > 2
  grep -f 2 GeoLite2-City-Blocks-IPv4.csv | cut -d , -f 1 > MS-IP4.txt
  grep -f 2 GeoLite2-City-Blocks-IPv6.csv | cut -d , -f 1 > MS-IP6.txt
Also a proper script or program would be able to look specifically at the correct field when matching the ID from the locations file to the IP range lines. The commands above just hope that things that look like location IDs don't occur in other fields in the IP range files.
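Along those lines, a sketch of a field-aware version in awk: it loads the matching location IDs into a hash in one pass over the Locations file and then checks only column 2 of each Blocks row, instead of matching every line against thousands of patterns. Column positions are assumed from the GeoLite2 CSV headers, the function name is hypothetical, and it has not been timed against the real files.

```shell
# Hypothetical helper: ranges_for_subdivision COUNTRY SUBDIVISION LOCATIONS_CSV BLOCKS_CSV
# Prints the network column of every Blocks row whose geoname_id (column 2)
# belongs to a Locations row with the given country_iso_code (column 5) and
# subdivision_1_iso_code (column 7). Works for the IPv4 and IPv6 Blocks files.
ranges_for_subdivision() {
  awk -F, -v c="$1" -v s="$2" '
    # first file: collect wanted geoname_ids
    NR == FNR { if ($5 == c && $7 == s) want[$1] = 1; next }
    # second file: exact match on the geoname_id column only
    ($2 in want) { print $1 }
  ' "$3" "$4"
}
```

For example, `ranges_for_subdivision US MS GeoLite2-City-Locations-en.csv GeoLite2-City-Blocks-IPv4.csv > MS-IP4.txt`, and the same with the IPv6 Blocks file for `MS-IP6.txt`.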

  [1] URL='https://download.maxmind.com/geoip/databases/GeoLite2-City-CSV/download?suffix=zip'
      curl -L -u userid:license_key -o db.zip "$URL"
8. gruez No.45046807{5}
>Is there a single company out there making its money selling access to an area code database?

So if someone is making money off of it, it's suddenly "surveillance capitalism"? What makes it more or less "surveillance capitalism" compared to AWS selling CloudFront to some ad company?

Moreover, you can do better than area-code-level granularity. When landlines were more common and local number portability wasn't really a thing, you could look at the central office (CO) code, the second group of digits, to figure out which town or neighborhood a phone number was from. Even if this was all information you could theoretically determine yourself, I'm sure there are companies that package up the data in a nice database for companies to use. In that case, is that "surveillance capitalism"? Where's the "surveillance" aspect? It's not like you need to stalk anyone to figure out where a CO is located. That was just a property of the phone network.

>GeoIP databases are much higher resolution and use active scanning methods like ping timing. If a company was spam calling me to estimate distance based on call connection lag, yes that would be surveillance capitalism.

Why is the fact it's "active" or not a relevant factor in determining whether it's "surveillance capitalism" or not? Moreover spam calling people might be bad for other reasons, but it's not exactly "surveillance".

9. TGower No.45046865{6}
Surveillance definition "Systematic observation of places and people by visual, aural, electronic, photographic or other means." If you are pinging someone's IP to determine their physical location, you are engaged in a form of surveillance. If you have a copy of an area-code-to-city mapping table, you are not engaged in surveillance. If you aren't trying to make money, you are not engaged in capitalism.
10. gruez No.45047181{7}
>Surveillance definition "Systematic observation of places and people by visual, aural, electronic, photographic or other means." If you are pinging someone's IP to determine their physical location, you are engaged in a form of surveillance.

Setting aside the problem with pinging home IPs (most home routers have ICMP echo requests disabled), your definition of "systematic observation" seems very flimsy. Is monitoring the global BGP routing table "systematic observation"? What about scraping RIR records? How is sending ICMP echo requests and observing the response times meaningfully similar to what Google et al. are doing? I doubt many people are upset about Google "systematically observing"... the contents of books (for Google Books), or the layout of cities (for Google Maps, ignoring Street View). They're upset about Google building dossiers on people. Observing the locations of groups of IP addresses (I'm not aware of any geoip products that can deanonymize specific IP addresses) seems very divorced from that, such that any attempt to equate the two on the basis of "systematic observation" is nonsensical.

11. TGower No.45047280{8}
It seems like you missed the specifier "of places and people". Books are not people or places, but an IP address at any point in time is tied to either a specific person or place.

> They're upset about google building dossiers on people.

Their location being in that dossier is part of what upsets people.

12. gruez No.45047318{9}
>but an IP address at any point in time is tied to either a specific person or place.

Except I'm not aware of any geoip databases that operate on a per-IP level. It's way too noisy, given that basically everyone uses dynamic IP addresses. At best you can figure out that a given /24 is used by a given ISP to cover a certain neighborhood, not that 1.2.3.4 belongs to John Smith or 742 Evergreen Terrace.

13. TGower No.45047439{10}
Good to know, that does shift my opinion a bit. There is a spectrum from surveilling individuals to gathering population statistics. I'm not sure exactly where data that identifies a user to a group size of ~250 falls, especially given the geographic correlation, but it's definitely better.
14. kube-system No.45048237{3}
My comment said "database" and not "API" :)

Also there is no need to spend time parsing it yourself, there are plenty of existing libraries you can simply point at the file.

15. kube-system No.45048270{5}
There are companies out there making money selling any kind of data you can imagine. A quick search shows dozens of companies offering this data for sale.
16. lmm No.45048399
> Calling geoip databases "surveillance capitalism" seems like a stretch. It might be used by "surveillance capitalism", but you don't really have to surveil people to build a geoip database, only scrape RIR allocation records (all public, btw) and BGP routes, do ping tests, and parse geofeeds provided by providers. None of that is "surveillance capitalism" in any meaningful sense.

How is it not? Most "normal" surveillance works the same way - you look up public records for the person you're going after, cross-reference them against each other somehow, and eventually find enough dirt on them or give up. This is surveillance, and it's being done by and in the interests of capitalism.

17. toast0 No.45048809{5}
> Is there a single company out there making its money selling access to an area code database? GeoIP databases are much higher resolution and use active scanning methods like ping timing. If a company was spam calling me to estimate distance based on call connection lag, yes that would be surveillance capitalism.

Phone number assignments are mostly public, you don't really need to pay for this information, but there are certainly those who will sell it to you.

Of course, phone numbers don't really tie you to a rate center anymore, but a rate center is often much more geographically specific than an IP address from a large ISP. What I've seen near me is that a rate center often ties the number to a specific community. Larger cities often have several rate centers; smaller cities may have their own, or several small cities may share one. Of course, phone company wiring tends to ignore municipal boundaries.

On the other hand, most large ISPs tend to use a single IP pool for a metro area. Not all large providers do it that way, of course, and larger metro areas may be subdivided. You can't really ping time your way to better data there either, most of the last mile technology adds enough latency that you can't tell if the customer is near the aggregation point or far.

18. miki123211 No.45049140{5}
Not really, but there are companies making their money selling a mapping of phone numbers to real names[1].

It's a uniquely American thing (Canada does it too, but access is regulated much more tightly).

This one[2] I could get reliable results from for free, but it seems to be "under maintenance" right now. Twilio just offers it as a service at 1 cent per number.

[1] https://en.wikipedia.org/wiki/CNAM
[2] https://www.sent.dm/resources/phone-lookup

19. miki123211 No.45049175{10}
Google does it I think?

At least in some cases, e.g. when multiple devices that are logged into their respective Google accounts are using that IP, and Google knows what location those usually reside at when together.

I've had Google pop up reliable location results for me, to the granularity of a small town, even if they had no information about me specifically to help them deduce this. It doesn't always happen though.

20. burnt-resistor No.45050742{3}
Exactly. No need to pay money to someone else for what's available for free(mium).
21. kube-system No.45051164{6}
I'm surprised this is notable these days, because the mapping of numbers to names used to be a completely free service that was dropped off on everyone's front porch.

https://en.wikipedia.org/wiki/Telephone_directory

22. tzs No.45058078{4}
> My comment said "database" and not "API" :)

Sure, but it is quite common for companies offering database access to offer that via an API to query the database on their server rather than letting customers download the whole database for local use.

It thus seemed worthwhile to make it clear that MaxMind does let you download the whole database.

As far as libraries go sure that's a possibility. But for people who just need a simple IP to country or IP to US state lookup and need that from a variety of languages it may be overall less annoying to make your own DB from the CSV files that just handles what you need and nothing more.

I've already got libraries for sqlite, MySQL, or both for every language I use where I need to do these lookups, and almost all the applications that need these lookups are already connecting to our databases. Add an IPv4_to_Country table to that database (that they are already using) and then it's just a matter of doing a "select country_code from IPv4_to_Country where ip_low <= ? and ? <= ip_high" with the two '?'s replaced with the IP address we want to look up, probably using a DB handle they already have open.
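That query needs the address as an integer; MySQL can do the conversion server-side with `INET_ATON()`, while SQLite has no built-in for it. Below is a sketch of the same `ip_low <= ? and ? <= ip_high` predicate run against a flat `ip_low,ip_high,...` CSV, with the dotted-quad conversion done in awk. It is IPv4 only, and the function name and file layout are assumptions for illustration.

```shell
# Hypothetical helper: lookup_range IPV4_ADDRESS RANGES_CSV
# RANGES_CSV rows: ip_low,ip_high,<remaining columns...>
# Prints the remaining columns of the first row where ip_low <= ip <= ip_high,
# i.e. the same predicate as "where ip_low <= ? and ? <= ip_high".
lookup_range() {
  awk -F, -v addr="$1" '
    BEGIN {
      # dotted quad -> unsigned 32-bit integer; awk numbers are doubles,
      # so values above 2^31 are still exact
      if (split(addr, o, /\./) != 4) exit 1
      ip = ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    # skip a header row, then test the range predicate numerically
    $1 ~ /^[0-9]+$/ && $1 + 0 <= ip && ip <= $2 + 0 {
      sub(/^[^,]*,[^,]*,/, "")   # drop the two range columns
      print
      exit
    }
  ' "$2"
}
```

In the SQL setup described above, the integer from the same formula would simply be bound to both `?` placeholders.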

Many would find that a lot easier than adding a dependency on a 3rd party GeoIP library. Besides it being one more thing on each machine that needs periodic updating (or rather N more if you are working in N different languages on that machine), I believe that most of these libraries require you to have a copy of the downloaded database on the local machine, so that's another thing you have to keep up to date on every server.

With the "make your own simple SQL DB" approach you just have to keep an up to date download from MaxMind on the machine that builds your SQL DB. After building the SQL DB you then just have to upload it to your one network DB server (e.g., your MySQL server) and all your apps that query that DB are up to date no matter what server they are on or what language they are in.

If you are building an sqlite DB for some of your apps, you do have to copy that to all the servers that contain such apps, so you don't totally escape having to do updates on those machines.

If making the SQL DB were hard then maybe reducing dependencies and reducing the number of things that need updates might not be worth it, but the CSV files are organized very sensibly. The scripts to make the SQL DBs are close to trivial if you've got a decent CSV parser in the language you are writing them in.
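As an illustration of how small such a build script can be, here is a sketch that turns `ip_low,ip_high,country_code` rows into a SQL script for the `IPv4_to_Country` table from the query above. It targets SQLite syntax (other databases may need small changes), skips any row that does not look like two integers and a two-letter code, and wraps the inserts in one transaction; the column types and index name are assumptions.

```shell
# Hypothetical helper: csv_to_sql RANGES_CSV
# Input rows: ip_low,ip_high,country_code[,anything else]
# Output: a SQLite-flavoured SQL script on stdout.
csv_to_sql() {
  awk -F, -v q="'" '
    BEGIN {
      print "BEGIN;"
      print "CREATE TABLE IF NOT EXISTS IPv4_to_Country (ip_low INTEGER NOT NULL, ip_high INTEGER NOT NULL, country_code TEXT NOT NULL);"
    }
    # only emit rows that validate, which also keeps stray quotes out of the SQL
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[A-Z][A-Z]$/ {
      printf "INSERT INTO IPv4_to_Country VALUES (%s, %s, %s%s%s);\n", $1, $2, q, $3, q
    }
    END {
      print "CREATE INDEX IF NOT EXISTS IPv4_to_Country_ip_low ON IPv4_to_Country (ip_low);"
      print "COMMIT;"
    }
  ' "$1"
}
```

Usage would be along the lines of `csv_to_sql ipv4_country.csv | sqlite3 geo.db` (file names made up), after which the resulting file is what gets copied to the servers.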

23. miki123211 No.45061867{7}
Phone directories were one of these weird "one way" services.

In principle, you could use them to map numbers to names, but the way they were designed, it was a lot more effort than using them to map names to numbers. That was deliberate I think.