
295 points AndrewDucker | 5 comments
miki123211 No.45045491
Is there even such a thing as a "Mississippi IP?"

I.e., are US ISPs, particularly big ones like Comcast, required to assign IPs that geolocate to the state the person is actually in? What about mobile carriers?

Where I live (not US), it is extremely common to get an IP that Maxmind geolocates to a region far from where you actually live.

replies(5): >>45045606 #>>45045616 #>>45046119 #>>45046293 #>>45050727 #
kube-system No.45045616
GeoIP services are not 100% accurate, but that doesn't mean they're completely useless.

The law in question requires "commercially reasonable efforts".

replies(2): >>45045851 #>>45047998 #
beefnugs No.45045851
Remember that massive surveillance-capitalism apparatus that has been built up for years? Now everyone must pay into it to legally comply with whatever arbitrary bullshit, no matter how expensive the data becomes.
replies(2): >>45045869 #>>45046366 #
1. kube-system No.45045869
The most popular GeoIP database has a free tier that would easily work for this. And there are many other options.
replies(1): >>45046803 #
2. tzs No.45046803
> The most popular GeoIP database has a free tier that would easily work for this

The free tier does have limits on the number of API calls you can make. But the good news is you don't have to use their API. You can download the database [1] and do all the lookups locally without worrying about going over their API limits.

It consists of 10 CSV files and is about 45 MB compressed, 380 MB uncompressed. For just identifying US states from IP addresses you only need 3 of them: a 207 MB file of IPv4 address ranges, a 120 MB file for IPv6, and a 6.7 MB locations file that maps an ID found in the first two files to location information, including state.

It's easy to write a script to turn this into an SQL database that just contains IP ranges and the corresponding state and then use that with sqlite or whatever network database you use internally from any of your stuff that needs this information.
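One way such a script might look, as a sketch in Python using only the standard library (the column names follow MaxMind's published GeoLite2-City CSV headers; the table name ip4_state is made up for this example):

```python
# Sketch only: build a small SQLite state-lookup table from the GeoLite2 CSVs.
# Column names follow MaxMind's published GeoLite2-City CSV headers; the
# table name ip4_state is invented for illustration.
import csv
import ipaddress
import sqlite3

def build_state_db(locations_csv, blocks_csv, db_path):
    # Map geoname_id -> US state code, skipping rows with no subdivision.
    states = {}
    with open(locations_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["country_iso_code"] == "US" and row["subdivision_1_iso_code"]:
                states[row["geoname_id"]] = row["subdivision_1_iso_code"]

    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE ip4_state (ip_low INTEGER, ip_high INTEGER, state TEXT)")
    with open(blocks_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            state = states.get(row["geoname_id"])
            if state is None:
                continue
            # Store each CIDR block as an integer range for easy SQL lookups.
            net = ipaddress.ip_network(row["network"])
            con.execute(
                "INSERT INTO ip4_state VALUES (?, ?, ?)",
                (int(net.network_address), int(net.broadcast_address), state))
    con.execute("CREATE INDEX idx_ip4_low ON ip4_state (ip_low)")
    con.commit()
    return con
```

Lookups then reduce to a single range query against that table, with IPs stored and compared as integers.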

If you don't actually need GeoIP in general and are only adding it to block specific states, you can omit the IPs that are not mapped to those states, which makes the database pretty small. The database has 3.4 million IPv4 address ranges, but only 5,359 of them are listed as being in Mississippi. The IPv6 file has 1.8 million address ranges, and 3,946 of them are listed as being in Mississippi.

Here's how to get the Mississippi ranges from the command line, although this is kind of slow: the 3rd command took 7.5 minutes on my M2 Mac Studio and the 4th took almost 4 minutes. A proper script or program would be a lot faster.

  # location IDs (field 1) for rows whose state fields are MS/Mississippi
  grep ,MS,Mississippi, GeoLite2-City-Locations-en.csv | cut -d , -f 1 > 1
  # wrap each ID in commas so it only matches complete CSV fields
  sed -e 's/^/,/' -e 's/$/,/' < 1 > 2
  # pull the network ranges (field 1) from lines containing a matching ID
  grep -f 2 GeoLite2-City-Blocks-IPv4.csv | cut -d , -f 1 > MS-IP4.txt
  grep -f 2 GeoLite2-City-Blocks-IPv6.csv | cut -d , -f 1 > MS-IP6.txt
Also, a proper script or program would look specifically at the correct field when matching the ID from the locations file against the IP range lines. The commands above just hope that strings that look like location IDs don't occur in other fields of the IP range files.
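For what it's worth, a field-accurate version of that pipeline is only a few lines in, say, Python, parsing the CSVs properly instead of grepping whole lines (column names again follow MaxMind's published headers):

```python
# Sketch: field-accurate equivalent of the grep pipeline above. It compares
# the geoname_id column itself, so an ID appearing in some other field
# (e.g. registered_country_geoname_id) can't cause a false match.
import csv

def mississippi_networks(locations_csv, blocks_csv):
    # Location IDs whose state is Mississippi.
    with open(locations_csv, newline="", encoding="utf-8") as f:
        ms_ids = {row["geoname_id"] for row in csv.DictReader(f)
                  if row["country_iso_code"] == "US"
                  and row["subdivision_1_iso_code"] == "MS"}
    # Network ranges whose geoname_id field is one of those IDs.
    with open(blocks_csv, newline="", encoding="utf-8") as f:
        return [row["network"] for row in csv.DictReader(f)
                if row["geoname_id"] in ms_ids]
```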

  [1] URL=https://download.maxmind.com/geoip/databases/GeoLite2-City/download?suffix=tar.gz
      curl -L -u userid:license_key $URL > db.tar.gz
replies(2): >>45048237 #>>45050742 #
3. kube-system No.45048237
My comment said "database" and not "API" :)

Also there is no need to spend time parsing it yourself, there are plenty of existing libraries you can simply point at the file.

replies(1): >>45058078 #
4. burnt-resistor No.45050742
Exactly. No need to pay money to someone else for what's available for free(mium).
5. tzs No.45058078
> My comment said "database" and not "API" :)

Sure, but companies offering database access quite commonly provide it via an API that queries the database on their servers, rather than letting customers download the whole database for local use.

It thus seemed worthwhile to make it clear that MaxMind does let you download the whole database.

As far as libraries go, sure, that's a possibility. But for people who just need a simple IP-to-country or IP-to-US-state lookup, and need it from a variety of languages, it may be less annoying overall to make your own DB from the CSV files that handles just what you need and nothing more.

I've already got libraries for sqlite, MySQL, or both in every language I use where I need these lookups, and almost all the applications that need them are already connecting to our databases. Add an IPv4_to_Country table to that database (which they are already using) and then it's just a matter of doing a "select country_code from IPv4_to_Country where ip_low <= ? and ? <= ip_high" with the two '?'s replaced with the IP address we want to look up, probably using a DB handle they already have open.
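In Python, for instance, that lookup is a couple of lines, using the illustrative table and column names above (not anything MaxMind defines) and the standard library's ipaddress to convert the dotted quad to the integer form the table stores:

```python
# Sketch: range lookup against the illustrative IPv4_to_Country table.
import ipaddress
import sqlite3

def country_for_ip(con, ip_str):
    # IPs are stored as integers, so convert before comparing.
    ip = int(ipaddress.ip_address(ip_str))
    row = con.execute(
        "SELECT country_code FROM IPv4_to_Country "
        "WHERE ip_low <= ? AND ? <= ip_high",
        (ip, ip)).fetchone()
    return row[0] if row else None
```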

Many would find that a lot easier than adding a dependency on a 3rd-party GeoIP library. Besides being one more thing on each machine that needs periodic updating (or rather N more things if you are working in N different languages on that machine), I believe most of these libraries require a copy of the downloaded database on the local machine, so that's another thing you have to keep up to date on every server.

With the "make your own simple SQL DB" approach you just have to keep an up-to-date download from MaxMind on the machine that builds your SQL DB. After building the SQL DB you then just have to upload it to your one network DB server (e.g., your MySQL server) and all your apps that query that DB are up to date, no matter what server they are on or what language they are written in.

If you are building an sqlite DB for some of your apps, you do have to copy that to all the servers that contain such apps, so you don't totally escape having to do updates on those machines.

If making the SQL DB were hard, then maybe reducing dependencies and reducing the number of things that need updates wouldn't be worth it, but the CSV files are organized very sensibly. The scripts to make the SQL DBs are close to trivial if you've got a decent CSV parser in whatever language you write them in.