295 points AndrewDucker | 43 comments
1. miki123211 No.45045491
Is there even such a thing as a "Mississippi IP?"

I.e., are US ISPs, particularly big ones like Comcast, required to geolocate IPs to the state the person is actually in? What about mobile ones?

Where I live (not US), it is extremely common to get an IP that MaxMind geolocates to a region far from where you actually live.

2. estimator7292 No.45045606
You pretty much just plug the IP into a geolocating API and hope. There's nothing else to do. Any collateral damage is on the legislation, not any individual site or admin.

As you say, IP geolocation is unreliable. Unfortunately that's the only option. If it is technologically impossible to comply with the law, you just gotta do the best you can. If someone in MI gets a weird IP, there's absolutely nothing any third party can do. That's on the ISP for not allocating an appropriate IP or the legislators for being morons.

3. kube-system No.45045616
GeoIP services are not 100% accurate, but that doesn't mean they're completely useless.

The law in question requires "commercially reasonable efforts"

4. selimthegrim No.45045637
MI is Michigan.
5. phinnaeus No.45045682{3}
Right, they might get an MS IP and be blocked :P
6. beefnugs No.45045851
Remember that massive surveillance-capitalism apparatus that has been built up over the years? Now everyone must pay for it to legally comply with whatever arbitrary bullshit, no matter how expensive the data becomes.
7. kube-system No.45045869{3}
The most popular GeoIP database has a free tier that would easily work for this. And there are many other options.
8. tallytarik No.45046119
ISPs have no obligation, although the ubiquity of sites and apps relying on IP geolocation means that ISPs are incentivized to provide correct info these days.

I run a geolocation service, and over the years we've seen more and more ISPs providing official geofeeds. The majority of medium-large ISPs in the US now provide a geofeed, for example. But there's still an ongoing problem with keeping geofeeds up to date, and with users being assigned to the correct 'pool', etc.

Mobile IPs are similar but are still certainly the most difficult (relative lack of geofeeds or other accurate data across providers)
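For reference, the geofeed format these ISPs publish is specified in RFC 8805: one CSV line per prefix with an ISO 3166-1 country code, an ISO 3166-2 region code (e.g. US-MS), a city name and an optional postal-code field, with lines starting with "#" treated as comments. Below is a minimal Python sketch of parsing such a feed and picking the most specific matching prefix; the function names and the sample feed are made up for illustration, and a real consumer would also have to fetch, cache and refresh feeds, which is where the staleness problem above comes in.

```python
from __future__ import annotations

import csv
import ipaddress
from typing import NamedTuple, Optional, Union


class GeofeedEntry(NamedTuple):
    network: Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
    country: str  # ISO 3166-1 alpha-2, e.g. "US"
    region: str   # ISO 3166-2, e.g. "US-MS" (may be empty)
    city: str     # free text (may be empty)


def parse_geofeed(lines: list[str]) -> list[GeofeedEntry]:
    """Parse RFC 8805-style lines: prefix,country,region,city[,postal]."""
    entries = []
    for row in csv.reader(lines):
        # Skip blank lines and comment lines starting with '#'.
        if not row or row[0].strip().startswith("#"):
            continue
        # Strip whitespace and pad so missing trailing fields read as "".
        fields = [f.strip() for f in row] + [""] * 4
        try:
            net = ipaddress.ip_network(fields[0])  # strict: host bits must be zero
        except ValueError:
            continue  # skip a malformed prefix instead of rejecting the whole feed
        entries.append(GeofeedEntry(net, fields[1].upper(), fields[2].upper(), fields[3]))
    return entries


def lookup(ip: str, entries: list[GeofeedEntry]) -> Optional[GeofeedEntry]:
    """Return the entry with the most specific (longest) prefix containing ip."""
    addr = ipaddress.ip_address(ip)
    # 'in' is simply False when the address and network IP versions differ.
    matches = [e for e in entries if addr in e.network]
    return max(matches, key=lambda e: e.network.prefixlen, default=None)
```

In practice the input would be the feed body split into lines, e.g. the text of an HTTP response passed through splitlines().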

9. cwbriscoe No.45046293
I live in Vancouver, WA and my IP comes back to Portland, OR.
10. gruez No.45046366{3}
>Remember that massive surveillance capitalism apparatus that has been created for years? Now everyone must pay for it to legally comply with whatever arbitrary bullshit

Calling geoip databases "surveillance capitalism" seems like a stretch. It might be used by "surveillance capitalism", but you don't really have to surveil people to build a geoip database, only scrape RIR allocation records (all public, btw) and BGP routes, do ping tests, and parse geofeeds provided by providers. None of that is "surveillance capitalism" in any meaningful sense.
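As a rough illustration of the "scrape RIR allocation records" part: each RIR publishes a pipe-delimited "delegated" statistics file whose records look like registry|cc|type|start|value|date|status, where for ipv4 the value is an address count and for ipv6 it is a prefix length. The Python sketch below (function name and sample data are hypothetical) turns those records into (network, country) pairs. It only yields the country code recorded for the holder of each block, which is presumably why it gets combined with BGP data, latency measurements and geofeeds to get anything close to state-level accuracy.

```python
from __future__ import annotations

import ipaddress
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_delegated(lines: list[str]) -> list[tuple[Network, str]]:
    """Map allocated/assigned blocks in an RIR 'delegated' stats file to country codes.

    Record format: registry|cc|type|start|value|date|status[|...]
    For ipv4, value is an address count; for ipv6, it is a prefix length.
    """
    out: list[tuple[Network, str]] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("|")
        # Drop short summary lines and records with no country ('*' or empty cc).
        if len(parts) < 7 or parts[1] in ("", "*"):
            continue
        _registry, cc, rtype, start, value, _date, status = parts[:7]
        # The version header's 7th field is a UTC offset, so this filter drops it too.
        if status not in ("allocated", "assigned"):
            continue
        if rtype == "ipv4":
            first = ipaddress.IPv4Address(start)
            last = first + (int(value) - 1)
            # IPv4 counts need not be powers of two, so split into CIDR blocks.
            out.extend((net, cc) for net in ipaddress.summarize_address_range(first, last))
        elif rtype == "ipv6":
            out.append((ipaddress.IPv6Network(f"{start}/{value}"), cc))
    return out
```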

11. TGower No.45046455{4}
If selling the physical location information of users isn't surveillance capitalism, then the term doesn't mean anything. "We don't surveil people, we just try to find out where they live and sell that data"
12. gruez No.45046497{5}
If that's "surveillance capitalism", what's your opinion on databases that map phone numbers to locations? E.g., when you get a phone call from 217-555-1234 and it shows "Springfield, IL", is that "surveillance capitalism"? That's basically all geoip databases are. Moreover, there are plenty of non-"surveillance capitalism" uses for geoip that make it questionable to call it "surveillance capitalism": determining the region for a site, or automatically selecting the closest store, for instance. Before the advent of anycast CDNs, it was also basically the only way to route your visitors to the closest server.
13. TGower No.45046732{6}
Is there a single company out there making its money selling access to an area code database? GeoIP databases are much higher resolution and use active scanning methods like ping timing. If a company was spam calling me to estimate distance based on call connection lag, yes that would be surveillance capitalism.
14. tzs No.45046803{4}
> The most popular GeoIP database has a free tier that would easily work for this

The free tier does have limits on the number of API calls you can make. But the good news is you don't have to use their API. You can download the database [1] and do all the lookups locally without having to worry about going over their API limits.

It consists of 10 CSV files and is about 45 MB compressed, 380 MB uncompressed. For just identifying US states from IP addresses you need only 3 of the CSV files: a 207 MB file of IPv4 address information, a 120 MB file for IPv6, and a 6.7 MB file that, given an ID found in either of the first two, gives you the location information for that IP range, including the state.

It's easy to write a script to turn this into an SQL database that just contains IP ranges and the corresponding state and then use that with sqlite or whatever network database you use internally from any of your stuff that needs this information.
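A minimal sketch of such a script in Python, using only the standard csv, ipaddress and sqlite3 modules, that builds an IPv4-only table of integer ranges with country and state codes. It assumes the GeoLite2 CSV column names network, geoname_id, country_iso_code and subdivision_1_iso_code, so check them against the headers of your download; the table and function names are made up. IPv6 is left out because 128-bit integers don't fit in SQLite's 64-bit INTEGER; storing packed 16-byte BLOBs, which SQLite compares bytewise, is one workaround.

```python
from __future__ import annotations

import csv
import ipaddress
import sqlite3
from typing import Iterable


def load_locations(lines: Iterable[str]) -> dict[str, tuple[str, str]]:
    """geoname_id -> (country_iso_code, subdivision_1_iso_code) from the Locations CSV."""
    return {
        row["geoname_id"]: (row["country_iso_code"], row["subdivision_1_iso_code"])
        for row in csv.DictReader(lines)
    }


def build_ipv4_table(conn: sqlite3.Connection, block_lines: Iterable[str],
                     locations: dict[str, tuple[str, str]]) -> int:
    """Load IPv4 blocks as integer ranges; returns the number of rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ipv4_state ("
        " ip_low INTEGER NOT NULL, ip_high INTEGER NOT NULL,"
        " country_code TEXT, state_code TEXT)"
    )
    rows = []
    for row in csv.DictReader(block_lines):
        loc = locations.get(row["geoname_id"])
        if loc is None:
            continue  # e.g. a block with an empty geoname_id; skipped in this sketch
        net = ipaddress.IPv4Network(row["network"])
        # First and last address of the CIDR block as plain integers.
        rows.append((int(net.network_address), int(net.broadcast_address), *loc))
    conn.executemany("INSERT INTO ipv4_state VALUES (?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS ipv4_state_low ON ipv4_state(ip_low)")
    conn.commit()
    return len(rows)
```

With the real files, pass open(path, newline="", encoding="utf-8") file objects instead of lists of lines; csv.DictReader accepts either.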

If you don't actually need GeoIP in general and are only adding it in order to block specific states, you can easily omit IPs that are not mapped to those states, which would make it pretty small. The database has 3.4 million IPv4 address ranges, but only 5,359 of them are listed as being in Mississippi. There are 1.8 million address ranges in the IPv6 file, and 3,946 of them are listed as being in Mississippi.

Here's how to get the Mississippi ranges from the command line, although this is kind of slow--the 3rd line took 7.5 minutes on my M2 Mac Studio and the 4th took almost 4 minutes. A proper script or program would be a lot faster.

  grep ,MS,Mississippi, GeoLite2-City-Locations-en.csv | cut -d , -f 1 > 1
  sed -e s/^/,/ -e s/$/,/ < 1 > 2
  grep -f 2 GeoLite2-City-Blocks-IPv4.csv | cut -d , -f 1 > MS-IP4.txt
  grep -f 2 GeoLite2-City-Blocks-IPv6.csv | cut -d , -f 1 > MS-IP6.txt
Also a proper script or program would be able to look specifically at the correct field when matching the ID from the locations file to the IP range lines. The commands above just hope that things that look like location IDs don't occur in other fields in the IP range files.
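For comparison, a field-aware version of the four commands above, sketched in Python: it reads the locations file with a CSV parser, collects the geoname_id values whose country and first-level subdivision match, and then checks only the geoname_id column of each block row with a set lookup, so an ID that happens to appear in registered_country_geoname_id does not produce a false match. Column names are assumed to follow the GeoLite2 CSV headers and the function names are hypothetical. A single pass with set lookups would be expected to be much faster than grep -f with thousands of patterns, though that is untested.

```python
from __future__ import annotations

import csv
from typing import Iterable


def state_geoname_ids(location_lines: Iterable[str], country: str, state: str) -> set[str]:
    """geoname_ids whose country and first-level subdivision match, e.g. ("US", "MS")."""
    return {
        row["geoname_id"]
        for row in csv.DictReader(location_lines)
        if row["country_iso_code"] == country and row["subdivision_1_iso_code"] == state
    }


def networks_in(block_lines: Iterable[str], ids: set[str]) -> list[str]:
    """CIDR networks whose geoname_id column (and only that column) is in ids."""
    return [row["network"] for row in csv.DictReader(block_lines) if row["geoname_id"] in ids]


# Example usage, with the file names from the commands above:
# with open("GeoLite2-City-Locations-en.csv", newline="", encoding="utf-8") as f:
#     ms_ids = state_geoname_ids(f, "US", "MS")
# with open("GeoLite2-City-Blocks-IPv4.csv", newline="", encoding="utf-8") as f, \
#         open("MS-IP4.txt", "w", encoding="utf-8") as out:
#     out.writelines(net + "\n" for net in networks_in(f, ms_ids))
```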

  [1] URL=https://download.maxmind.com/geoip/databases/GeoLite2-City/download?suffix=tar.gz
      curl -L -u userid:license_key "$URL" > db.tar.gz
15. gruez No.45046807{7}
>Is there a single company out there making its money selling access to an area code database?

So if someone is making money off of it, it's suddenly "surveillance capitalism"? What makes it more or less "surveillance capitalism" compared to AWS selling CloudFront to some ad company?

Moreover, you can do better than area-code granularity. When landlines were more common and local number portability wasn't really a thing, you could look at the CO code (the second group of digits) to figure out which town or neighborhood a phone number was from. Even if this is all information you could theoretically determine yourself, I'm sure there are companies that package the data into a nice database for companies to use. In that case, is that "surveillance capitalism"? Where's the "surveillance" aspect? It's not like you need to stalk anyone to figure out where a CO is located. That was just a property of the phone network.

>GeoIP databases are much higher resolution and use active scanning methods like ping timing. If a company was spam calling me to estimate distance based on call connection lag, yes that would be surveillance capitalism.

Why is the fact it's "active" or not a relevant factor in determining whether it's "surveillance capitalism" or not? Moreover spam calling people might be bad for other reasons, but it's not exactly "surveillance".

16. TGower No.45046865{8}
Surveillance definition: "Systematic observation of places and people by visual, aural, electronic, photographic or other means." If you are pinging someone's IP to determine their physical location, you are engaged in a form of surveillance. If you have a copy of the table mapping area codes to cities, you are not engaged in surveillance. If you aren't trying to make money, you are not engaged in capitalism.
17. brewdad No.45047139
Vancouver residents may as well be Oregonians anyway. Most of them are paying OR income tax. They do most of their shopping and entertainment in Oregon too.
18. cwbriscoe No.45047170{3}
I work for a Portland company from home in Vancouver, so I get to skip Oregon's income tax. It's a 10-15 minute drive to the PDX area, where there's a Best Buy, IKEA and other stores where I can easily skip sales tax if I want to.
19. gruez No.45047181{9}
>Surveillance definition "Systematic observation of places and people by visual, aural, electronic, photographic or other means." If you are pinging someone's IP to determine their physical location, you are engaged in a form of surveillance.

Setting aside the problem with pinging home IPs (most home routers have ICMP echo requests disabled), your definition of "systematic observation" seems very flimsy. Is monitoring the global BGP routing table "systematic observation"? What about scraping RIR records? How is sending ICMP echo requests and observing the response times meaningfully similar to what Google et al. are doing? I doubt many people are upset about Google "systematically observing"... the contents of books (for Google Books), or the layout of cities (for Google Maps, ignoring Street View). They're upset about Google building dossiers on people. Observing the locations of groups of IP addresses (I'm not aware of any geoip products that can deanonymize specific IP addresses) seems very divorced from that, so any attempt to equate the two on the grounds that both are "systematic observation" is nonsensical.

20. TGower No.45047280{10}
It seems like you missed the qualifier "of places and people". Books are not people or places, but an IP address at any point in time is tied to either a specific person or place.

> They're upset about google building dossiers on people.

Their location being in that dossier is part of what upsets people.

21. gruez No.45047318{11}
>but an IP address at any point in time is tied to either a specific person or place.

Except I'm not aware of any geoip databases that operate on a per-IP level. It's way too noisy, given that basically everyone uses dynamic IP addresses. At best you can figure out that a given /24 is used by a given ISP to cover a certain neighborhood, not that 1.2.3.4 belongs to John Smith or 742 Evergreen Terrace.

22. TGower No.45047439{12}
Good to know, that does shift my opinion a bit. There is a spectrum from surveilling individuals to gathering population statistics. I'm not sure exactly where data that identifies a user to a group size of ~250 falls, especially given the geographic correlation, but it's definitely better.
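To make the "~250" concrete: truncating an IPv4 address to its /24 leaves a block of 256 addresses, 254 of which are usable as host addresses. A small Python sketch with the standard ipaddress module; the helper name is made up.

```python
import ipaddress


def containing_block(ip: str, prefixlen: int = 24) -> ipaddress.IPv4Network:
    """The IPv4 network of the given size that contains ip (host bits cleared)."""
    return ipaddress.IPv4Network(f"{ip}/{prefixlen}", strict=False)


block = containing_block("198.51.100.77")
print(block, block.num_addresses, sum(1 for _ in block.hosts()))  # 198.51.100.0/24 256 254
```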
23. Falkon1313 No.45047998
I wonder what is a "commercially reasonable effort" for a non-commercial website to collect, accurately verify, and securely store everyone's identity, location, and age?

Personally I'd say none at all, unless the government itself provides it as a free service, takes on all the liability, and makes it simple to use.

It also defines personally identifiable information as including "pseudonymous information when the information is used by a controller or processor in conjunction with additional information that reasonably links the information to an identified or identifiable individual." But it doesn't specify what it means by 'controller' or 'processor' either.

If a hobbyist just sets up a forum site, with no payment processor and no identified or identifiable information required, it would seem reasonable that the law should not apply. But I'm not a lawyer.

Clearly, however, attempting to comply with the law just in case, by requiring ID, would then make it applicable, since that is personally identifiable information.

24. kube-system No.45048237{5}
My comment said "database" and not "API" :)

Also there is no need to spend time parsing it yourself, there are plenty of existing libraries you can simply point at the file.

25. kube-system No.45048270{7}
There are companies out there making money selling any kind of data you can imagine. A quick search shows dozens of companies offering this data for sale.
26. lmm No.45048399{4}
> Calling geoip databases "surveillance capitalism" seems like a stretch. It might be used by "surveillance capitalism", but you don't really have to surveil people to build a geoip database, only scrape RIR allocation records (all public, btw) and BGP routes, do ping tests, and parse geofeeds provided by providers. None of that is "surveillance capitalism" in any meaningful sense.

How is it not? Most "normal" surveillance works the same way - you look up public records for the person you're going after, cross-reference them against each other somehow, and eventually find enough dirt on them or give up. This is surveillance, and it's being done by and in the interests of capitalism.

27. hellojesus No.45048429{4}
You live the best life; basically only federal income taxes.

Living 20 minutes further south than you costs me over $30k/year.

28. kube-system No.45048480{3}
> I wonder what is a "commercially reasonable effort" for a non-commercial website to collect, accurately verify, and securely store everyone's identity, location, and age?

> Personally I'd say none at all, unless the government itself provides it as a free service, takes on all the liability, and makes it simple to use.

1. There are many commercial services that do identity verification. There are many other commercial websites that have tools to do identity verification themselves. There are industry published best practices for these types of activities. All of these are evidence that you could use to demonstrate how you are making a commercially reasonable effort.

2. It's completely irrelevant whether you consider your website "commercial" or not. The law defines which websites it applies to, based on the activities they engage in.

https://law.justia.com/codes/mississippi/title-45/chapter-38...

3. Since when does the government have to give you compliance tools for free in order to require something of you? This isn't the standard for anything anywhere. Compliance with the law is often quite expensive. Honestly, buying an identity verification service is pretty cheap in the spectrum of compliance costs.

> If a hobbyist just sets up a forum site, with no payment processor and no identified or identifiable information required, it would seem reasonable that the law should not apply. But I'm not a lawyer.

You don't have to guess whether this is reasonable. If you read the law, you'll see that it says it only applies to sites that collect personally identifiable information.

From the above link, again:

> "Digital service" means a website, an application, a program, or software that collects or processes personal identifying information with Internet connectivity.

> "Personal identifying information" means any information, including sensitive information, that is linked or reasonably linkable to an identified or identifiable individual. The term includes pseudonymous information when the information is used by a controller or processor in conjunction with additional information that reasonably links the information to an identified or identifiable individual. The term does not include deidentified information or publicly available information.

29. cwbriscoe No.45048577{5}
Yeah, I was working from home anyway so it just made sense to move. The money I saved paid my rent fully and then some.
30. toast0 No.45048809{7}
> Is there a single company out there making its money selling access to an area code database? GeoIP databases are much higher resolution and use active scanning methods like ping timing. If a company was spam calling me to estimate distance based on call connection lag, yes that would be surveillance capitalism.

Phone number assignments are mostly public, you don't really need to pay for this information, but there are certainly those who will sell it to you.
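As a sketch of what such a lookup amounts to: a North American number splits into the NPA (area code), the NXX (central office code) and the line number, and the assignment data maps NPA-NXX pairs to rate centers. In the Python below the one-row table is hypothetical, reusing the 217-555 / Springfield, IL example from upthread; a real table would be loaded from the published assignment data. Because of number portability, the answer is where the number block was assigned, not where the number's current holder is.

```python
import re
from typing import Optional

# Hypothetical one-row excerpt of an NPA-NXX -> rate center table.
RATE_CENTERS = {("217", "555"): "Springfield, IL"}

# Optional +1 country code, then NPA (area code), NXX (central office code), line number.
NANP_NUMBER = re.compile(r"^\+?1?[\s.-]*\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})$")


def rate_center(number: str) -> Optional[str]:
    """Rate center where the number's NPA-NXX block was assigned, if known."""
    m = NANP_NUMBER.match(number.strip())
    if m is None:
        return None
    npa, nxx, _line = m.groups()
    return RATE_CENTERS.get((npa, nxx))
```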

Of course, phone numbers don't really tie you to a rate center anymore, but a rate center is often much more geographically specific than an IP address from a large ISP. What I've seen near me is that a rate center often ties the number to a specific community. Larger cities often have several rate centers; smaller cities may have their own, or several small cities may share one. Of course, phone company wiring tends to ignore municipal boundaries.

On the other hand, most large ISPs tend to use a single IP pool for a metro area. Not all large providers do it that way, of course, and larger metro areas may be subdivided. You can't really ping time your way to better data there either, most of the last mile technology adds enough latency that you can't tell if the customer is near the aggregation point or far.
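The arithmetic behind that point, as a rough sketch: light in optical fiber covers roughly 200 km per millisecond (about two thirds of c), so a round-trip time only bounds the one-way distance at RTT/2 × 200 km, and any unknown access-network delay eats directly into that bound. The constant and function name below are illustrative assumptions, and real paths are never straight lines, so the result is only an upper bound.

```python
# Rough speed of light in optical fiber: about 2/3 of c, i.e. ~200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def max_distance_km(rtt_ms: float, access_delay_ms: float = 0.0) -> float:
    """Upper bound on one-way distance implied by a round-trip time.

    access_delay_ms is whatever delay is assumed to come from the last mile
    (queuing, the DSL or cable access network, Wi-Fi) rather than propagation.
    """
    propagation_ms = max(rtt_ms - access_delay_ms, 0.0)
    return propagation_ms / 2 * FIBER_KM_PER_MS


# An unknown 3 ms of last-mile delay alone is worth ~300 km of distance.
print(max_distance_km(3.0))  # 300.0
```

That is more than the width of most metro areas, so RTT measurements can't tell a customer next to the aggregation point from one at the far edge of the service area.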

31. miki123211 No.45049140{7}
Not really, but there are companies making their money selling a mapping of phone numbers to real names[1].

It's a uniquely American thing (Canada does it too, but access is regulated much more tightly).

This one[2] gave me reliable results for free, but it seems to be "under maintenance" right now. Twilio just offers it as a service at 1 cent per number.

[1] https://en.wikipedia.org/wiki/CNAM
[2] https://www.sent.dm/resources/phone-lookup

32. miki123211 No.45049175{12}
Google does it I think?

At least in some cases, e.g. when multiple devices that are logged into their respective Google accounts are using that IP, and Google knows what location those usually reside at when together.

I've had Google pop up reliable location results for me, to the granularity of a small town, even if they had no information about me specifically to help them deduce this. It doesn't always happen though.

33. miki123211 No.45049207
Mobile IPs reflect the user's "registered area" at best, not their actual location.

This is mostly because of how APNs / GGSN / P-GW systems work. E.g., you may have an APN that puts you straight into a corporate network, and the mobile network needs you to keep using that APN when roaming. This is why your roaming IP is usually in the country you're from, not the one you're currently in.

I've heard of local breakout being possible, but never actually seen it in practice.

34. integralid No.45049292{4}
Worth noting that an email address is (or rather, may be) PII, so having a comment box means you're processing PII.
35. burnt-resistor No.45050727
Required? Not sure, but probably not. They do it anyway, to monetize your metadata and to give websites hints so they can show language- and country-localized "local" versions of their sites before, instead of, or as a fallback to requesting location permissions.
36. burnt-resistor No.45050742{5}
Exactly. No need to pay money to someone else for what's available for free(mium).
37. kube-system No.45051142{5}
Comments don't necessarily require email.
38. kube-system No.45051164{8}
I'm surprised this is notable these days, because the mapping of numbers to names used to be a completely free service that was dropped off on everyone's front porch.

https://en.wikipedia.org/wiki/Telephone_directory

39. thmsths No.45053402{3}
Completely agree. If someone starts, say, a whiskey tasting club, they can easily weed out minors by checking for a government-issued ID at the door. It is free, scalable and provided by the government. If the government wants hobbyists to do age verification online, then it should provide a solution that is 100% free AND easy to implement.
40. mulmen No.45053965{4}
Government IDs are not free or scalable. You need someone to check them. They also cost money to obtain.

You have conceded that sites with user-generated content should be age restricted. The question for the court is if a state can pass a law making that requirement.

41. fc417fc802 No.45056770{6}
If this law singlehandedly manages to get registration emails replaced with an anonymous messaging alternative that would be quite the unexpected win.
42. tzs No.45058078{6}
> My comment said "database" and not "API" :)

Sure, but it is quite common for companies offering database access to offer that via an API to query the database on their server rather than letting customers download the whole database for local use.

It thus seemed worthwhile to make it clear that MaxMind does let you download the whole database.

As far as libraries go sure that's a possibility. But for people who just need a simple IP to country or IP to US state lookup and need that from a variety of languages it may be overall less annoying to make your own DB from the CSV files that just handles what you need and nothing more.

I've already got libraries for sqlite, MySQL, or both for every language I use where I need to do these lookups, and almost all the applications that need these lookups are already connecting to our databases. Add an IPv4_to_Country table to that database (that they are already using), and then it's just a matter of doing a "select country_code from IPv4_to_Country where ip_low <= ? and ? <= ip_high" with the two '?'s replaced by the IP address we want to look up (in integer form), probably using a DB handle they already have open.

Many would find that a lot easier than adding a dependency on a 3rd-party GeoIP library. Besides it being one more thing on each machine that needs periodic updating (or rather N more if you are working in N different languages on that machine), I believe most of these libraries require a copy of the downloaded database on the local machine, so that's another thing you have to keep up to date on every server.

With the "make your own simple SQL DB" approach you just have to keep an up to date download from MaxMind on the machine that builds your SQL DB. After building the SQL DB you then just have to upload it to your one network DB server (e.g., your MySQL server) and all your apps that query that DB are up to date no matter what server they are on or what language they are in.

If you are building an sqlite DB for some of your apps, you do have to copy that to all the servers that contain such apps, so you don't totally escape having to do updates on those machines.

If making the SQL DB were hard then maybe reducing dependencies and reducing the number of things that need updates might not be worth it, but the CSV files are organized very sensibly. The scripts to make the SQL DBs are close to trivial if you've got a decent CSV parser in the language you are writing them in.
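A minimal Python/sqlite3 sketch of the lookup side: the address is converted to an integer with the standard ipaddress module and bound to both placeholders of the query described above. The second function is an alternative that is not from the comment: assuming the ranges never overlap (which should hold when they are built from non-overlapping CIDR blocks), taking the row with the largest ip_low at or below the address lets SQLite answer with a single index seek, whereas the two-sided condition may scan a large part of the ip_low index. The schema and index names are assumptions.

```python
import ipaddress
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS IPv4_to_Country (
    ip_low INTEGER NOT NULL,   -- first address of the range, as an integer
    ip_high INTEGER NOT NULL,  -- last address of the range, as an integer
    country_code TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS IPv4_to_Country_low ON IPv4_to_Country(ip_low);
"""


def lookup_country(conn: sqlite3.Connection, ip: str) -> Optional[str]:
    """The query from the comment, with the address bound to both placeholders."""
    n = int(ipaddress.IPv4Address(ip))
    row = conn.execute(
        "select country_code from IPv4_to_Country where ip_low <= ? and ? <= ip_high",
        (n, n),
    ).fetchone()
    return row[0] if row else None


def lookup_country_indexed(conn: sqlite3.Connection, ip: str) -> Optional[str]:
    """Same answer when ranges never overlap: take the closest ip_low at or
    below the address (an index seek), then confirm the address is in range."""
    n = int(ipaddress.IPv4Address(ip))
    row = conn.execute(
        "SELECT ip_high, country_code FROM IPv4_to_Country"
        " WHERE ip_low <= ? ORDER BY ip_low DESC LIMIT 1",
        (n,),
    ).fetchone()
    return row[1] if row is not None and n <= row[0] else None
```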

43. miki123211 No.45061867{9}
Phone directories were one of these weird "one way" services.

In principle, you could use them to map numbers to names, but the way they were designed, it was a lot more effort than using them to map names to numbers. That was deliberate I think.