
govideo:

I have a domain that is not live. As expected, loading the domain returns: Error 1016.

However... I have a subdomain with a non-obvious name, like userfileupload.sampledomain.com

This subdomain IS LIVE but has NOT been publicized/posted anywhere. It's a custom URL for authenticated users to upload media via presigned URL to my Cloudflare R2 bucket.

I am using Cloudflare for my DNS.

How did the internet find my subdomain? Some sample user agents:

* "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com"

* "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8"

* "Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36"

The bots' GET requests are failing, as designed, but I'm wondering how they even knew the subdomain existed?!
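A minimal sketch of the presigned-upload flow described above, using boto3 against R2's S3-compatible endpoint (the account ID, credentials, bucket, and key are all placeholders):

    # Sketch: generate a presigned PUT URL for a Cloudflare R2 bucket.
    # R2 speaks the S3 API, so boto3 works with a custom endpoint_url.
    # ACCOUNT_ID, credentials, bucket, and key below are placeholders.
    import boto3

    ACCOUNT_ID = "your-account-id"
    s3 = boto3.client(
        "s3",
        endpoint_url=f"https://{ACCOUNT_ID}.r2.cloudflarestorage.com",
        aws_access_key_id="R2_ACCESS_KEY_ID",
        aws_secret_access_key="R2_SECRET_ACCESS_KEY",
        region_name="auto",
    )

    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "media-uploads", "Key": "user123/photo.jpg"},
        ExpiresIn=3600,  # URL stays valid for one hour
    )
    print(url)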

yatralalala:
Hi, our company does this basically "as-a-service".

The ways to find it are basically limitless. The best source is probably the Certificate Transparency project, as others suggested. But it doesn't end there; other things we do include internet crawls, domain bruteforcing against wildcard DNS, dangling-vhost identification, default certs on servers (connect to an IP on 443 and grab the default cert), and many others.
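To illustrate the Certificate Transparency angle, a minimal sketch that pulls every name ever logged for a domain from the public crt.sh JSON endpoint (sampledomain.com stands in for the real domain):

    # Sketch: list subdomains exposed via Certificate Transparency logs,
    # using crt.sh's public JSON endpoint. Any cert ever issued for the
    # domain -- including for "hidden" subdomains -- shows up here.
    import requests

    domain = "sampledomain.com"  # placeholder
    resp = requests.get(
        "https://crt.sh/",
        params={"q": f"%.{domain}", "output": "json"},
        timeout=30,
    )
    resp.raise_for_status()

    names = set()
    for entry in resp.json():
        # name_value may hold several newline-separated names per cert
        names.update(entry["name_value"].split("\n"))

    for name in sorted(names):
        print(name)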

Security by obscurity does not work. You cannot rely on "people won't find it". Once it's online, anyone can find it, no matter how you hide it.
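And a sketch of the default-cert technique mentioned above: connect to a bare IP on 443 without SNI and read the names off whatever certificate the server presents. It assumes the cryptography package; the IP is a placeholder:

    # Sketch: grab the default certificate a server presents on port 443
    # when no SNI hostname is sent, then print the DNS names it contains.
    import socket
    import ssl
    from cryptography import x509

    ip = "203.0.113.10"  # placeholder: an address turned up by a scan

    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # we're probing, not validating
    ctx.verify_mode = ssl.CERT_NONE

    with socket.create_connection((ip, 443), timeout=5) as sock:
        # No server_hostname means no SNI, so we get the default cert
        with ctx.wrap_socket(sock) as tls:
            der = tls.getpeercert(binary_form=True)

    cert = x509.load_der_x509_certificate(der)
    try:
        san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName)
        for name in san.value.get_values_for_type(x509.DNSName):
            print(name)
    except x509.ExtensionNotFound:
        print(cert.subject.rfc4514_string())  # fall back to the subject DN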

TZubiri:
"Security by obscurity does not work"

This is one of those false, voyeuristic open-source internet tenets designed to get people to publish their stuff.

Obscurity is a fine strategy: if you don't post your source, that's good; if you post your source, that's a risk.

The fact that you can't rely on that security measure alone is just a basic security tenet that applies to everything: don't rely on a single security measure; use redundant barriers.

Truth is, we don't know how the subdomain got leaked. Subdomains can act as passwords, and a well-crafted subdomain should not leak; if it leaks, there is a reason.

lolinder:
Truth is, we don't know that the subdomain got leaked. The example user agent they give says the methodology is to scan the IPv4 space, which is a great example of why security through obscurity doesn't work here: the IPv4 space is tiny and trivial to scan. If your server has an IPv4 address, it's not obscure; you should assume it's publicly reachable and plan accordingly.
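To put a number on "tiny", a back-of-the-envelope sketch; the probe rate is illustrative of what tools like masscan and zmap advertise:

    # Sketch: how long a full IPv4 sweep takes at masscan/zmap-class rates.
    total_addresses = 2**32              # entire IPv4 space, ~4.3 billion
    probes_per_second = 1_000_000        # illustrative single-machine rate

    seconds = total_addresses / probes_per_second
    print(f"{seconds / 60:.0f} minutes")  # roughly 72 minutes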

> Subdomains can act as passwords, and a well-crafted subdomain should not leak; if it leaks, there is a reason.

The problem with this theory is that DNS was never designed to be secret or private, and even with DNS over HTTPS it's still not private from the servers involved. That means getting to "well crafted" is an incredibly difficult task with hundreds of possible failure modes that need constant maintenance and attention. Not only is it complicated to get right the first time; you have to reconfigure away the failure modes on every device, or even on every use of the "password".

Here are just a few failure modes off the top of my head. Yes, these have mitigations, but it's a game of whack-a-mole and you really don't want to play it:

* Certificate transparency logs, as mentioned.

* A user of your "password" forgets that they didn't configure DNS over HTTPS on a new device and leaves a trail of logs through a dozen recursive DNS servers and ISPs (see the sketch after this list).

* A user has DNS over HTTPS but doesn't point it at a server within your control. One foreign server having the password is better than dozens plus their ISPs, but you have no control over that default DNS server, nor over how many different servers your clients will attempt to use.

* Browser history.
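A sketch of the second failure mode above: a plain UDP DNS lookup, as issued by any client without DNS over HTTPS, carries the full hostname in cleartext past the resolver and every network in between. This uses dnspython purely for illustration:

    # Sketch: a classic UDP DNS lookup. The full hostname travels in
    # cleartext, readable by the resolver, the ISP, and anyone on-path.
    import dns.message
    import dns.query

    secret_name = "userfileupload.sampledomain.com"  # the "password"
    query = dns.message.make_query(secret_name, "A")

    # The qname is plainly visible in this UDP packet to 8.8.8.8
    response = dns.query.udp(query, "8.8.8.8", timeout=5)
    print(response.answer)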

Just don't. Work with the grain: assume the subdomain is public and secure your site accordingly.

immibis:
> The IPv4 space is tiny and trivial to scan

Something many people don't expect is that the IPv6 space is also tiny and trivial to scan, if you follow certain patterns.

For example, many server hosts give you a /48 or /64 subnet, and your server is at prefix::1 by default. If the host has a /24 and gives each customer a /48, someone only has to scan 2^24 addresses at that host to find everyone using prefix::1.
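A sketch of that enumeration with Python's ipaddress module; the /24 is a placeholder block:

    # Sketch: enumerate the prefix::1 address of every /48 inside a
    # hosting provider's /24 -- 2^24 candidates, an IPv4-scale search.
    import ipaddress
    import itertools

    provider_block = ipaddress.IPv6Network("fd00::/24")  # placeholder

    # Every /48 customer prefix inside it, checking the ::1 convention.
    # (islice keeps this demo to the first five candidates.)
    for subnet in itertools.islice(provider_block.subnets(new_prefix=48), 5):
        candidate = subnet.network_address + 1   # prefix::1
        print(candidate)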

Sayrus:
Assuming everyone is using a /48 and binding to prefix::1, that's still 2^16 times the work of scanning the IPv4 address space. Assuming a specific host with a single /24 block delegating /64s, that's 2^8 times IPv4. Scanning for arbitrary /64s across the entire IPv6 space is definitely not tiny.

AWS only allows routing a /80 to EC2 instances, which makes a huge difference.

It doesn't mean we should rely on obscurity, but the entire space is not as tiny as IPv4's.
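The arithmetic, spelled out:

    # Sketch: candidate counts compared against the 2^32 IPv4 space.
    ipv4_space = 2**32

    all_48s = 2**48              # every /48 prefix, probing prefix::1
    one_24  = 2**(64 - 24)       # /64s delegated out of a single /24

    print(all_48s // ipv4_space)   # 65536 == 2^16 times the IPv4 space
    print(one_24 // ipv4_space)    # 256   == 2^8  times the IPv4 space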

TZubiri:
Interesting. So you can see the IPv6 space as a tree and go just for the first addresses of each block.

But if you just choose a random address, you'd enjoy a bit more immunity from brute-force scanners.
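For example, a sketch of placing a server at a random interface ID inside its /64 instead of the predictable ::1 (the prefix is a placeholder):

    # Sketch: pick a random address inside a /64 rather than prefix::1,
    # giving a scanner 2^64 candidates per prefix instead of one.
    import ipaddress
    import secrets

    prefix = ipaddress.IPv6Network("2001:db8:1234:5678::/64")  # placeholder

    interface_id = secrets.randbits(64)          # unguessable low 64 bits
    address = prefix.network_address + interface_id
    print(address)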