287 points govideo | 1 comments | 06 Mar 25 22:34 UTC | HN request time: 0.254s | source

I have a domain that is not live. As expected, loading the domain returns: Error 1016.

However...I have a subdomain with a not obvious name, like: userfileupload.sampledomain.com

This subdomain IS LIVE but has NOT been publicized/posted anywhere. It's a custom URL for authenticated users to upload media with presigned url to my Cloudflare r2 bucket.

I am using CloudFlare for my DNS.

How did the internet find my subdomain? Some sample user agents are: "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8", "Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36",

The bots are GET requests which are failing, as designed, but I'm wondering how the bots even knew the subdomain existed?!

1. oliwarner ◴[07 Mar 25 18:35 UTC] No.43292828[source]▶

>>43285725 (OP) #

Certificate Transparency would also be my guess. These are logs published by big TLS certificate issuers to cross-check and make sure they're not issuing certificates for domains they have no standing on.

The way around this is to issue a wildcard for your root domain and use that. Your main domain is discoverable but your subs aren't.

There are other routes: leaky extensions, leaky DNS servers, bad internet security system utilities that phone home about traffic. Who knows?

Unless your IP address redirects to your subdomain —not unheard of— it's not somebody IP/port scanning. Webservers don't typically leak anything about the domains they serve for.

↑

Ask HN: How did the internet discover my subdomain?