https://securitytrails.com/ also had my "secret" staging subdomain.
I used a catch-all (wildcard) certificate, so the subdomain itself never showed up in CT logs.
It's still a secret to me how my subdomain ended up in their database.
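To illustrate the wildcard point above: a CT log entry only contains the names written into the certificate, so a wildcard entry covers the subdomain without spelling it out. Here is a minimal Python sketch of that idea. The hostnames (staging.example.com and friends) and both function names are made up for illustration, and the matching rule is the usual "wildcard matches exactly one leftmost label" convention, not any particular CA's or log's implementation.

```python
def covered_by(hostname, san):
    """Return True if `hostname` is covered by the certificate name `san`.

    A leading "*." is treated as matching exactly one leftmost label.
    """
    hostname = hostname.rstrip(".").lower()
    san = san.rstrip(".").lower()
    if not san.startswith("*."):
        return hostname == san
    head, _, rest = hostname.partition(".")
    return bool(head) and rest == san[2:]


def names_visible_in_ct(sans):
    """Names a log entry would literally contain: the SANs themselves."""
    return sorted({s.lower() for s in sans})


# Hypothetical certificate with a wildcard name plus the bare domain.
sans = ["*.example.com", "example.com"]
secret = "staging.example.com"

is_covered = any(covered_by(secret, s) for s in sans)
is_listed = secret in names_visible_in_ct(sans)
```

In this sketch `is_covered` is true while `is_listed` is false: the certificate is valid for the staging host, yet the hostname never appears in the logged names. That only rules out CT logs as the source; it says nothing about other ways a name can leak.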
However... I have a subdomain with a non-obvious name, like userfileupload.sampledomain.com.
This subdomain IS LIVE but has NOT been publicized/posted anywhere. It's a custom URL for authenticated users to upload media to my Cloudflare R2 bucket using presigned URLs.
I am using Cloudflare for my DNS.
How did the internet find my subdomain? Some sample user agents are:
"Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com"
"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8"
"Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36"
The bots are sending GET requests, which fail as designed, but I'm wondering how the bots even knew the subdomain existed?!
If you delegate a subdomain through Cloudflare to your own DNS servers then, from what I remember from the animal book, the recursive resolver should first ask Cloudflare for the address of the machine the delegation points to (yours). Any further resolutions would be answered by your machine, but Cloudflare would at the very least know of every query for that subdomain.
If you delegate a subdomain and have sub-subdomains under it, then Cloudflare would only see resolutions of that subdomain and not of the sub-subdomains.
In other words, for most things, they'd have full insight.
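A rough way to picture the delegation argument is the toy model below, in Python. It is only a sketch under stated assumptions: the zone names (dev.sampledomain.com and its children) and the function name are hypothetical, the root and TLD servers are left out, and it ignores TTL expiry and QNAME minimization, both of which change what the parent actually sees in practice.

```python
def zones_consulted(qname, zone_cuts, cached_cuts=()):
    """Return the zones whose authoritative servers get asked, parent first.

    zone_cuts:   known zones, e.g. the parent zone hosted at Cloudflare and
                 a child zone delegated to your own servers.
    cached_cuts: delegations the recursive resolver already has cached.
    """
    qname = qname.rstrip(".").lower()
    # Keep only the zones that enclose the query name.
    enclosing = [z for z in zone_cuts
                 if qname == z or qname.endswith("." + z)]
    # Order from the parent (fewest labels) to the most specific zone.
    enclosing.sort(key=lambda z: z.count("."))
    # Skip straight to the deepest zone whose delegation is already cached.
    start = 0
    for i, zone in enumerate(enclosing):
        if zone in cached_cuts:
            start = i
    return enclosing[start:]


zones = ["sampledomain.com", "dev.sampledomain.com"]  # hypothetical setup

cold = zones_consulted("api.dev.sampledomain.com", zones)
warm = zones_consulted("api.dev.sampledomain.com", zones,
                       cached_cuts={"dev.sampledomain.com"})
```

With a cold cache the parent zone is consulted first and hands out the referral; once the delegation is cached, the resolver goes straight to the delegated servers, so in this model the parent only sees the lookups that fetch or refresh the referral.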