However, I have a subdomain with a non-obvious name, like: userfileupload.sampledomain.com
This subdomain IS LIVE but has NOT been publicized or posted anywhere. It's a custom URL where authenticated users upload media to my Cloudflare R2 bucket using presigned URLs.
I am using Cloudflare for my DNS.
How did the internet find my subdomain? Some sample user agents are: "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8", "Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36".
The bots send GET requests, which fail as designed, but I'm wondering how the bots even knew the subdomain existed.
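Certificate Transparency logs are a plausible explanation, unless you use a wildcard cert (or no HTTPS, obviously). With a wildcard cert, only the wildcard name (e.g. *.sampledomain.com) appears in the logs, not the specific subdomain.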
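To check whether the subdomain name is visible in Certificate Transparency logs, something like the following Python sketch could be used. It assumes the public crt.sh search service and its JSON output with a "name_value" field per entry; that shape, the helper names, and the sample data are assumptions for illustration, not anything from the original post.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def extract_hostnames(ct_entries):
    """Collect the distinct hostnames named in a list of CT log entries.

    Each entry is assumed to be a dict whose "name_value" field holds one
    or more newline-separated names (the shape crt.sh is assumed to return).
    """
    names = set()
    for entry in ct_entries:
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name:
                names.add(name)
    return sorted(names)


def fetch_ct_entries(domain):
    """Query crt.sh for certificates issued under `domain` (needs network access)."""
    # "%" is the crt.sh wildcard, so "%.sampledomain.com" matches any subdomain.
    url = "https://crt.sh/?q=" + quote("%." + domain) + "&output=json"
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Hypothetical offline sample, so the sketch runs without network access.
sample_entries = [
    {"name_value": "userfileupload.sampledomain.com"},
    {"name_value": "*.sampledomain.com\nsampledomain.com"},
]
print(extract_hostnames(sample_entries))
```

Against the real service you would call extract_hostnames(fetch_ct_entries("sampledomain.com")). If the specific subdomain shows up in the result, anyone watching the logs can see it; if only the wildcard entry shows up, the logs did not reveal it.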
IP scans are just that: scans for open ports. If a request does not include a Host header matching one of your sites, the server returns whatever default response it was set up with. This can be a default site, a 404, or anything else.
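The Host header behaviour can be sketched in a few lines of Python. This is a simplified model of name-based virtual hosting, not the logic of any particular server; select_site, the site names, and the example IP address are all hypothetical.

```python
def select_site(headers, sites, default_site="default"):
    """Pick which site answers a request, based on its Host header.

    `headers` maps header names to values; `sites` maps hostnames to
    site names. Unknown or missing hosts fall through to `default_site`.
    """
    host = ""
    for key, value in headers.items():
        # Header names are case-insensitive.
        if key.lower() == "host":
            host = value
    # Drop an optional ":port" suffix and normalise case
    # (IPv6 literals are ignored in this simplified sketch).
    hostname = host.split(":")[0].strip().lower()
    return sites.get(hostname, default_site)


sites = {"userfileupload.sampledomain.com": "upload-app"}

# A scanner that only knows the IP address sends no matching Host header.
print(select_site({}, sites))
print(select_site({"Host": "203.0.113.10"}, sites))
# A client that knows the name reaches the actual site.
print(select_site({"Host": "userfileupload.sampledomain.com:443"}, sites))
```

The first two calls return the default site and only the third reaches the upload app. So if your logs show requests carrying the real subdomain in the Host header, the bot learned the name somewhere rather than stumbling on the IP.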