287 points govideo | 2 comments | 06 Mar 25 22:34 UTC

I have a domain that is not live. As expected, loading the domain returns: Error 1016.

However, I have a subdomain with a non-obvious name, like: userfileupload.sampledomain.com

This subdomain IS LIVE but has NOT been publicized/posted anywhere. It's a custom URL for authenticated users to upload media with a presigned URL to my Cloudflare R2 bucket.

I am using Cloudflare for my DNS.

How did the internet find my subdomain? Some sample user agents are: "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8", "Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36".

The bots send GET requests, which fail as designed, but I'm wondering how the bots even knew the subdomain existed?!


yatralalala ◴[07 Mar 25 12:49 UTC] No.43289743[source]▶

>>43285725 (OP) #

Hi, our company does this basically "as-a-service".

The ways to find it are basically limitless. The best source is probably the Certificate Transparency project, as others suggested. But it does not end there; other things we do include internet crawls, domain brute-forcing on wildcard DNS, dangling vhost identification, default certs on servers (connect to an IP on 443 and get the default cert), and many others.

Security by obscurity does not work. You cannot rely on "people won't find it". Once it's online, everyone can find it. No matter how you hide it.
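To make the Certificate Transparency route concrete, here is a minimal sketch of how someone might list hostnames from CT data. It assumes crt.sh as the search front end and assumes its JSON output carries a "name_value" field with newline-separated names; the function names are made up for illustration, and this is not a description of what any particular scanner actually runs.

```python
import json
import urllib.parse
import urllib.request


def extract_subdomains(entries, domain):
    """Collect unique hostnames under `domain` from crt.sh-style JSON entries."""
    suffix = "." + domain.lower()
    found = set()
    for entry in entries:
        # Assumption: each entry has a "name_value" field holding one or more
        # newline-separated names taken from the certificate.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # a wildcard entry reveals no concrete host
            if name.endswith(suffix):
                found.add(name)
    return sorted(found)


def fetch_ct_entries(domain, timeout=30):
    """Query crt.sh for certificates issued under `domain` (network call)."""
    query = urllib.parse.quote("%." + domain)  # "%" is the crt.sh wildcard
    url = f"https://crt.sh/?q={query}&output=json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Usage would look like extract_subdomains(fetch_ct_entries("sampledomain.com"), "sampledomain.com"). A hostname only shows up this way if a certificate naming it was issued and logged; a host covered solely by a wildcard certificate would not appear by name.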

replies(13): >>43289843 #>>43290143 #>>43290420 #>>43290596 #>>43290783 #>>43292505 #>>43292547 #>>43292687 #>>43293087 #>>43303762 #>>43309048 #>>43317788 #>>43341607 #

amelius ◴[07 Mar 25 14:25 UTC] No.43290420[source]▶

>>43289743 #

Well, I sure hope the remainder of my URLs are safe.

replies(1): >>43292913 #

1. amelius ◴[07 Mar 25 18:42 UTC] No.43292913[source]▶

>>43290420 #

Like, in: example.com/secret-id-48723487345

I hope the last bit is not leaked somehow (?)

Btw, we need a "falsehoods programmers believe about URLs" ...

Although there is: https://www.netmeister.org/blog/urls.html

replies(1): >>43293967 #

2. idoubtit ◴[07 Mar 25 20:10 UTC] No.43293967[source]▶

>>43292913 (TP) #

> Although there is: https://www.netmeister.org/blog/urls.html

I think the section named "Pathname" is wrong. It describes the path of a URL as if every server were Apache serving static files with its default configuration. It should describe how the path is converted into an HTTP request.

For instance, the article states that "all of these go to the same place : https://example.org https://example.org/ https://example.org// https://example.org//////////////////". That's wrong. Apart from the first two, which both become `GET / HTTP/1.1`, a web client sends a distinct HTTP request for each case, e.g. starting with `GET // HTTP/1.1`. So the server will receive distinct paths. The assertion of "going to the same place" makes no sense in the general case.
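A rough sketch of the request line a client would put on the wire for some of those URLs, assuming the client does not normalize the path itself. The helper name request_line is made up for illustration. An empty path is sent as "/", so the first two URLs yield the same request, while extra slashes are passed through to the server unchanged.

```python
from urllib.parse import urlsplit


def request_line(url, method="GET"):
    """Build the HTTP/1.1 request line a non-normalizing client would send."""
    parts = urlsplit(url)
    # An empty path must be sent as "/" in the request target.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"


for url in (
    "https://example.org",
    "https://example.org/",
    "https://example.org//",
):
    print(url, "->", request_line(url))
```

Whether `//` then maps to the same resource as `/` is entirely up to the server or framework handling the request.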


Ask HN: How did the internet discover my subdomain?