287 points govideo | 15 comments

I have a domain that is not live. As expected, loading the domain returns: Error 1016.

However, I have a subdomain with a non-obvious name, like: userfileupload.sampledomain.com

This subdomain IS LIVE but has NOT been publicized/posted anywhere. It's a custom URL for authenticated users to upload media via presigned URLs to my Cloudflare R2 bucket.

I am using Cloudflare for my DNS.

How did the internet find my subdomain? Some sample user agents are: "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8", "Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36".

The bots send GET requests, which fail, as designed, but I'm wondering how the bots even knew the subdomain existed?!

paxys ◴[] No.43287654[source]
Not sure why everyone is going on about certificate transparency logs when the answer is right there in the user agent. The company is scanning the IPv4 space and came upon your IP and port.
replies(6): >>43287671 #>>43287702 #>>43287703 #>>43287895 #>>43287976 #>>43288126 #
1. p0w3n3d ◴[] No.43288126[source]
Finding the IP does not mean finding the domain. When making an HTTP request to an IP, you specify the domain you want to reach. For example, you can configure your /etc/hosts to point xxxnakedhamsters.google.com at 8.8.8.8 and make the HTTP request; Google will then receive a request for that domain (i.e. the header Host: xxxnakedhamsters.google.com) and will refuse it or try to redirect to http. Of course, this applies only to plain HTTP, because HTTPS requires a certificate. That's why they're talking about certificates.
replies(4): >>43288228 #>>43288802 #>>43289275 #>>43292054 #
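To make the point about the Host header concrete, here is a minimal Python sketch of what a plain HTTP/1.1 request looks like on the wire. The helper name `build_get_request` is made up for illustration. The destination IP never appears in the request bytes; it is only the address the socket connects to, while the name travels in the `Host` header.

```python
def build_get_request(host_header, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request.

    The IP address being contacted is not part of these bytes; it is
    chosen separately when the TCP connection is opened.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",
        "Connection: close",
        "",  # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Same destination (8.8.8.8, as in the comment above), two different names.
request_with_name = build_get_request("xxxnakedhamsters.google.com")
request_with_ip = build_get_request("8.8.8.8")
```

Either request could be sent to the same address; the server only learns which name the client had in mind from the header.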
2. ghusto ◴[] No.43288228[source]
First thing I’d do for an IP that answers is a reverse lookup, so I expect that’s at least in the list of things they’d try.
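As a rough sketch of the reverse lookup mentioned here, the Python standard library can build the PTR query name and ask the resolver for it. The function names are hypothetical, and the IP is the one quoted elsewhere in this thread. Note that PTR records are published by whoever controls the reverse zone for the address block, so the answer is often a hosting provider's generic name rather than the site's own domain.

```python
import ipaddress
import socket


def ptr_query_name(ip):
    """Return the DNS name that a reverse (PTR) lookup for `ip` queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Ask the system resolver for the PTR name of `ip`, or None if absent."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


query_name = ptr_query_name("138.68.161.203")
```

`reverse_lookup` needs network access and may return `None`, so it is defined but not called here.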
3. lewiscollard ◴[] No.43288802[source]
Depending on the web server's configuration, you very much _can_ find the domain which is configured on an IP address, by attempting to connect to that IP address via HTTPS and seeing what certificate gets served. Here's an example:

https://138.68.161.203/

> Web sites prove their identity via certificates. Firefox does not trust this site because it uses a certificate that is not valid for 138.68.161.203. The certificate is only valid for the following names: exhaust.lewiscollard.com, www.exhaust.lewiscollard.com

replies(1): >>43289108 #
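The browser error quoted above lists the certificate's subject alternative names. A hedged Python sketch of the same idea follows: `dns_names` reads the `subjectAltName` field from a dict shaped like the one `ssl.SSLSocket.getpeercert()` returns, and `fetch_default_cert_pem` uses `ssl.get_server_certificate` to fetch whatever certificate an address presents, without validating it. Both function names are assumptions for illustration, and the sample dict is built by hand from the names in the quoted message rather than fetched.

```python
import ssl


def dns_names(cert):
    """Extract DNS entries from a decoded certificate dict.

    `cert` is expected to look like the result of SSLSocket.getpeercert(),
    where "subjectAltName" is a tuple of (kind, value) pairs.
    """
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def fetch_default_cert_pem(ip, port=443):
    """Fetch the certificate served at ip:port as PEM text (no validation).

    Turning the PEM into a dict of names needs an X.509 parser, which is
    outside the scope of this sketch.
    """
    return ssl.get_server_certificate((ip, port))


# Hand-built sample using the names from the quoted Firefox message.
sample_cert = {
    "subjectAltName": (
        ("DNS", "exhaust.lewiscollard.com"),
        ("DNS", "www.exhaust.lewiscollard.com"),
    )
}
names = dns_names(sample_cert)
```

This only reveals names when the server hands out a name-bearing certificate to a client that connected by IP alone, which depends on the server's configuration.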
4. jchw ◴[] No.43289108[source]
I don't think that does you any good for Cloudflare, though. They will definitely be using SNI.
replies(2): >>43289333 #>>43296431 #
5. melevittfl ◴[] No.43289275[source]
But there's no evidence in the OP's post that they have, in fact, discovered the domain. The only thing posted is that there is a GET request to a listening web server.

The OP and all the people talking about certificates are making the same assumption, namely that the scanning company discovered the DNS name for the server and tried to connect. In fact, they simply iterate through IP address blocks and make GET requests to any listening web servers they find.

replies(3): >>43290815 #>>43292596 #>>43298440 #
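A small Python sketch of what "iterating through IP address blocks" means in practice. It only enumerates addresses and makes no connections; `candidate_targets` is a made-up helper, and 192.0.2.0/30 is a documentation-only range chosen for the example. No hostname appears anywhere in the loop.

```python
import ipaddress


def candidate_targets(cidr, port=80):
    """List (ip, port) pairs for every usable host address in a CIDR block."""
    return [(str(addr), port) for addr in ipaddress.ip_network(cidr).hosts()]


targets = candidate_targets("192.0.2.0/30")
```

A scanner working this way knows only addresses and ports, which is why a hit in the server log does not by itself show that a DNS name was discovered.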
6. kelnos ◴[] No.43289333{3}[source]
That doesn't really matter, though. While OP is using Cloudflare, the actual server behind it is still a publicly-accessible IP address that an IPv4 space scanner can easily stumble upon.
replies(1): >>43289570 #
7. jchw ◴[] No.43289570{4}[source]
I misunderstood; I thought the subdomain was an R2 bucket. If it's just normal Cloudflare proxying to some backend, this is probably the most likely answer.

That said, while I think it's not the case here, using Cloudflare doesn't mean the underlying host is accessible, as even on the free tier you can use Cloudflare Tunnels, which I often do.

replies(1): >>43296440 #
8. p0w3n3d ◴[] No.43290815[source]
OP states that the domain was discovered
replies(1): >>43290981 #
9. crazygringo ◴[] No.43290981{3}[source]
No they didn't. They said "How did the internet find my subdomain?" They're assuming the internet found their subdomain. They don't provide any evidence that happened, just that they found their IP address.
10. paxys ◴[] No.43292054[source]
> When doing HTTP request to IP you specify the domain you want to connect to

No, you make HTTP requests to an IP, not a domain. You convert the domain name to an IP in an earlier step (via a DNS query). You can connect to servers using their raw IPs and open ports all day if you like, which is what's happening here. Yes, servers will (likely) reject the requests by looking at the Host header, but they will still receive the request.
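A minimal loopback sketch of that last point, in Python: a server that rejects any request whose `Host` header is not the expected name still receives and logs the request. Everything here is an assumption for illustration (the handler, the 403 status, the `example-scanner/0.1` user agent); only the allowed hostname is taken from the original post.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

ALLOWED_HOST = "userfileupload.sampledomain.com"  # name from the original post
access_log = []  # (host_header, user_agent, status) tuples


class HostCheckingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "")
        status = 200 if host == ALLOWED_HOST else 403
        # The request is recorded whether or not it is accepted.
        access_log.append((host, self.headers.get("User-Agent", ""), status))
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence the default stderr logging


def probe(ip, port, host, user_agent):
    """Send one GET to ip:port with the given Host and User-Agent headers."""
    conn = http.client.HTTPConnection(ip, port, timeout=5)
    try:
        conn.request("GET", "/", headers={"Host": host, "User-Agent": user_agent})
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()


def run_demo():
    """Start a loopback server and probe it by address only, like a scanner."""
    server = HTTPServer(("127.0.0.1", 0), HostCheckingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        ip, port = server.server_address
        return probe(ip, port, f"{ip}:{port}", "example-scanner/0.1")
    finally:
        server.shutdown()
        server.server_close()
```

After `run_demo()`, `access_log` holds one rejected entry whose Host value is the bare address, which is what a log looks like when the client never knew the name.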

11. ◴[] No.43292596[source]
12. ◴[] No.43296431{3}[source]
13. ratg13 ◴[] No.43296440{5}[source]
They only state they are using Cloudflare for DNS; they didn't say whether they were proxying the connection.
replies(1): >>43297617 #
14. jchw ◴[] No.43297617{6}[source]
Also a valid point. I guess without more details all we can really do is speculate about the exact setup. That said, I do now agree that the most likely answer is "the underlying host was accessible and caught by an IPv4 scanner" since, well, that's pretty much what it says anyway.
15. denysvitali ◴[] No.43298440[source]
I really doubt Cloudflare gives them an IPv4 address and lets them see all the logs for said address.