
287 points govideo | 39 comments

I have a domain that is not live. As expected, loading the domain returns: Error 1016.

However...I have a subdomain with a non-obvious name, like: userfileupload.sampledomain.com

This subdomain IS LIVE but has NOT been publicized/posted anywhere. It's a custom URL for authenticated users to upload media with a presigned URL to my Cloudflare R2 bucket.

I am using Cloudflare for my DNS.

How did the internet find my subdomain? Some sample user agents are: "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8", "Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36".

The bot traffic is all GET requests, which fail as designed, but I'm wondering how the bots even knew the subdomain existed?!

1. paxys ◴[] No.43287654[source]
Not sure why everyone is going on about certificate transparency logs when the answer is right there in the user agent. The company is scanning the IPv4 space and came upon your IP and port.
replies(6): >>43287671 #>>43287702 #>>43287703 #>>43287895 #>>43287976 #>>43288126 #
2. ◴[] No.43287671[source]
3. pkulak ◴[] No.43287702[source]
Okay. But how did they get the proper host header?
replies(4): >>43287720 #>>43287730 #>>43287736 #>>43290041 #
4. peeters ◴[] No.43287703[source]
It's rather hilarious that nobody mentioned this in 7 hours. What am I missing?

~5 billion scans in a few hours is nothing for a company with decent resources. OP: in case you didn't follow, they're literally trying every possible IPv4 address and seeing if something exists on standard ports at that address.
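For a rough sense of scale, here is a back-of-the-envelope sketch of the probe rate a full sweep would need. The "~5 billion" figure is close to 2^32 (about 4.3 billion) addresses; the time windows are assumptions picked for illustration, and a real scanner would skip reserved ranges and probe several ports.

```python
# Back-of-the-envelope numbers for sweeping the whole IPv4 space once.
TOTAL_IPV4 = 2 ** 32  # 4,294,967,296 addresses, reserved ranges included


def probes_per_second(addresses: int, hours: float) -> float:
    """Average probe rate needed to touch every address within the window."""
    return addresses / (hours * 3600)


if __name__ == "__main__":
    # The windows below are illustrative assumptions, not figures from the thread.
    for hours in (1, 6, 24):
        rate = probes_per_second(TOTAL_IPV4, hours)
        print(f"{hours:>2} h sweep: about {rate:,.0f} probes/s per port")
```

Even the one-hour case works out to a little over a million probes per second for a single port, which is a lot for one machine but not for a fleet.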

I believe it would be harder to find out your domain that way if you were using SNI and only forwarded/served requests that used the correct host. But if you aren't using SNI, your server is probably just responding to any TLS connect request with your subdomain's cert, which will reveal your hostname.

replies(3): >>43287789 #>>43288073 #>>43288140 #
5. peeters ◴[] No.43287720[source]
There are a couple easy possibilities depending on server config.

1. The server isn't using SNI and answers every HTTPS request with the same cert. (For example, go to https://209.216.230.207/ and you'll get a certificate error. Go to the cert details and you'll see the common name is news.ycombinator.com.)

2. HTTP upgrades to HTTPS with a redirect to the hostname, not the IP address. (For example, go to http://209.216.230.207/ and you get a 301 redirect to https://news.ycombinator.com)
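Both possibilities can be sketched with the Python standard library. This is a minimal illustration under some assumptions, not what any particular scanner runs: the function names are made up here, and because `getpeercert()` only returns a parsed certificate when the chain validates, the TLS part works for publicly trusted certs but not self-signed ones.

```python
import http.client
import socket
import ssl


def names_from_cert(cert: dict) -> list[str]:
    """Collect the commonName and DNS subjectAltNames from a getpeercert() dict."""
    names = []
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS" and value not in names:
            names.append(value)
    return names


def default_cert_names(ip: str, port: int = 443, timeout: float = 5.0) -> list[str]:
    """Possibility 1: connect by IP without SNI and read the cert the server picks."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # connecting by IP, so a name mismatch is expected
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        # No server_hostname is passed, so no SNI extension is sent.
        with ctx.wrap_socket(sock) as tls:
            return names_from_cert(tls.getpeercert())


def redirect_target(ip: str, port: int = 80, timeout: float = 5.0):
    """Possibility 2: plain HTTP request to the IP; return the Location header, if any."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", "/")  # the Host header defaults to the IP itself
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()
```

Calling `default_cert_names("192.0.2.10")` or `redirect_target("192.0.2.10")` would run the probes; that address is a documentation placeholder, so substitute a host you are allowed to test.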

6. jimnotgym ◴[] No.43287730[source]
I don't think op said that they had the correct host header?
7. INTPenis ◴[] No.43287736[source]
Could be any number of ways, for example a default TLS cert or a default vhost redirect.

I actually had a job once a few years ago where I was asked to hide a web service from crawlers and so I did some of these things to ensure no info leaked about the real vhost.

8. Dylan16807 ◴[] No.43287789[source]
> What am I missing?

That it was in fact mentioned many hours earlier, in more than one top level comment.

replies(1): >>43287855 #
9. peeters ◴[] No.43287855{3}[source]
I was referring more to the fact that the user agent explicitly contained the answer, rather than suggestions that it was IP scanning. But you're right, I do see one comment that mentions that. And many more likely assumed the OP already figured that part out.
replies(1): >>43288817 #
10. ozim ◴[] No.43287895[source]
That perfectly fits the midwit meme. Lots of people are smart enough to know about transparency logs, but not smart enough to read the OP's post and understand the details.
replies(1): >>43289610 #
11. 4ndrewl ◴[] No.43287976[source]
Also it's Palo Alto. They're not some kiddie scripters. https://en.m.wikipedia.org/wiki/Palo_Alto_Networks
replies(3): >>43288357 #>>43289263 #>>43290403 #
12. globular-toast ◴[] No.43288073[source]
> What am I missing?

It's very common for people to read only up to the point they feel they can comment, then skip immediately to commenting. So, basically, no one read it.

replies(1): >>43289669 #
13. p0w3n3d ◴[] No.43288126[source]
Finding the IP does not mean finding the domain. When making an HTTP request to an IP, you specify the domain you want to reach. For example, you can configure your /etc/hosts to point xxxnakedhamsters.google.com at 8.8.8.8 and make the HTTP request; Google will then receive a request for that domain (i.e. the header Host: xxxnakedhamsters.google.com) and will refuse it or try to redirect to http. Of course this only applies to HTTP, because HTTPS requires a certificate. That's why they're talking about certificates.
replies(4): >>43288228 #>>43288802 #>>43289275 #>>43292054 #
14. fragmede ◴[] No.43288140[source]
Just the default hostname. It won't reveal all of them or any of the IP addresses of that box. secret-freedom-fighter.ice-cream-shop.example.com could have the same IP as example.com and you'd only know example.com
replies(1): >>43288170 #
15. A1kmm ◴[] No.43288170{3}[source]
If you've got one cert with a subject alt name for each host, they'd see them all. If you use SNI and they have different certificates, the domains might still be in Certificate Transparency logs. If a wildcard cert is used, that could help to conceal the exact subdomain.
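To make the wildcard point concrete, here is a small sketch of what an observer learns from the names in a cert. The matching rule is deliberately simplified (a `*` stands for exactly one leftmost label), and both function names are hypothetical.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified cert name matching: '*' covers exactly one leftmost label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def leaked_hostnames(cert_names: list[str]) -> list[str]:
    """Names an observer learns verbatim: every entry that is not a wildcard."""
    return [name for name in cert_names if not name.startswith("*.")]


if __name__ == "__main__":
    # A multi-SAN cert lists every host outright.
    print(leaked_hostnames(["sampledomain.com", "userfileupload.sampledomain.com"]))
    # A wildcard cert is valid for the subdomain without ever naming it.
    print(leaked_hostnames(["*.sampledomain.com"]))
    print(wildcard_matches("*.sampledomain.com", "userfileupload.sampledomain.com"))
```

The wildcard cert still validates for userfileupload.sampledomain.com, but neither the served cert nor its Certificate Transparency entry spells that label out.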
16. ghusto ◴[] No.43288228[source]
First thing I’d do for an IP that answers is a reverse lookup, so I expect that’s at least in the list of things they’d try.
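A reverse lookup is a one-liner with the standard library; the sketch below uses hypothetical helper names. Keep in mind that PTR records are set by whoever controls the IP block, so on hosted infrastructure the answer is often the provider's generic name rather than the site's own hostname.

```python
import ipaddress
import socket
from typing import Optional


def ptr_name(ip: str) -> str:
    """The DNS name that a reverse (PTR) lookup for this address queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str) -> Optional[str]:
    """Resolve the PTR record for an IP; returns None when nothing is published."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None


if __name__ == "__main__":
    # 192.0.2.10 is a documentation placeholder address, not a real host.
    print(ptr_name("192.0.2.10"))
```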
17. chinathrow ◴[] No.43288357[source]
Hm?

They sell you security but provide you with CVEs en masse.

https://www.cybersecuritydive.com/news/palo-alto-networks--h...

replies(1): >>43304091 #
18. lewiscollard ◴[] No.43288802[source]
Depending on the web server's configuration, you very much _can_ find the domain which is configured on an IP address, by attempting to connect to that IP address via HTTPS and seeing what certificate gets served. Here's an example:

https://138.68.161.203/

> Web sites prove their identity via certificates. Firefox does not trust this site because it uses a certificate that is not valid for 138.68.161.203. The certificate is only valid for the following names: exhaust.lewiscollard.com, www.exhaust.lewiscollard.com

replies(1): >>43289108 #
19. Dylan16807 ◴[] No.43288817{4}[source]
The user agent contains a partial answer. IP scanning doesn't give you the actual subdomain, so the question is slightly wrong or there are missing pieces.
replies(1): >>43288863 #
20. diggan ◴[] No.43288863{5}[source]
Judging by the logs (user agents really) right now in the submission, it's hard to tell if the requests were actually for the domain (since the request headers aren't included) or just for the IP.
replies(1): >>43293530 #
21. jchw ◴[] No.43289108{3}[source]
I don't think that does you any good for Cloudflare, though. They will definitely be using SNI.
replies(2): >>43289333 #>>43296431 #
22. ThatMedicIsASpy ◴[] No.43289263[source]
Am I Google if I show up with the user agent 'google here, no evil'?
23. melevittfl ◴[] No.43289275[source]
But there's no evidence in the OP's post that they have, in fact, discovered the domain. The only thing posted is that there is a GET request to a listening web server.

The OP and all the people talking about certificates are making the same assumption, namely that the scanning company discovered the DNS name for the server and tried to connect. In fact, they may simply be iterating through IP address blocks and making GET requests to any listening web servers they find.

replies(3): >>43290815 #>>43292596 #>>43298440 #
24. kelnos ◴[] No.43289333{4}[source]
That doesn't really matter, though. While OP is using Cloudflare, the actual server behind it is still a publicly-accessible IP address that an IPv4 space scanner can easily stumble upon.
replies(1): >>43289570 #
25. jchw ◴[] No.43289570{5}[source]
I misunderstood, I thought the subdomain was an R2 bucket. If it's just normal Cloudflare proxying to some backend this is probably the most likely answer.

That said, while I think it's not the case here, using Cloudflare doesn't mean the underlying host is accessible, as even on the free tier you can use Cloudflare Tunnels, which I often do.

replies(1): >>43296440 #
26. seba_dos1 ◴[] No.43289610[source]
The details aren't there, so it's "assume" rather than "understand".

The only proper response to OP's question is to ask for clarification: is the subdomain pointing to a separate IP? Are the logs vhost-specific or not?

If you don't get the answers, all you can do is to assume, and both assumptions may end up being right or wrong (with varying probability, perhaps).

27. flemhans ◴[] No.43289669{3}[source]
Funny, that'd be so unthinkable for me to do! But you're probably right.
28. paxys ◴[] No.43290041[source]
Who says they did?
29. bildung ◴[] No.43290403[source]
Looking at how they earned their 100s of CVEs, script kiddie almost looks like a compliment
30. p0w3n3d ◴[] No.43290815{3}[source]
OP states that the domain was discovered
replies(1): >>43290981 #
31. crazygringo ◴[] No.43290981{4}[source]
No they didn't. They said "How did the internet find my subdomain?" They're assuming the internet found their subdomain. They don't provide any evidence that happened, just that they found their IP address.
32. paxys ◴[] No.43292054[source]
> When doing HTTP request to IP you specify the domain you want to connect to

No, you make HTTP requests to an IP, not a domain. You convert the domain name to an IP in an earlier step (via a DNS query). You can connect to servers using their raw IPs and open ports all day if you like, which is what's happening here. Yes, servers will (likely) reject the requests by looking at the host header, but they will still receive the request.
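The split between the connection target and the Host header is easy to see when the request is written out by hand. This is a minimal sketch with made-up function names: the TCP connection goes to an IP, and the Host header is just text the client chooses to send.

```python
import socket


def build_request(host_header: str, path: str = "/") -> bytes:
    """Serialize a bare HTTP/1.1 GET; the Host value is whatever the client picks."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def send_to_ip(ip: str, host_header: str, port: int = 80, timeout: float = 5.0) -> bytes:
    """Connect to a raw IP and send the request; return the first chunk of the reply."""
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        sock.sendall(build_request(host_header))
        return sock.recv(4096)


if __name__ == "__main__":
    # A scanner that knows only the IP can put the IP itself in the Host header.
    print(build_request("192.0.2.10").decode("ascii"))
```

Whether such a request lands in the subdomain's log depends on whether the server logs per vhost or routes unknown hosts to a default site, which the original post does not say.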

33. ◴[] No.43292596{3}[source]
34. Dylan16807 ◴[] No.43293530{6}[source]
Yes, that's the "question is slightly wrong" option I listed.
35. ◴[] No.43296431{4}[source]
36. ratg13 ◴[] No.43296440{6}[source]
They only state that they are using Cloudflare for DNS; they didn't say whether they were proxying the connection.
replies(1): >>43297617 #
37. jchw ◴[] No.43297617{7}[source]
Also a valid point. I guess without more details all we can really do is speculate about the exact setup. That said, I do now agree that the most likely answer is "the underlying host was accessible and caught by an IPv4 scanner" since, well, that's pretty much what it says anyway.
38. denysvitali ◴[] No.43298440{3}[source]
I really doubt Cloudflare gives them an IPv4 address and lets them see all the logs for said address.
39. heraldgeezer ◴[] No.43304091{3}[source]
Ah yes, we all know that if you sell a firewall the code has to be 100% bug-free and unbreakable.