Ask HN: How did the internet discover my subdomain?

1. paxys ◴[07 Mar 25 05:55 UTC] No.43287654[source]▶

Not sure why everyone is going on about certificate transparency logs when the answer is right there in the user agent. The company is scanning the ipv4 space and came upon your IP and port.

replies(6): >>43287671 #>>43287702 #>>43287703 #>>43287895 #>>43287976 #>>43288126 #

2. ◴[07 Mar 25 06:00 UTC] No.43287671[source]▶

>>43287654 (TP) #

3. pkulak ◴[07 Mar 25 06:12 UTC] No.43287702[source]▶

>>43287654 (TP) #

Okay. But how did they get the proper host header?

replies(4): >>43287720 #>>43287730 #>>43287736 #>>43290041 #

4. peeters ◴[07 Mar 25 06:12 UTC] No.43287703[source]▶

>>43287654 (TP) #

It's rather hilarious that nobody mentioned this in 7 hours. What am I missing?

~5 billion scans in a few hours is nothing for a company with decent resources. OP: in case you didn't follow, they're literally trying every possible IPv4 address and seeing if something exists on standard ports at that address.
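For scale, a back-of-the-envelope sketch: IPv4 has 2^32 (about 4.3 billion) addresses, so "~5 billion" is the right order of magnitude. The probe rate below is a made-up illustrative figure, not something stated in the thread.

```python
IPV4_ADDRESSES = 2 ** 32  # 4,294,967,296 possible addresses, reserved ranges included


def scan_hours(probes_per_second, ports=1):
    """Hours needed to send one probe per port to every IPv4 address."""
    total_probes = IPV4_ADDRESSES * ports
    return total_probes / probes_per_second / 3600


if __name__ == "__main__":
    # Hypothetical rate of one million probes per second, single port.
    print(f"{scan_hours(1_000_000):.2f} hours")
```

At that assumed rate a single-port sweep takes a bit over an hour; a real scanner would skip reserved ranges and spread the load over many machines.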

I believe it would be harder to find out your domain that way if you were using SNI and only forwarded/served requests that used the correct host. But if you aren't using SNI, your server is probably just responding to any TLS connect request with your subdomain's cert, which will reveal your hostname.

replies(3): >>43287789 #>>43288073 #>>43288140 #

5. peeters ◴[07 Mar 25 06:17 UTC] No.43287720[source]▶

>>43287702 #

There are a couple easy possibilities depending on server config.

1. Not using SNI, and all https requests just respond with the same cert. (Example, go to https://209.216.230.207/ and you'll get a certificate error. Go to the cert details and you'll see the common name is news.ycombinator.com).

2. http upgrades to https with a redirect to the hostname, not IP address. (Example, go to http://209.216.230.207/ and you get a 301 redirect to https://news.ycombinator.com)
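The second possibility can be sketched roughly as follows, using only the Python standard library. The function names are hypothetical, and whether a hostname comes back depends entirely on how the server is configured.

```python
import http.client
from urllib.parse import urlsplit


def hostname_from_location(location):
    """Return the hostname in a redirect target, or None for a relative URL."""
    return urlsplit(location).hostname


def probe_redirect(ip, port=80, timeout=5.0):
    """Request / on a bare IP and report any hostname leaked by a redirect."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # http.client fills in "Host: <ip>" because no name is known yet.
        conn.request("GET", "/")
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in (301, 302, 307, 308) and location:
            host = hostname_from_location(location)
            # A redirect back to the same IP reveals nothing new.
            return host if host != ip else None
        return None
    finally:
        conn.close()
```

Calling `probe_redirect` on an address needs network access; `hostname_from_location` is the part that turns a `Location` header into a name.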

6. jimnotgym ◴[07 Mar 25 06:20 UTC] No.43287730[source]▶

>>43287702 #

I don't think op said that they had the correct host header?

7. INTPenis ◴[07 Mar 25 06:21 UTC] No.43287736[source]▶

>>43287702 #

Could be a number of ways, for example a default TLS cert or a default vhost redirect.

I actually had a job once a few years ago where I was asked to hide a web service from crawlers and so I did some of these things to ensure no info leaked about the real vhost.

8. Dylan16807 ◴[07 Mar 25 06:37 UTC] No.43287789[source]▶

>>43287703 #

> What am I missing?

That it was in fact mentioned many hours earlier, in more than one top level comment.

replies(1): >>43287855 #

9. peeters ◴[07 Mar 25 06:57 UTC] No.43287855{3}[source]▶

>>43287789 #

I was referring more to the fact that the user agent explicitly contained the answer, rather than suggestions that it was IP scanning. But you're right, I do see one comment that mentions that. And many more likely assumed the OP already figured that part out.

replies(1): >>43288817 #

10. ozim ◴[07 Mar 25 07:13 UTC] No.43287895[source]▶

>>43287654 (TP) #

That perfectly fits the midwit meme. Lots of people are smart enough to know about transparency logs, but not smart enough to read OP's post and understand the details.

replies(1): >>43289610 #

11. 4ndrewl ◴[07 Mar 25 07:34 UTC] No.43287976[source]▶

>>43287654 (TP) #

Also it's Palo Alto. They're not some kiddie scripters. https://en.m.wikipedia.org/wiki/Palo_Alto_Networks

replies(3): >>43288357 #>>43289263 #>>43290403 #

12. globular-toast ◴[07 Mar 25 07:53 UTC] No.43288073[source]▶

>>43287703 #

> What am I missing?

It's very common for people to read only up to the point they feel they can comment, then skip immediately to the comment. So, basically, no one read it.

replies(1): >>43289669 #

13. p0w3n3d ◴[07 Mar 25 08:01 UTC] No.43288126[source]▶

>>43287654 (TP) #

Finding the IP does not mean finding the domain. When making an HTTP request to an IP, you specify the domain you want to connect to. For example, you can configure your /etc/hosts to have xxxnakedhamsters.google.com pointing to 8.8.8.8 and make the HTTP request, which will cause Google to receive the request for that domain (i.e. the header Host: xxxnakedhamsters.google.com), and it will refuse it or try to redirect to http. Of course, this only applies to HTTP, because HTTPS will require a certificate. That's why they're talking about certificates.

replies(4): >>43288228 #>>43288802 #>>43289275 #>>43292054 #

14. fragmede ◴[07 Mar 25 08:05 UTC] No.43288140[source]▶

>>43287703 #

Just the default hostname. It won't reveal all of them or any of the IP addresses of that box. secret-freedom-fighter.ice-cream-shop.example.com could have the same IP as example.com and you'd only know example.com

replies(1): >>43288170 #

15. A1kmm ◴[07 Mar 25 08:11 UTC] No.43288170{3}[source]▶

>>43288140 #

If you've got one cert with a subject alt name for each host, they'd see them all. If you use SNI and they have different certificates, the domains might still be in Certificate Transparency logs. If a wildcard cert is used, that could help to conceal the exact subdomain.
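A tiny sketch of what a scanner learns from a certificate's name list. The function name and the sample names are illustrative assumptions.

```python
def learned_hostnames(cert_names):
    """Split certificate names into exact hostnames and wildcard patterns."""
    exact = [name for name in cert_names if not name.startswith("*.")]
    wildcards = [name for name in cert_names if name.startswith("*.")]
    return exact, wildcards


if __name__ == "__main__":
    # Hypothetical SAN list: one exact name plus a wildcard.
    exact, wildcards = learned_hostnames(["example.com", "*.example.com"])
    print("revealed exactly:", exact)
    print("only a pattern:", wildcards)
```

A wildcard entry tells the scanner the parent domain but not which label under it is actually in use.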

16. ghusto ◴[07 Mar 25 08:23 UTC] No.43288228[source]▶

>>43288126 #

First thing I’d do for an IP that answers is a reverse lookup, so I expect that’s at least in the list of things they’d try.
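A reverse lookup of this kind might look like the sketch below (standard library only; the function names are hypothetical). Note that a PTR record often points at a hosting provider's generated name rather than the site's own subdomain, so this may or may not turn up anything useful.

```python
import ipaddress
import socket


def ptr_query_name(ip):
    """Return the DNS name that a reverse (PTR) lookup for this IP queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the hostname the resolver reports for an IP, or None."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # Covers socket.herror / socket.gaierror when nothing is found.
        return None
```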

17. chinathrow ◴[07 Mar 25 08:42 UTC] No.43288357[source]▶

>>43287976 #

Hm?

They sell you security but provide you with CVEs en masse.

https://www.cybersecuritydive.com/news/palo-alto-networks--h...

replies(1): >>43304091 #

18. lewiscollard ◴[07 Mar 25 09:57 UTC] No.43288802[source]▶

>>43288126 #

Depending on the web server's configuration, you very much _can_ find the domain which is configured on an IP address, by attempting to connect to that IP address via HTTPS and seeing what certificate gets served. Here's an example:

https://138.68.161.203/

> Web sites prove their identity via certificates. Firefox does not trust this site because it uses a certificate that is not valid for 138.68.161.203. The certificate is only valid for the following names: exhaust.lewiscollard.com, www.exhaust.lewiscollard.com
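The names in that browser message come from the certificate's subject and subject alternative names. A minimal sketch of reading them programmatically, assuming the server presents a publicly trusted certificate to clients that send no SNI (function names are hypothetical):

```python
import socket
import ssl


def names_from_cert(cert):
    """Collect commonName and DNS subjectAltName entries from a
    certificate dict in the format returned by SSLSocket.getpeercert()."""
    names = []
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS":
            names.append(value)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(names))


def names_served_on_ip(ip, port=443, timeout=5.0):
    """Connect to a bare IP without SNI and list the names in its cert."""
    ctx = ssl.create_default_context()
    # There is no hostname to check against; the chain is still verified,
    # so a self-signed certificate raises ssl.SSLCertVerificationError.
    ctx.check_hostname = False
    with socket.create_connection((ip, port), timeout=timeout) as raw:
        # No server_hostname argument means no SNI extension is sent.
        with ctx.wrap_socket(raw) as tls:
            return names_from_cert(tls.getpeercert())
```

`names_served_on_ip` needs network access; `names_from_cert` can be exercised with a hand-built dict.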

replies(1): >>43289108 #

19. Dylan16807 ◴[07 Mar 25 09:58 UTC] No.43288817{4}[source]▶

>>43287855 #

The user agent contains a partial answer. IP scanning doesn't give you the actual subdomain, so the question is slightly wrong or there are missing pieces.

replies(1): >>43288863 #

20. diggan ◴[07 Mar 25 10:07 UTC] No.43288863{5}[source]▶

>>43288817 #

Judging by the logs (user agents really) right now in the submission, it's hard to tell if the requests were actually for the domain (since the request headers aren't included) or just for the IP.

replies(1): >>43293530 #

21. jchw ◴[07 Mar 25 10:50 UTC] No.43289108{3}[source]▶

>>43288802 #

I don't think that does you any good for Cloudflare, though. They will definitely be using SNI.

replies(2): >>43289333 #>>43296431 #

22. ThatMedicIsASpy ◴[07 Mar 25 11:21 UTC] No.43289263[source]▶

>>43287976 #

Am I google when I come with the useragent 'google here, no evil'?

23. melevittfl ◴[07 Mar 25 11:22 UTC] No.43289275[source]▶

>>43288126 #

But there's no evidence in the OP's post that they have, in fact, discovered the domain. The only thing posted is that there is a GET request to a listening web server.

The OP and all the people talking about certificates are making the same assumption, namely that the scanning company discovered the DNS name for the server and tried to connect, when in fact they simply iterate through IP address blocks and make GET requests to any listening web servers they find.

replies(3): >>43290815 #>>43292596 #>>43298440 #

24. kelnos ◴[07 Mar 25 11:32 UTC] No.43289333{4}[source]▶

>>43289108 #

That doesn't really matter, though. While OP is using Cloudflare, the actual server behind it is still a publicly-accessible IP address that an IPv4 space scanner can easily stumble upon.

replies(1): >>43289570 #

25. jchw ◴[07 Mar 25 12:20 UTC] No.43289570{5}[source]▶

>>43289333 #

I misunderstood, I thought the subdomain was an R2 bucket. If it's just normal Cloudflare proxying to some backend this is probably the most likely answer.

That said, while I think it's not the case here, using Cloudflare doesn't mean the underlying host is accessible, as even on the free tier you can use Cloudflare Tunnels, which I often do.

replies(1): >>43296440 #

26. seba_dos1 ◴[07 Mar 25 12:27 UTC] No.43289610[source]▶

>>43287895 #

The details aren't there, so it's "assume" rather than "understand".

The only proper response to OP's question is to ask for clarification: is the subdomain pointing to a separate IP? Are the logs vhost-specific or not?

If you don't get the answers, all you can do is to assume, and both assumptions may end up being right or wrong (with varying probability, perhaps).

27. flemhans ◴[07 Mar 25 12:37 UTC] No.43289669{3}[source]▶

>>43288073 #

Funny, that'd be so unthinkable for me to do! But you're probably right.

28. paxys ◴[07 Mar 25 13:39 UTC] No.43290041[source]▶

>>43287702 #

Who says they did?

29. bildung ◴[07 Mar 25 14:23 UTC] No.43290403[source]▶

>>43287976 #

Looking at how they earned their 100s of CVEs, script kiddie almost looks like a compliment

30. p0w3n3d ◴[07 Mar 25 15:16 UTC] No.43290815{3}[source]▶

>>43289275 #

OP states that the domain was discovered

replies(1): >>43290981 #

31. crazygringo ◴[07 Mar 25 15:34 UTC] No.43290981{4}[source]▶

>>43290815 #

No they didn't. They said "How did the internet find my subdomain?" They're assuming the internet found their subdomain. They don't provide any evidence that happened, just that they found their IP address.

32. paxys ◴[07 Mar 25 17:23 UTC] No.43292054[source]▶

>>43288126 #

> When doing HTTP request to IP you specify the domain you want to connect to

No, you make HTTP requests to an IP, not a domain. You convert the domain name to an IP in an earlier step (via a DNS query). You can connect to servers using their raw IPs and open ports all day if you like, which is what's happening here. Yes, servers will (likely) reject the requests by looking at the Host header, but they will still receive the request.
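To make the separation concrete, here is a rough sketch in which the TCP connection targets an IP and the Host header is just text inside the request. The function names are hypothetical, and this speaks plain HTTP only.

```python
import socket


def build_request(host_header, path="/"):
    """Build a minimal HTTP/1.1 GET request with an arbitrary Host header."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def status_line(ip, host_header, port=80, timeout=5.0):
    """Send the request to the IP and return the first line of the reply."""
    # The connection is made to the IP; DNS plays no part here.
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        sock.sendall(build_request(host_header))
        data = sock.recv(4096)
    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

A scanner that does not know any hostname can pass the IP itself as `host_header`; the server still receives and logs the request, whatever it decides to answer.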

33. ◴[07 Mar 25 18:16 UTC] No.43292596{3}[source]▶

>>43289275 #

34. Dylan16807 ◴[07 Mar 25 19:32 UTC] No.43293530{6}[source]▶

>>43288863 #

Yes, that's the "question is slightly wrong" option I listed.

35. ◴[08 Mar 25 00:43 UTC] No.43296431{4}[source]▶

>>43289108 #

36. ratg13 ◴[08 Mar 25 00:44 UTC] No.43296440{6}[source]▶

>>43289570 #

they only state they are using cloudflare for DNS, they didn't say if they were proxying the connection

replies(1): >>43297617 #

37. jchw ◴[08 Mar 25 04:57 UTC] No.43297617{7}[source]▶

>>43296440 #

Also a valid point. I guess without more details all we can really do is speculate about the exact setup. That said, I do now agree that the most likely answer is "the underlying host was accessible and caught by an IPv4 scanner" since well, that's pretty much what it says anyway.

38. denysvitali ◴[08 Mar 25 08:03 UTC] No.43298440{3}[source]▶

>>43289275 #

I really doubt CloudFlare gives them an IPv4 and they can see all the logs for said IPv4

39. heraldgeezer ◴[08 Mar 25 22:20 UTC] No.43304091{3}[source]▶

>>43288357 #

Ah yes we all know if you sell a firewall the code has to be 100% bug free unbreakable