
90 points by Eikon | 1 comment

Hi HN,

Lately I've been building a pipeline to create a database of DNS records. The goal is to enable research as well as competitive-landscape analysis of the internet.

For now, the dataset spans around 4 billion records and covers all the common DNS record types:

    A
    AAAA 
    ANAME
    CAA
    CNAME
    HINFO
    HTTPS
    MX
    NAPTR
    NS
    PTR 
    SOA
    SRV
    SSHFP
    SVCB
    TLSA
    TXT

Each line in the CSV file represents a single DNS record in the format name,type,value. For example:

    www.example.com,A,93.184.215.14
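As a minimal sketch of reading the dump, the Python below turns each line into a (name, type, value) tuple. It assumes plain, unquoted CSV like the example line. How the dump handles commas inside a value (for instance in TXT data) isn't stated, so splitting on only the first two commas is an assumption meant to keep such values intact. `DnsRecord` and `parse_records` are hypothetical names, not part of the dataset.

```python
from typing import Iterable, Iterator, NamedTuple


class DnsRecord(NamedTuple):
    name: str   # owner name, e.g. "www.example.com"
    rtype: str  # record type, e.g. "A", "CNAME", "TXT"
    value: str  # record data exactly as written in the dump


def parse_records(lines: Iterable[str]) -> Iterator[DnsRecord]:
    """Parse 'name,type,value' lines into DnsRecord tuples.

    Assumption: fields are unquoted, and only the value may contain
    commas, so we split on the first two commas only. Blank lines and
    lines with fewer than three fields are skipped.
    """
    for line in lines:
        parts = line.rstrip("\r\n").split(",", 2)
        if len(parts) != 3:
            continue  # blank or malformed line
        name, rtype, value = parts
        yield DnsRecord(name, rtype, value)


if __name__ == "__main__":
    # Hypothetical sample; the second line only illustrates a comma in a value.
    sample = [
        "www.example.com,A,93.184.215.14\n",
        "example.com,TXT,hello, world\n",
    ]
    for rec in parse_records(sample):
        print(rec.name, rec.rtype, rec.value)
```

Because `parse_records` accepts any iterable of lines, an open file object can be passed directly, which streams the file one line at a time instead of loading billions of rows into memory.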

Let me know if you have any questions or feedback!

romperstomper No.41871193
There are quite a few duplicates; it looks like they are mostly (or only) CNAME records. Here are some from the beginning of the file:

  staging.pannekoeken-poffertjes-restaurant-amstelland.nl,CNAME,www.pannekoeken-poffertjes-restaurant-amstelland.nl.
  staging.pannekoeken-poffertjes-restaurant-amstelland.nl,CNAME,www.pannekoeken-poffertjes-restaurant-amstelland.nl.
  www.domiciliatuempresa.com,CNAME,domiciliatuempresa.com.
  www.domiciliatuempresa.com,CNAME,domiciliatuempresa.com.
  *.autokozmetikakaposvar.hu,CNAME,autokozmetikakaposvar.hu.
  *.autokozmetikakaposvar.hu,CNAME,autokozmetikakaposvar.hu.
  c7ac691a.oob-nuq1907.indubitably.xyz,CNAME,oob-nuq1907.hosts.secretcdn.net.
  c7ac691a.oob-nuq1907.indubitably.xyz,CNAME,oob-nuq1907.hosts.secretcdn.net.
etc.
Eikon No.41871999
It’s because I don’t try to deduplicate; I just save whatever responses I get, which leads to this behavior for CNAMEs. It shouldn’t be a big deal.

I may improve that in future releases.
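Until then, one possible post-processing step (a sketch, not the author's pipeline) is to drop duplicates that sit on consecutive lines, as they do in the excerpt above. Like the Unix `uniq` command, this uses constant memory, which matters at around 4 billion rows. `drop_adjacent_duplicates` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, Optional


def drop_adjacent_duplicates(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line unless it is identical to the line just before it.

    Only consecutive duplicates are removed, so memory use stays constant
    regardless of file size. Trailing newlines are stripped before the
    comparison, and lines are yielded without them.
    """
    previous: Optional[str] = None
    for line in lines:
        current = line.rstrip("\r\n")
        if current != previous:
            yield current
        previous = current
```

Applied to the CNAME excerpt above, each pair collapses to a single line. Duplicates that are not adjacent would need the file sorted first (for example with `sort -u`), or a hash set if the data fits in memory.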