90 points Eikon | 1 comment

Hi HN,

Lately I've been building a pipeline to create a database of DNS records. The goal is to enable research as well as competitive landscape analysis of the internet.

For now, the dataset spans around 4 billion records and covers all the common DNS record types:

    A
    AAAA 
    ANAME
    CAA
    CNAME
    HINFO
    HTTPS
    MX
    NAPTR
    NS
    PTR 
    SOA
    SRV
    SSHFP
    SVCB
    TLSA
    TXT

Each line in the CSV file represents a single DNS record in the following format:

    www.example.com,A,93.184.215.14
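
For anyone who wants to read that format programmatically, here is a minimal Python sketch. The `DnsRecord` type and `parse_line` helper are hypothetical names, not part of the dataset tooling. It assumes three comma-separated fields with no quoting or escaping, which the post does not specify, and it splits on the first two commas only so that values containing commas (TXT data, for example) stay in one piece.

```python
from typing import NamedTuple


class DnsRecord(NamedTuple):
    """One row of the CSV: owner name, record type, record data."""
    name: str
    rtype: str
    value: str


def parse_line(line: str) -> DnsRecord:
    # Assumption: fields are not quoted or escaped. Splitting at most twice
    # keeps any commas inside the record data (third field) intact.
    name, rtype, value = line.rstrip("\r\n").split(",", 2)
    return DnsRecord(name, rtype, value)


record = parse_line("www.example.com,A,93.184.215.14")
print(record.name, record.rtype, record.value)
```

If the file does turn out to use CSV quoting, the standard `csv` module would be the safer choice.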

Let me know if you have any questions or feedback!

1. m3047 No.41863094
I've worked in the industry at IID and Farsight. I am skeptical of many claims made by IoC vendors.

You need timestamps, or first / last seen.

Records don't exist in a vacuum. They come in RRsets. They are served (sometimes inconsistently) by different nameservers. Some use cases care about this.
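
To make the last two points concrete, here is a rough Python sketch of what tracking RRsets with first / last seen could look like. It is not how DNSDB or the posted dataset works. The observation tuple layout `(name, rtype, value, nameserver, unix_ts)` and the names `RRset` and `group_rrsets` are assumptions made for illustration. Keying on the nameserver as well as the name and type is one way to notice when servers disagree.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RRset:
    """All values seen for one (name, type, nameserver), plus a time range."""
    values: set = field(default_factory=set)
    first_seen: Optional[int] = None
    last_seen: Optional[int] = None


def group_rrsets(observations):
    """Group (name, rtype, value, nameserver, unix_ts) tuples into RRsets."""
    rrsets = defaultdict(RRset)
    for name, rtype, value, nameserver, ts in observations:
        rrset = rrsets[(name, rtype, nameserver)]
        rrset.values.add(value)
        # Widen the observed time range to include this observation.
        rrset.first_seen = ts if rrset.first_seen is None else min(rrset.first_seen, ts)
        rrset.last_seen = ts if rrset.last_seen is None else max(rrset.last_seen, ts)
    return rrsets
```

This simplification keeps one time range per RRset, so a value that disappears from the set is not tracked separately. A real passive DNS store would need to decide how to handle that.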

Records which don't resolve are also useful, especially for use cases which amount to front-running. On any given day if the wind was blowing the right direction .belkin could be one of the top 10 non-resolving TLDs. If your data is any good, check under .cisco for stuff which resolves to 127.0.53.53. ;-)
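
For context, 127.0.53.53 is the address ICANN's name collision "controlled interruption" uses to flag queries for names under newly delegated TLDs. A quick check against data in the posted name,type,value shape might look like the Python sketch below. The `collision_names` helper is a hypothetical name, and it assumes records arrive as `(name, rtype, value)` tuples.

```python
COLLISION_ADDR = "127.0.53.53"  # ICANN controlled-interruption address


def collision_names(records, tld):
    """Return sorted names under `tld` whose A record is the collision address."""
    suffix = "." + tld.strip(".").lower()
    hits = set()
    for name, rtype, value in records:
        # Normalize case and any trailing root dot before matching the TLD.
        normalized = name.lower().rstrip(".")
        if rtype == "A" and value == COLLISION_ADDR and normalized.endswith(suffix):
            hits.add(normalized)
    return sorted(hits)
```

Calling `collision_names(records, "cisco")` on the dataset would list whatever it has under .cisco that points at the collision address.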

Information about provenance (where the data comes from) is required for some use cases.

We shipped Farsight's DNSDB on one or more 1TB drives, depending on what the customer was purchasing.