First, the IoT devices reporting daily. In the absence of further context, I’m going to assume that it doesn’t matter when the devices report, so they can be configured to spread out their load. I’m also going to assume 1 KB of data per device, but with an HTTPS API there’s roughly 7 KB of overhead that we need to account for when calculating bandwidth. (Source: http://netsekure.org/2010/03/tls-overhead/ . TLS session resumption gets that down to ~300 bytes, but adds implementation complications.)
$ units -1v '1M req/day' 'req/sec'
1M req/day = 11.574074 req/sec
$ units -1v '1M req/day * 8kbyte/req' 'kbyte/sec' # Incoming bandwidth
1M req/day * 8kbyte/req = 92.592593 kbyte/sec
$ units -1v '1M req/day * 1kbyte/req' 'years/TB' # Storage
reciprocal conversion
1 / (1M req/day * 1kbyte/req) = 2.7379093 years/TB
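If you don’t have a units file that defines req handy, the same arithmetic is a few lines of Python. This is just a cross-check of the numbers above, with the session-resumption variant included for comparison; the 365.25-day year is my assumption and accounts for the tiny difference from units’ output.

# Back-of-the-envelope numbers for 1M devices reporting once a day.
# Assumes 1 KB of stored data per report, plus ~7 KB of HTTPS/TLS
# overhead on the wire (or ~300 bytes with session resumption).
DEVICES = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60

rps = DEVICES / SECONDS_PER_DAY
print(f"{rps:.1f} req/sec")                    # ~11.6 req/sec

incoming = rps * 8_000                         # 1 KB payload + 7 KB TLS
print(f"{incoming / 1000:.1f} kbyte/sec")      # ~92.6 kbyte/sec

resumed = rps * 1_300                          # 1 KB payload + ~300 B TLS
print(f"{resumed / 1000:.1f} kbyte/sec")       # ~15.0 kbyte/sec

storage_per_day = DEVICES * 1_000              # only the payload gets stored
years_per_tb = 1e12 / storage_per_day / 365.25
print(f"{years_per_tb:.2f} years/TB")          # ~2.74 years per terabyte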
It looks like our load here is a whopping 12 RPS, and we could handle the traffic on an ISDN line from the 1990s. Data storage is a little trickier; if we can’t compress or delete old data we may have to stop by Best Buy for a new hard drive every half decade or so.

Users can’t be configured to load balance themselves, so we’ll be pessimistic and assume that every single one of them logs in to check their device over their morning coffee. We’ll also assume that every time they do that they want all of the data from the last 3 months (90 days of 1 KB reports, so ~90 KB per response), though in practice this could probably be summarized before we send it to them.
$ units -1v '5000 req/1 hour' 'req/sec'
5000 req/1 hour = 1.3888889 req/sec
$ units -1v '5000 req/1 hour * 90 kbyte/req' 'MB/sec' # Outgoing bandwidth
5000 req/1 hour * 90 kbyte/req = 0.125 MB/sec
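(Cross-checking in Python again; converting the egress to megabits is what makes the DSL comparison below concrete. The 90 KB response size is the 90-days-of-1-KB-reports assumption from above.)

# Peak user load: 5,000 users all checking in during the same morning
# hour, each pulling ~90 days x 1 KB/day = 90 KB of history.
USERS_PER_HOUR = 5_000
RESPONSE_BYTES = 90_000

rps = USERS_PER_HOUR / 3600
print(f"{rps:.2f} req/sec")                    # ~1.39 req/sec

outgoing = rps * RESPONSE_BYTES
print(f"{outgoing / 1e6:.3f} MB/sec")          # 0.125 MB/sec
print(f"{outgoing * 8 / 1e6:.1f} Mbit/sec")    # ~1 Mbit/sec: DSL territory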
For this, we have just under 2 RPS, but our responses are quite a lot bigger, so the bandwidth is higher; we probably want to move into the early 2000s and upgrade to a DSL connection. Oh, and we also want to make sure our disk can handle the read load. Conveniently, since we’re just pulling raw data and not asking for any complicated joins, these numbers are actually the same. 2 RPS gives us 500ms per request; since a spinning rust drive pessimistically takes 10ms/seek, we only have a ~50-seek budget per request, so we probably want to cluster or partition the table by device to improve data locality. (There’s a quick sketch of this seek budget after the storage numbers below. Doing that clustering increases the write load significantly, though, so maybe we want to upgrade to an SSD or think about a database that’s smart enough to do some kind of periodic bulk rebalancing.)

Oh, I almost forgot: we’ll also want to make sure we have disk space to keep track of all of those idle accounts that aren’t doing anything:
$ units -1v '1M records * 1kbyte/record' 'GB'
1M records * 1kbyte/record = 1 GB
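And here’s the promised seek-budget sketch. The 10ms/seek figure is the same pessimistic spinning-disk rule of thumb as above; the point is that 90 days of history means up to 90 scattered records per request, which blows the ~50-seek budget unless the data is clustered by device.

# Seek budget per request, assuming a pessimistic 10 ms per disk seek.
SEEK_MS = 10
RPS = 2                                        # round the ~1.39 req/sec up

budget_ms = 1000 / RPS                         # 500 ms of wall clock per request
seeks = budget_ms / SEEK_MS
print(f"{seeks:.0f}-seek budget per request")  # ~50 seeks

# 90 days of history = 90 records. Scattered across the disk, that's
# up to 90 seeks; clustered by device it's ~1 seek plus a sequential
# read, which is why data locality matters here.
records = 90
print("over budget" if records > seeks else "within budget")  # over budget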
Modern computers (for very relative definitions of modern) are fast, if they don’t get bogged down. Based on my numbers, you could probably run your system from a MacBook in your desk drawer; but it would be trivial to add a requirement that would multiply any of these numbers by several orders of magnitude. The problem with trying to compare architectures is that you have to really understand what your needs are. Not only are they going to be difficult to compare to someone else’s setup (small differences can add up fast when multiplied across requests), but it’s also hard to tell how much effort the other system put into optimizing (or, for that matter, how much effort you want to put into optimizing).