The Samsung 990 Pro 2TB has a random-read latency of roughly 40 μs.
DDR4-2133 at CAS latency 15 (CL15) has an access latency of about 14 nanoseconds (15 cycles at 1,066 MHz).
DDR4's latency is about 0.035% of one of the fastest SSDs', or, put another way, DDR4 is roughly 2,857x faster than the SSD (40 μs ÷ 14 ns).
L1 cache is typically accessible in about 4 clock cycles; on a 4.8 GHz CPU like the i7-10700, that puts L1 latency under 1 ns.
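For anyone who wants to check the arithmetic, here it is spelled out in plain Python, using only the figures quoted above:

```python
# Latency comparison using the figures above.
SSD_LATENCY_NS = 40_000            # Samsung 990 Pro: ~40 us random read

DDR4_CLOCK_HZ = 2133e6 / 2         # DDR4-2133 is 2,133 MT/s -> 1,066 MHz clock
DDR4_CAS_CYCLES = 15               # CL15
ddr4_ns = DDR4_CAS_CYCLES / DDR4_CLOCK_HZ * 1e9
print(f"DDR4 CAS latency: {ddr4_ns:.1f} ns")                # ~14.1 ns
print(f"DDR4 vs SSD: {ddr4_ns / SSD_LATENCY_NS:.3%}")       # ~0.035%
print(f"SSD/DDR4 ratio: {SSD_LATENCY_NS / ddr4_ns:,.0f}x")  # ~2,845x (~2,857x if DDR4 is rounded to 14 ns)

CPU_HZ = 4.8e9                     # i7-10700 boost clock
L1_CYCLES = 4
print(f"L1 latency: {L1_CYCLES / CPU_HZ * 1e9:.2f} ns")     # ~0.83 ns
```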
The amount of complexity those constraints force into the architecture is insane.
At my previous job, management kept asking for designs at that scale for less than 1/1000th of the throughput, and I was constantly pushing back. There are real costs to building for more scale than you need; it's not as simple as just tweaking a few things.
To me there are a few big breakpoints in scale:
* When you can run on a single server
* When you can still run on a single server, but need HA redundancy
* When you have to scale beyond a single server
* When you have to adapt your design to the limits of the distributed system itself, e.g. designing around DynamoDB's partition limits (sketched below)
Each step in that chain adds irrevocable complexity, adds to OE, and adds to the cost to run and the cost to build. Be sure you actually have to take those steps before you decide to.
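As a concrete taste of that last breakpoint, here's a hedged sketch of designing around DynamoDB's per-partition throughput ceilings (about 3,000 RCU and 1,000 WCU per partition). The write-sharding pattern is standard; the traffic numbers and key format below are invented for illustration:

```python
import math
import random

WCU_PER_PARTITION = 1_000   # DynamoDB's documented per-partition write ceiling
                            # (reads cap at ~3,000 RCU per partition)

# Invented scenario: one campaign writes 20,000 items/sec (1 WCU each)
# under a single logical partition key. No table-level setting fixes a
# hot key; the key schema itself has to change.
target_writes_per_sec = 20_000
shards = math.ceil(target_writes_per_sec / WCU_PER_PARTITION)  # -> 20

def sharded_pk(campaign_id: str) -> str:
    """Write-sharding: spread one hot logical key over `shards` physical
    partitions. The cost: every read of the campaign must now fan out
    across all the suffixes and merge the results."""
    return f"{campaign_id}#{random.randrange(shards)}"

print(shards)                    # 20
print(sharded_pk("campaign-7"))  # e.g. "campaign-7#13"
```

That read-side fan-out is exactly the kind of irrevocable complexity each breakpoint brings: it leaks into every query you write afterwards.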
Even a very unoptimized application running on a dev laptop can serve 1 Gbps these days without issues; that's about 125 MB/s, on the order of 100,000 one-kilobyte messages per second.
So what are the constraints that demand a complex architecture?
* Reading/fetching the data - usernames, phone numbers, message content, etc.
* Generating the content for each message - it might be custom per person
* This goes through a 3rd-party API that might take anywhere from 100 ms to 2 s to respond, and you need to hold a connection open the whole time (see the worker sketch after this list)
* Retries on errors, rescheduling, backoffs
* At-least-once or at-most-once sends? Each has tradeoffs
* Stopping/starting that many messages at any time
* Rate limits on services you're using alongside your own (network gateway, database, etc.) - a token-bucket sketch follows this list
* Recordkeeping - did the message send? When?
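To make a few of these concrete (the slow third-party API, bounded concurrency, retries with backoff, and at-least-once delivery), here's a minimal worker sketch. Everything in it is assumed for illustration: `provider_send`, its `idempotency_key` parameter, the failure rate, and the concurrency limit are invented, not any particular provider's API.

```python
import asyncio
import random
import uuid

# Invented stand-in for the 3rd-party API described above: it takes
# anywhere from 100 ms to 2 s to respond and fails ~10% of the time.
async def provider_send(msg: dict, idempotency_key: str) -> None:
    await asyncio.sleep(random.uniform(0.1, 2.0))
    if random.random() < 0.10:
        raise ConnectionError("provider 5xx")

async def send_with_retries(msg: dict, sem: asyncio.Semaphore,
                            attempts: int = 5) -> bool:
    # A stable per-message idempotency key is what makes retries safe:
    # at-least-once on our side, deduplicated on the provider's side.
    key = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            async with sem:  # bound how many connections we hold open
                await provider_send(msg, idempotency_key=key)
            return True   # recordkeeping hook: mark sent + timestamp here
        except ConnectionError:
            # Exponential backoff with full jitter: 0-1 s, 0-2 s, 0-4 s, ...
            await asyncio.sleep(random.uniform(0, 2 ** attempt))
    return False          # recordkeeping hook: mark failed, reschedule

async def main() -> None:
    sem = asyncio.Semaphore(500)  # at most 500 in-flight sends
    msgs = [{"to": f"user{i}", "body": "hi"} for i in range(2_000)]
    results = await asyncio.gather(*(send_with_retries(m, sem) for m in msgs))
    print(f"sent {sum(results)} of {len(msgs)}")

asyncio.run(main())
```

Note what the numbers imply: at a 2 s worst case per call, 500 open connections cap you near 250 sends/sec (Little's law: throughput ≈ in-flight ÷ latency). That arithmetic, more than the business logic, is what ends up shaping the architecture.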
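And for the rate-limit bullet, the usual building block is a token bucket in front of the shared dependency. A minimal single-threaded sketch; the 100 req/s quota is an assumed example, not any real service's limit:

```python
import time

class TokenBucket:
    """Refills continuously at `rate` tokens/sec, holds at most `burst`."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)  # wait for next token

# Assumed example: keep a shared gateway under 100 requests/sec.
bucket = TokenBucket(rate=100, burst=100)
bucket.acquire()  # call before each downstream request
```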