
124 points edent | 1 comments
abigail95 ◴[] No.42726784[source]
Something is missing here, why do batch jobs take 13 hours? If this thing was started on an old mainframe why isn't the downtime just 5 minutes at 3:39 AM?

Exactly how much data is getting processed?

Edit: Why does rebuilding take a decade or more? This is not a complex system. It doesn't need to solve any novel engineering challenges to operate efficiently. Article does not give much insight into why this particular task couldn't be fixed in 3 months.

replies(6): >>42727086 #>>42727097 #>>42727182 #>>42727884 #>>42730222 #>>42732143 #
ajnin ◴[] No.42727182[source]
The batch jobs don't take 13 hours. They're just scheduled to run some time at night, when the old offices used to be closed and the jobs could be run with some expectation of data stability over the period. There are probably many jobs scheduled to run at 1AM, then 2AM, etc., each depending on the previous one having finished, so there is a large delay built in to ensure that a job does not start before the previous one is done.
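To make the fixed-slot point concrete, here is a rough sketch with made-up numbers (the job names, start times and run times are hypothetical, not taken from the article). It compares how long the overnight window stays occupied with how much of it is actual work when each job is simply given its own hourly slot.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    start_minute: int  # scheduled wall-clock start, in minutes after midnight
    run_minutes: int   # how long the job actually runs


def window_and_busy(jobs):
    """Return (minutes the window is occupied, minutes of real work)."""
    first_start = min(j.start_minute for j in jobs)
    last_end = max(j.start_minute + j.run_minutes for j in jobs)
    busy = sum(j.run_minutes for j in jobs)
    return last_end - first_start, busy


# Hypothetical chain: each job gets an hourly slot "to be safe".
jobs = [
    Job("extract", 60, 10),    # 1:00 AM, runs 10 minutes
    Job("validate", 120, 15),  # 2:00 AM, runs 15 minutes
    Job("post", 180, 5),       # 3:00 AM, runs 5 minutes
]

window, busy = window_and_busy(jobs)
print(f"window: {window} min, actual work: {busy} min")
```

With slots like these, most of the window is safety margin rather than processing time, which is how a chain of short jobs can block a system for a whole night.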

As to your "not a complex system" remark, when a system is built for 60 years, piling up new rules to implement new legislation and needs over time, you tend to end up with a tangled mess of services all interdependent that are very difficult to replace piece-wise with a new shiny architecturally pure one. This is closer to a distributed monolith than a microservices architecture. In my experience you can't rebuild such a thing "in 3 months". People who believe that are those that don't realize the complexity and the extraordinary amount of specifics, special cases, that are baked into the system, and any attempt to just rebuild from scratch in a few months hits that wall and ends up taking years.

replies(3): >>42727376 #>>42729588 #>>42731073 #
PaulAJ ◴[] No.42729588[source]
Anyone who doesn't understand what's so difficult should read this:

https://wiki.c2.com/?WhyIsPayrollHard

It's from a different domain, but it gives you a flavour of the headaches you encounter. These systems always look simple from the outside, but once you get inside you find endless reams of interrelated and arbitrary business rules that have accumulated. There is probably no complete specification (unless you count the accumulated legal, regulatory and procedural history of the DVLA), and the old code will have little or no accurate documentation (if you are lucky there will be comments).

replies(1): >>42729979 #
stego-tech ◴[] No.42729979{3}[source]
Basically this. The people running the show would desperately like to make it simpler, but ultimately it’s left overly complicated due to priorities from past leadership well above our paygrade.

The right solution is always to just rip off the bandaid and do it again by hand in a new language or platform, and to eliminate useless complexity while doing so. Unfortunately no leader would ever do this because the Board and/or Shareholders would crucify them for not outsourcing it to McKinsey first and using the fancy-pants automation tool their report recommended.

replies(2): >>42730925 #>>42734958 #
1. signal11 ◴[] No.42734958{4}[source]
There are a few shareholder-friendly patterns to get this done, but it is domain-specific. I’d say it’s more “rip off the bandaid slowly and carefully”.

E.g. a common one is to wrap a new no-op service around the old one, and gradually replace parts of the old one (the “strangler fig pattern”).
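A minimal sketch of that idea, assuming a simple request/response system (the class, function and operation names here are all hypothetical): the facade starts out forwarding everything to the legacy system, and operations are moved to new handlers one at a time.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class StranglerFacade:
    """Wraps the legacy system; initially a pure pass-through."""

    def __init__(self, legacy: Handler):
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, operation: str, handler: Handler) -> None:
        # Route one operation to its new implementation.
        self._migrated[operation] = handler

    def handle(self, operation: str, request: dict) -> dict:
        handler = self._migrated.get(operation)
        if handler is None:
            # Not migrated yet: fall through to the old system.
            return self._legacy({"operation": operation, **request})
        return handler(request)


def legacy_system(request: dict) -> dict:
    # Stand-in for the old system (hypothetical).
    return {"handled_by": "legacy", "operation": request["operation"]}


def new_renewal_service(request: dict) -> dict:
    # Stand-in for one rewritten piece (hypothetical).
    return {"handled_by": "new", "operation": "renew_licence"}


facade = StranglerFacade(legacy_system)
before = facade.handle("renew_licence", {"id": 1})
facade.migrate("renew_licence", new_renewal_service)
after = facade.handle("renew_licence", {"id": 1})
other = facade.handle("change_address", {"id": 1})
```

Callers only ever talk to the facade, so each migration is a small, reversible change, and anything not yet migrated keeps hitting the old system.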

This is technically great, but it’s also financially great because you aren’t spending large sums on a big-bang rewrite. You’re spending relatively small sums on a “pay as you go” basis, something board members and shareholders do like.

But of course this depends on how your systems are set up.