Why some DVLA digital services don't work at night

(dafyddvaughan.uk)

124 points edent | 3 comments | 12 Jan 25 20:20 UTC

abigail95 ◴[16 Jan 25 15:43 UTC] No.42726784[source]▶

Something is missing here. Why do batch jobs take 13 hours? If this thing was started on an old mainframe, why isn't the downtime just 5 minutes at 3:39 AM?

Exactly how much data is getting processed?

Edit: Why does rebuilding take a decade or more? This is not a complex system. It doesn't need to solve any novel engineering challenges to operate efficiently. Article does not give much insight into why this particular task couldn't be fixed in 3 months.

replies(6): >>42727086 #>>42727097 #>>42727182 #>>42727884 #>>42730222 #>>42732143 #

shermantanktop ◴[16 Jan 25 16:02 UTC] No.42727086[source]▶

>>42726784 #

It’s funny to me that I would never ask those questions. I’ve specialized in legacy rehab projects (among other things) and there seems to be no upper bound on how bad things can be or how many annoying reasons there are for why we can’t “just fix it.” Those “just” questions—which I ask too—end up being hopelessly naive. The answers will crush your soul if you let them, so you can’t let them, and you should always assume things are worse than you think.

TFA is spot on - the way to make progress is to cut problems up and deliver value. The unfortunate consequence is that badness gets more and more concentrated into the systems that nobody can touch, sort of like the evolution of a star into an eventual black hole.

replies(1): >>42727253 #

abigail95 ◴[16 Jan 25 16:14 UTC] No.42727253[source]▶

>>42727086 #

I made a lot of money moving mid-size enterprises from legacy ERP systems to custom in-house ones.

The DVLA dataset and the computations that are run on it can be studied and replicated in 3 months by a competent team. From there it can be improved.

There is no way that this system requires 13 hours of downtime. If it required two hours - even if the code was generated through automation it can be reverse engineered and optimized.

It is absolute rubbish that this thing is still unavailable outside of 8am-7pm.

I maintain my position that it could be replaced in 3 months.

I got my start in this business when I was in university and they told us our online learning software was going offline for 3 days for an upgrade. Those are the gatekeepers and low achievers we fight against. Think bigger.

replies(3): >>42729952 #>>42731011 #>>42732865 #

1. pixl97 ◴[17 Jan 25 00:56 UTC] No.42732865[source]▶

>>42727253 #

Ya, I don't think I'd let you within two miles of a system like this.

Replacing legacy stuff always expands in scope far beyond the initial changes.

You have to come back and add wait() entries in your new program because it spits data back faster than the old program ever could, which then causes peripheral devices/drivers to crash, which then pulls a dev and testers off something else important for days figuring out what kind of fresh hell is occurring. That is just status quo for ancient systems.
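To make the wait() point concrete, here is a minimal sketch of pacing output for a slow downstream device. Everything in it is an assumption for illustration: `send_throttled`, the `send` callback, and the interval value are hypothetical names, not anything from the DVLA system or the article.

```python
import time
from typing import Callable, Iterable


def send_throttled(
    records: Iterable[str],
    send: Callable[[str], None],
    min_interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Send records no faster than one per min_interval_s seconds.

    `send` stands in for a hypothetical legacy peripheral or driver that
    falls over when it is fed faster than the old program used to feed it.
    `sleep` and `clock` are injectable so the pacing can be tested.
    """
    sent = 0
    last_sent_at = None
    for record in records:
        if last_sent_at is not None:
            # Wait out whatever is left of the minimum gap since the last send.
            remaining = min_interval_s - (clock() - last_sent_at)
            if remaining > 0:
                sleep(remaining)
        send(record)
        last_sent_at = clock()
        sent += 1
    return sent
```

The delay is deliberate slowness bolted onto faster code, and the right interval usually has to be found by testing against the real device, which is where the days go.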

replies(1): >>42735293 #

2. gunian ◴[17 Jan 25 08:33 UTC] No.42735293[source]▶

>>42732865 (TP) #

idk much about dev, much less legacy / enterprise dev, but it seems like an A/B test type situation, where you have 90% of the users running the legacy code and the remaining 10% on a new implementation, would be feasible. Any idea why this is the case?
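Roughly what I have in mind for the 90/10 split, as a sketch only. The `route` function and the "new"/"legacy" labels are made up for illustration, and I'm assuming every request carries some stable user id:

```python
import hashlib


def route(user_id: str, new_percent: int = 10) -> str:
    """Pick "new" or "legacy" for a user, deterministically.

    Hashing the id (rather than picking at random per request) keeps each
    user on the same implementation every time they come back.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "new" if bucket < new_percent else "legacy"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route(u) == "new" for u in users) / len(users)
    print(f"share routed to new implementation: {share:.1%}")
```

This only covers the routing. It says nothing about keeping the two implementations' data in sync, which I'd guess is the hard part.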

replies(1): >>42740736 #

3. pixl97 ◴[17 Jan 25 17:24 UTC] No.42740736[source]▶

>>42735293 #

This is what happens. All that testing with the required stakeholders takes way way more time than you'd expect.

Gets even more fun in .gov where the work can change significantly at particular times of the year.

Had one piece of Windows software, required by the State of Texas, that was used at year end, like once a year. Seemingly nobody realized Windows updates had stopped it from working until a few weeks before the deadline. I had to set up a box without updates for it to run for my customer. Led to a lot of panic around the state.
