1087 points smartmic | 33 comments
anthomtb ◴[] No.44303941[source]
So many gems in here but this one about microservices is my favorite:

grug wonder why big brain take hardest problem, factoring system correctly, and introduce network call too

replies(8): >>44304390 #>>44304916 #>>44305299 #>>44305300 #>>44306811 #>>44306862 #>>44306886 #>>44309515 #
1. default-kramer ◴[] No.44304916[source]
I'm convinced that some people don't know any other way to break down a system into smaller parts. To these people, if it's not exposed as an API call it's just some opaque blob of code that cannot be understood or reused.
replies(5): >>44304992 #>>44305050 #>>44307611 #>>44308060 #>>44310571 #
2. dkarl ◴[] No.44304992[source]
That's what I've observed empirically over my last half-dozen jobs. Many developers treat decomposition and contract design between services seriously, and work until they get it right. I've seen very few developers who put the same effort into decomposing the modules of a monolith and designing the interfaces between them, and never enough in the same team to stop a monolith from turning into a highly coupled amorphous blob.

My grug brain conclusion: Grug see good microservice in many valley. Grug see grug tribe carry good microservice home and roast on spit. Grug taste good microservice, many time. Shaman tell of good monolith in vision. Grug also dream of good monolith. Maybe grug taste good monolith after die. Grug go hunt good microservice now.

replies(4): >>44305340 #>>44305660 #>>44307196 #>>44312789 #
3. demosthanos ◴[] No.44305050[source]
> To these people, if it's not exposed as an API call it's just some opaque blob of code that cannot be understood or reused.

I think this is correct as an explanation for the phenomenon, but it's not just a false perception on their part: for a lot of organizations it is actually true that the only way to preserve boundaries between systems over the course of years is to stick the network in between. Without a network layer enforcing module boundaries, code does, in fact, tend to morph into a big ball of mud.

I blame a few things for this:

1. Developers almost universally lack discipline.

2. Most programming languages are not designed to sufficiently account for #1.

It's not a coincidence that microservices became popular shortly after Node.js and Python became the dominant web backend languages. A strong static type system is generally necessary (but not sufficient) to create clear boundaries between modules, and both Python and JavaScript have historically been even worse than usual for dynamic languages when it comes to having a strong modularity story.

And while Python and JS have it worse than most, even most of our popular static languages are pretty lousy at giving developers the tools needed to clearly delineate module boundaries. Rust has a pretty decent starting point but it too could stand to be improved.

replies(3): >>44305396 #>>44307207 #>>44307716 #
4. stavros ◴[] No.44305340[source]
We've solved this problem by making the modules in the monolith only able to call each other from well-defined APIs, otherwise CI fails.
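
A minimal sketch of what such a CI check could look like, assuming a Python monolith where each top-level package exposes an `api` submodule. The package names `billing` and `orders` and the "only `<module>.api` may be imported from outside" rule are assumptions made for illustration, not a description of the setup above.

```python
import ast

# Hypothetical rule: code outside a module may import only its "api" submodule,
# e.g. "orders.api" is allowed from billing code but "orders.internal.db" is not.
MODULES = {"billing", "orders"}


def boundary_violations(source: str, owner: str) -> list[str]:
    """Return imports in `source` (a file owned by `owner`) that bypass another module's api."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Treat "from a.b import c" as a reference to "a.b.c".
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import inside the same package
        for name in names:
            parts = name.split(".")
            top = parts[0]
            # Imports within the owning module are always fine; anything else
            # must start with "<module>.api". A bare "import orders" is rejected too.
            if top in MODULES and top != owner and parts[:2] != [top, "api"]:
                violations.append(name)
    return violations
```

A CI job could run this over every source file and exit non-zero when any list comes back non-empty.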
replies(2): >>44305435 #>>44309744 #
5. giantrobot ◴[] No.44305396[source]
3. Company structure poorly supports cross-team or department code ownership

Many companies don't seem to do a good job coordinating between teams. Different teams have different incentives and priorities. If group A needs fixes/work from group B and B has been given some other priority, group A is stuck.

By putting a network between modules different groups can limit blast damage from other teams' modules and more clearly show ownership when things go wrong. If group A's project fails because of B's module it still looks like A's code has the problem.

Upper management rarely cares about nuance. They want to assign blame, especially if it's in another team or department. So teams under them always want clear boundaries of responsibility so they don't get thrown under the bus.

The root cause of a lot of software problems is the organization that produces it more than any individual or even team working on it.

replies(1): >>44310305 #
6. PaulHoule ◴[] No.44305435{3}[source]
In the Java world both Spring and Guice are meant to do this, and if you have an ISomething you've got the possibility of making an ILocalSomething and an IDistributedSomething and swapping one for the other.
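
The same idea can be sketched without a DI container. This is an illustration in Python rather than Spring or Guice code: `Something`, `LocalSomething`, `DistributedSomething`, `lookup`, and `greet` are hypothetical names, and the "network" is just an injected callable.

```python
from typing import Protocol


class Something(Protocol):
    """Hypothetical contract, standing in for the ISomething interface."""

    def lookup(self, key: str) -> str: ...


class LocalSomething:
    """In-process implementation backed by a dict."""

    def __init__(self, data: dict[str, str]) -> None:
        self._data = data

    def lookup(self, key: str) -> str:
        return self._data[key]


class DistributedSomething:
    """Remote implementation; `transport` is any callable that performs the network call."""

    def __init__(self, transport) -> None:
        self._transport = transport

    def lookup(self, key: str) -> str:
        return self._transport(key)


def greet(service: Something, user_id: str) -> str:
    # The caller depends only on the contract, so either implementation can be injected.
    return f"hello, {service.lookup(user_id)}"
```

Note that the shared contract says nothing about latency or failure modes, which is where the two implementations really differ.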
replies(1): >>44305671 #
7. pbh101 ◴[] No.44305660[source]
Maybe the friction involved in messing up a well-factored microservice architecture is just enough higher than in a monolith that people perceive more value in the factoring, whereas the implicit expectation when factoring a monolith is that you'll look away for five seconds and someone will ruin it.
8. pbh101 ◴[] No.44305671{4}[source]
This is generally a bad idea imo. You will fundamentally have a hard time if callers can't tell whether your API is network-dependent or not. I suppose you'll be ok if you assume there is a network call, but that means your client will need to pay that cost every time, even if using the ILocal.
replies(1): >>44305772 #
9. PaulHoule ◴[] No.44305772{5}[source]
It depends on what the API is. For instance you might use something like JDBC or SQLAlchemy to access either a sqlite database or a postgres database.

But you are right that the remote procedure call is a fraught concept for more reasons than one. On one hand there is the fundamental difference between a local procedure call that takes a few ns and a remote call which might take 1,000,000 times longer. There's also the fact that most RPC mechanisms that call themselves RPC mechanisms are terribly complicated, like DCOM or the old Sun RPC. In some sense RPC became mainstream once people started pretending it was REST. People say it is not RPC but often you have a function in your front end JavaScript like fetch_data(75) and that becomes GET /data/75 and your back end JAX-RS looks like

    @GET
    @Path("/{id}")
    public List<Data> fetchData(@PathParam("id") int id) { ... }
10. vharish ◴[] No.44307196[source]
I think monoliths are not such a good idea anymore. Particularly with the direction development is going w.r.t. the usage of LLMs, I think it's best to break things down. Of course, it shouldn't be overdone.
replies(1): >>44307306 #
11. arkh ◴[] No.44307207[source]
> Developers almost universally lack discipline.

Or developers are given a deadline and no slack to learn the code base. So developers will tactically take the fastest route to closing their ticket.

replies(1): >>44307231 #
12. rkomorn ◴[] No.44307231{3}[source]
This. You'll take "too long", you'll be told you're overthinking/overengineering, people will preach iterating, that done is better than perfect, etc.

It's not developers that lack discipline. It's CTOs, VPs, etc.

13. discreteevent ◴[] No.44307306{3}[source]
> grug wonder why big brain take hardest problem, factoring system correctly, and introduce network call too

> I think it's best to break things down

Factoring system = break things down.

14. cjfd ◴[] No.44307611[source]
Well, if people are really that stupid maybe they should just not be developers.
15. zelphirkalt ◴[] No.44307716[source]
I think languages without proper support for modules are worse off than Python. Python actually has pretty good support for modules and defining their boundaries (via __init__.py).
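
A rough sketch of that `__init__.py` pattern, assuming a throwaway package called `shop` (a made-up name) written to a temporary directory so the example is self-contained. The boundary here is a convention: `__all__` and the underscore prefix signal intent, but nothing stops a caller from importing `shop._pricing` directly.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a tiny hypothetical package named `shop` under `root`."""
    pkg = root / "shop"
    pkg.mkdir()
    # The implementation lives in an underscore-prefixed (private by convention) submodule.
    (pkg / "_pricing.py").write_text(
        "def total(prices):\n    return sum(prices)\n\n"
        "def _round_half_up(x):\n    return int(x + 0.5)\n"
    )
    # __init__.py re-exports only what other modules are meant to use.
    (pkg / "__init__.py").write_text(
        "from ._pricing import total\n\n__all__ = ['total']\n"
    )


def public_names(root: Path) -> list[str]:
    """Import the demo package and return the names it declares public."""
    sys.path.insert(0, str(root))
    try:
        shop = importlib.import_module("shop")
        return list(shop.__all__)
    finally:
        sys.path.remove(str(root))
```

Callers then write `from shop import total` and never need to know that `_pricing` exists.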
16. isoprophlex ◴[] No.44308060[source]
I swear I'm not making this up; a guy at my current client needed to join two CSV files. A one-off thing for some business request. He wrote a REST API in Java, where you get the merged CSV after POSTing your inputs.

I must scream but I'm in a vacuum. Everyone is fine with this.

(Also it takes a few seconds to process a 500 line test file and runs for ten minutes on the real 20k line input.)

replies(4): >>44308237 #>>44308645 #>>44309926 #>>44309942 #
17. withinboredom ◴[] No.44308237[source]
I mean, it would be faster to just import them into an in-memory sqlite database, run a `union all` query and then dump it to a csv...

That's still probably the wrong way to do it, but 10 minutes for a 20k line file? That seems like poor engineering in the most basic sense.
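
For what it's worth, the in-memory sqlite route is only a few lines with the Python standard library. This is a sketch that assumes both files share the same header and fit in memory; `union_csv` and the table names `a` and `b` are made up for the example.

```python
import csv
import io
import sqlite3


def union_csv(first: str, second: str) -> str:
    """Concatenate two CSV texts that share a header, via an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    header = []
    for table, text in (("a", first), ("b", second)):
        rows = list(csv.reader(io.StringIO(text)))
        header, body = rows[0], rows[1:]
        # Quote column names so headers with spaces or punctuation still work.
        columns = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f"CREATE TABLE {table} ({columns})")
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", body)
    merged = conn.execute("SELECT * FROM a UNION ALL SELECT * FROM b").fetchall()
    conn.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(merged)
    return out.getvalue()
```

Without an ORDER BY the row order of the result is not guaranteed by SQL, so add one if the order matters.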

replies(3): >>44308362 #>>44308551 #>>44397638 #
18. strken ◴[] No.44308362{3}[source]
I'd probably think of xsv, go to its GitHub repo, remember it's unmaintained and got replaced by qsv, and then use qsv.
19. isoprophlex ◴[] No.44308551{3}[source]
It's a twenty line bash script. Pipe some shit into sqlite, done.

But the guy 'is known to get the job done' apparently.

replies(1): >>44308836 #
20. cfiggers ◴[] No.44308645[source]
I'm really dumb, genuinely asking the question—when people do such things, where are they generally running the actual code? Would it be in a VM on generally available infra that their company provides...? Or like... On a spare laptop under their desk? I have use cases for similar things (more valid use cases than this one, at least my smooth brain likes to think) but I literally don't know how to deploy it once it's written. I've never been shown or done it before.
replies(1): >>44309179 #
21. bee_rider ◴[] No.44308836{4}[source]
Maybe he’s recognized something brilliant. Management doesn’t know that the program he wrote was just a reimplementation of the Unix “cut” and “paste” commands, so he might as well get rewarded for their ignorance.

And to be fair, if folks didn’t get paid for reinventing basic Unix utilities with extra steps, the economy would probably collapse.

replies(1): >>44309238 #
22. marifjeren ◴[] No.44309179{3}[source]
Typically you run both the client program and the server program on your computer during development. Even though they're running on the same machine they can talk with one another using http as if they were both on the world wide web.

Then you deploy the server program, and then you deploy the client program, to another machine, or machines, where they continue to talk to one another over http, maybe over the public Internet or maybe not.

Deploying can mean any one of umpteen possible things. In general, you (use automations that) copy your programs over to dedicated machines that then run your programs.

23. isoprophlex ◴[] No.44309238{5}[source]
Clearly I'm the dumbass in this story, as we're all paid by the hour...
replies(1): >>44309265 #
24. bee_rider ◴[] No.44309265{6}[source]
Clearly! He’s found a magic portal to the good old days when the fruit was all low hanging, and you keep showing up with a ladder.
25. williamdclt ◴[] No.44309744{3}[source]
I honestly think it's the only way outside of one-person projects (and even then...), you need _some_ design pressure.
26. fredrikholm ◴[] No.44309926[source]
The worst part of stories like this is how much potential there is in gaslighting you, the negative person, on just how professional and wonderful this solution is:

  * Information hiding by exposing a closed interface via the API
  * Isolated, scalable, fault tolerant service
  * Iterable, understandable and super agile
You should be a team player isoprophlex, but it's ok, I didn't understand these things either at some point. Here, you can borrow my copy of Clean Code, I suggest you give it a read, I'm sure you'll find it helpful.
27. pbohun ◴[] No.44309942[source]
Was it joining on some columns or just concatenating the files?

I'm going to laugh pretty hard if it could just be done with: cat file1.csv file2.csv > combined.csv

replies(2): >>44310462 #>>44397619 #
28. patrickmay ◴[] No.44310305{3}[source]
[O]rganizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.

— Melvin E. Conway, How Do Committees Invent?

29. Xenoamorphous ◴[] No.44310462{3}[source]
You need to account for the headers, which many (most?) csv files I've encountered have.

So I guess something like this to skip the headers in the second file (this also assumes that headers don't have line breaks):

  cp file1.csv combined.csv && tail -n+2 file2.csv >> combined.csv
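
If the line-break assumption is a worry, a CSV-aware version of the same concatenation is not much longer. A sketch using Python's `csv` module, which parses quoted fields that contain newlines; `concat_csv` is a made-up name, and the header equality check is an extra assumption about what the files should look like.

```python
import csv
import io


def concat_csv(first: str, second: str) -> str:
    """Append the data rows of `second` to `first`, keeping a single header row."""
    first_rows = list(csv.reader(io.StringIO(first)))
    second_rows = list(csv.reader(io.StringIO(second)))
    if first_rows and second_rows and first_rows[0] != second_rows[0]:
        raise ValueError("headers differ; plain concatenation would misalign columns")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(first_rows)
    # Skip the first parsed record of the second file, however many lines it spans.
    writer.writerows(second_rows[1:])
    return out.getvalue()
```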
30. 9rx ◴[] No.44310571[source]
To be fair, microservices is about breaking people down into smaller parts, with the idea of mirroring services found in the macro economy, but within the microcosm of a single business. In other words, a business is broken down into different teams that operate in isolation from each other, just as individual businesses do at the macro scale. Any technical outcomes from that are merely a result of Conway's Law.
31. manmal ◴[] No.44312789[source]
Put the modules in different git repos and interfaces will get super clean eventually.
32. xnx ◴[] No.44397619{3}[source]
There are also a lot of command line options for joining by column, like csvkit.
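
For comparison, joining by column with nothing but the standard library might look like the sketch below. It is not how csvkit works internally; `inner_join` is a hypothetical helper that builds a hash index on one file and streams the other, so the work grows roughly linearly with the input size.

```python
import csv
import io
from collections import defaultdict


def inner_join(left: str, right: str, key: str) -> list[dict[str, str]]:
    """Inner-join two CSV texts on the column `key` using a hash index."""
    # Index the right-hand rows by key so each left-hand row is matched by lookup.
    index = defaultdict(list)
    for row in csv.DictReader(io.StringIO(right)):
        index[row[key]].append(row)
    joined = []
    for row in csv.DictReader(io.StringIO(left)):
        for match in index.get(row[key], []):
            # On a column-name collision the left-hand value wins.
            joined.append({**match, **row})
    return joined
```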
33. xnx ◴[] No.44397638{3}[source]
csvkit and duckdb would also be good options. Any LLM will spit out a one-liner for any type of join you can describe.