yup
So, I mean, good luck.
Why not use type hints in python? Isn't that a good enough substitute?
I wonder why Go instead of Rust if he wanted static typing, long-term ease of maintenance, and performance. Go's type system is not great, especially for something like GraphQL. gqlgen relies heavily on code generation; last time I used it, I ran into so many issues. I ditched Go altogether after several painful clashes with it, to which the community always responded: oh, you don't need this.
(yeah, except they implemented all those parts in arguably worse ways and ditched the community solutions in the next few years)
One major benefit the GP fails to mention is that with GraphQL it is easy to generate types for the frontend. This makes your frontend far more sane. It's also way easier to test GraphQL, since there are tools to automatically generate queries for performance testing, unlike REST.
There is no need to add something for docs or interactivity like swagger.
>Today, the Python backends to the web services communicate directly with PostgreSQL via SQLAlchemy, but it is my intention to build out experimental replacement backends which are routed through GraphQL instead. This way, the much more performant and robust GraphQL backends become the single source of truth for all information in SourceHut.
I wonder how adding a layer of indirection can significantly improve performance. If I were writing this service, I would go all in on GraphQL and have the frontend talk to the GraphQL services directly rather than routing the requests from Python through to a GraphQL service then presumably to PostgreSQL.
Perhaps I am missing something. Indeed good luck to Drew here.
By GP (grandparent?) do you mean the article / blog post?
Because if so I see no indication that Drew plans to adopt a SPA architecture -- he seems intent on continuing to use server side rendering with little javascript, which would make "frontend types" sort of irrelevant.
The power in GraphQL comes from the graph and the flexibility in fetching what you need. Useful in general-purpose APIs (like GitHub's).
You can of course do this with other standards, but ime it's easier with GraphQL, since you only have to build the API. There is less overhead overall, since type information is part of the standard, not something people add afterwards or choose to. Introspection, GraphiQL, and all the tooling are easier to use and don't require integrating something like Swagger.
It comes set up by default on most solid GraphQL frameworks.
If you can't make that combination work well, there's another place to look for problems besides your tool kit. You might need to ask yourself if you really understand the tools you're trying to use.
But like I said, this has always been a very cool project. My "good luck" was meant more as actual good luck than a Morgan Freeman You're-trying-to-blackmail-batman kind of good luck.
A lot of the things that make Python great for small projects really bite you on a large or long-lived project. For me, the two biggest are the lack of types and indentation for scoping. It is really easy to mess up whitespace during an edit or refactor. In many languages you would just reformat; in Python, you have to be careful that a statement didn't suddenly end up outside (or inside) an if block.
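To illustrate (a contrived sketch; the job objects are hypothetical): one lost indent level during a refactor silently changes the meaning, and nothing fails to compile:

def retry_failed(jobs):
    retried = []
    for job in jobs:
        if job.status == "failed":
            job.retry()
            retried.append(job)  # intended: collect only the failed jobs
    return retried

def retry_failed_buggy(jobs):
    retried = []
    for job in jobs:
        if job.status == "failed":
            job.retry()
        retried.append(job)  # one dedent during an edit: now every job is collected
    return retried

In a braces language a reformatter would catch this; in Python both versions are equally valid programs.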
He has written a blog post about how he chooses programming languages as well: https://drewdevault.com/2019/09/08/Enough-to-decide.html
> The system would become more stable with the benefit of static typing, and more scalable with a faster and lighter-weight implementation.
OK, static typing, I'm with you so far. Faster and lighter weight? I'm not so sure; it sounds like you're having troubles with Flask and SQLAlchemy completely unrelated to REST. All production GraphQL implementations I've seen are very heavy once they add in authentication and more advanced query capabilities. Is this REALLY so superior to REST?
> Another (potential) advantage of GraphQL is the ability to compose many different APIs into a single, federated GraphQL schema.
I guess the discoverability of GraphQL is better, but 90% of APIs on the internet prove that large REST APIs are very effective and achieve the same thing.
> I also mentioned earlier that I am unsatisfied with the Python/Flask/SQLAlchemy design that underlies most of SourceHut’s implementation. The performance characteristics of this design are rather poor, and I have limited options for improvement. The reliability is also not something I am especially confident in.
This is where you completely lose me. It's fine if you hate ORMs, it's fine if you hate the SQLAlchemy API, but you're blaming your hammer for the fact that you think you built a shoddy house. Going out and buying a new hammer won't fix the fact that you're lining up your nails all wrong.
> The GraphQL services are completely standalone, and it is possible to deploy them independently of the web application...
> ...it is my intention to build out experimental replacement backends which are routed through GraphQL instead.
I think these two quotes go together: are you describing microservices? This can be achieved just fine with REST, using simple load-balancing strategies based on URL routing or similar.
> Almost all of SourceHut today is built with Python, Flask, and SQLAlchemy, which is great for quickly building a working prototype. This has been an effective approach to building a “good” service and understanding the constraints of the problem space. However, it’s become clear to me that this approach isn’t going to cut it in the long term, where the goal is not just “good”, but “excellent”
This is a classic example of using a handful of annoying issues to justify an exciting large re-write that doesn't actually address the main issues you are having. If you are struggling with the SQLAlchemy library you will find alternative (and perhaps larger) struggles in all GraphQL implementations. Best of luck, this is a road I would not follow you on. Seriously though I wish you the best and hope your product succeeds despite this.
The constant use of generated code is another real pain point, particularly when I am writing business logic that needs to operate on generated types for which there is no interface available that does what I need (and why would there be? how would the library/codegen-tool author know all the permutations of business logic that might be out there?).
The sql library has some fairly annoying documentation, e.g.
> Scan implements the Scanner interface.
https://golang.org/pkg/database/sql/#NullBool.Scan
> Scanner is an interface used by Scan.
https://golang.org/pkg/database/sql/#Scanner
There is only really one concrete example of how to use Scan on the godoc page.
The introspection capabilities (reflect package) are quite obtuse and error prone (basically requiring 100% test coverage because you lose nearly the entire type system), yet absolutely critical to implementing anything notably complex with Go.
Not sure what you mean here. Is there a particular codegen tool you found lacking?
>The introspection capabilities (reflect package) are quite obtuse and error prone (basically requiring 100% test coverage because you lose nearly the entire type system), yet absolutely critical to implementing anything notably complex with Go.
Ah, the lack of generics. I haven't really written any particularly large projects; where have you had to use reflection when working with Go in your projects?
I hope my questions don't come across as dismissive. I think the typical Go response to your complaints is "it won't come up" (a YAGNI variant), so I'm always interested to hear about the cases where that argument fails.
But claiming some nebulous backend that's more performant and robust than Postgres is like, WTF? Are you using an actual GraphDB like Neo4J? Are you putting a graph frontend on Postgres like PostGraphQL? None of the post really makes any sense because GraphQL is a Query Language, not a data store. What are the CAP theorem tradeoffs in the new backend? What does more robust mean? What does more performant mean? This is a source control app. Those tradeoffs are meaningful.
There seems to be a lot of conflation between API design and data store and core programming tools all mixed into a big post that mostly sounds to me like, "I don't get how to make this (extremely popular and well-known platform that drives many websites 10000x my size) work well, so I'm trying something different that sounds cool."
Which, again, the author has always said this is an experiment, and that's cool. But the conceptual confusion in the post makes me think that moving away from boring tools and trying new tools is not going to end up going well.
But this is a source control app, and it's hopefully backed up somewhere besides sourcehut so it should be fine if he needs to backtrack.
It seems like exactly the ORM solution/problem, but even more abstract and less under control, since it pushes the ORM out to browser clients and the frontend devs.
ORMs suffer from being beyond arm's length from the query analyzer in the database server.
https://en.wikipedia.org/wiki/Query_optimization
A query optimizer that's been tuned over decades by pretty serious people.
Bad queries, overfetching, sudden performance cliffs everywhere.
GraphQL actually adds another query language on top of the normal ORM problem. (Maybe the answer is that GraphQL is so simple by design that it has no dark corners, but that seems like a matter of mathematical proof that I haven't seen alluded to.)
Why is graphql not going to have exactly this problem as we see people actually start to work seriously with it?
Four or five implementations in JavaScript, Haskell, and now Go. From what I could see, none of them mentioned query optimization as an aspiration.
You need a data loader (batching) on the backend to avoid n+1 queries, and some similar caching tricks to improve performance.
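Boiled down, the batching trick looks like this (a sketch; db.fetch_posts and db.fetch_authors are hypothetical helpers, each standing in for a single SQL query):

async def posts_with_authors(db, post_ids):
    posts = await db.fetch_posts(post_ids)
    # One "WHERE id IN (...)" query for all authors, instead of one query per post:
    authors = await db.fetch_authors({p["author_id"] for p in posts})
    by_id = {a["id"]: a for a in authors}
    return [dict(p, author=by_id[p["author_id"]]) for p in posts]

Dataloader libraries generalize this by queuing the individual loads from each resolver and dispatching them as one batch.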
You also usually have caching and batching on the frontend. Apollo Client (the most popular GraphQL client in JS) uses a normalized caching strategy (overkill and a pain).
For rate/abuse limiting, GraphQL requires a completely different approach. It's either point-based on the number of nodes or edges you request, so you can calculate the cost of the query before you execute it, or deep introspection to avoid crashing your database. Query whitelisting is another option.
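A rough sketch of that pre-execution inspection, using the graphql-core parser to enforce a maximum query depth (real point systems also weight each field and edge):

from graphql import parse

def max_depth(node, depth=0):
    # Leaf fields have no selection set; fragment spreads are ignored here
    # and would need resolving against the document in a real implementation.
    selection_set = getattr(node, "selection_set", None)
    if not selection_set:
        return depth
    return max(max_depth(sel, depth + 1) for sel in selection_set.selections)

def check_query(query_text, limit=10):
    for definition in parse(query_text).definitions:
        if max_depth(definition) > limit:
            raise ValueError("query exceeds maximum depth; refusing to execute")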
There are a few other pain points you need to handle when you scale up. So yeah, defo not needed if it's only a small project.
Thanks Drew and others for SourceHut.
My impression is GraphQL starts to shine when you have multiple backend systems, probably separated based on your org chart, and the frontend team needs to stitch them together for cohesive UX. The benchmark isn't absolute performance here, it's whether it performs better than the poor mobile app making a dozen separate API calls to different backends to stitch together a view.
https://mypy.readthedocs.io/en/stable/kinds_of_types.html#un...
https://mypy.readthedocs.io/en/stable/kinds_of_types.html#op...
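A tiny example of what those buy you (mypy flags the unguarded use of a possibly-None value):

from typing import Optional

def find_user(user_id: int) -> Optional[str]:
    users = {1: "drew"}
    return users.get(user_id)  # may be None on a miss

def greet(user_id: int) -> str:
    name = find_user(user_id)
    if name is None:  # without this check, mypy reports an error here
        return "hello, stranger"
    return f"hello, {name}"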
I wouldn’t expect the performance issues to be much more problematic than they would be for REST endpoints that offer similar functionality. If you’re offering a public API, then either way you’re going to need to solve for clients who are requesting too many expensive resources. If you control the client and the server, then you probably don’t need to worry about it beyond the testing of your client code you would need to do anyway.
As far as query optimization goes, that’s largely out of scope of GraphQL itself, although many server implementations offer interesting ways to fulfill GraphQL queries. Dataloader is neat, and beyond that, I believe you can do any inspection of the query request you want, so you could for example see the nested path “Publisher -> Book -> Author -> name” and decide to join all three of those tables together. I’m not aware of any tools that provide this optimization automatically, but it’s not difficult to imagine it existing for some ORMs like those in Django or Rails.
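For instance, with Django's ORM that nested path could collapse into a single JOINed query (a sketch with invented models):

from myapp.models import Book  # hypothetical models: Book has publisher and author FKs

# One SQL query with JOINs onto the publisher and author tables,
# instead of one extra query per book:
books = Book.objects.select_related("publisher", "author")
for book in books:
    # These attribute accesses hit the already-fetched rows, not the database.
    print(book.publisher.name, book.title, book.author.name)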
These are largely matters of architecture design, and GraphQL doesn't really fix those problems (my sense is it will actually make them harder).
pypi.org is another that's familiar. You know, every time you type `pip install x`? Yeah, that's PyPI.
Although I think those are both powered mainly by Pyramid rather than flask. Still, same concept.
As others mention, large parts of google and youtube are still python. Dropbox was so invested in python that they employed Guido van Rossum for a while. Instagram, a lot of Yahoo! back when they were a thing, Spotify, Quora, Pinterest, Hipmunk, Disqus, and this really obscure satire site called The Onion that totally never gets any traffic at all.
All of them powered by python at their core, many of them Django, some Pyramid, and some Flask.
Yes, getting that big does require big teams. Becoming one of the top 100 or so sites on the internet always requires some special sauce as well as dedicated teams. But most of these companies started with Python and a framework and got to massive web scale along the way and never changed the core platform because there really wasn't a need. Handling scale isn't about your core language or framework. It's about dozens of other things that you can offload to other things if you're smart. But let's be real: sourcehut isn't close to any of that level of traffic.
My negativity on this isn't about stanning a particular language. I'm an agnostic in multiple ways. I'll use whatever tool seems like the best fit. I'm down on this because the explanation is tool-blaming, murky, unclear, and doesn't provide a lot of the detail I would want to have if I were depending on this service.
On the other hand, the guy has always said this is an alpha project and you should expect major changes. That's all fine. It's just weird to me to see a "why I'm changing from X to Y" post that doesn't really explain anything other than "I might be bad at this."
Edit: I remember now that the Apollo team is made up of members of the former Meteor team which worked in a similar way using a client side database.
It's attractive primarily to frontend developers. Instead of juggling various APIs (often poorly designed or underdesigned due to conflicting requirements and time constraints), you have a single entry into the system with almost any view of the data you want.
Almost no one ever talks about what a nightmare it becomes on the server-side, and how inane the implementations are. And how you have to re-do so many things from scratch, inefficiently, because you really have no control of the queries coming into the system.
My takeaway from GraphQL so far has been:
- good for frontend
- usable only for internal projects where you have full control of who has access to your system, and can't bring it down because you forgot an authorisation on a field somewhere or a protection against unlimited nested queries.
I have seen too many front end developers write queries equivalent to "select * from users, t1, t2, ... tn, left outer join on users.id = t1.user_id... etc etc etc".
I think that is just bullshit.
I prefer to tell the frontend people - these are the endpoints that backend team provides, go live with it. If you really have issues, talk to them, and they might just add a new endpoint for you.
If you let frontend developers dictate backend queries, all you will get is a big master clusterfuck. I am talking about the average-joe developers we usually find working at startups.
Whatever you do, don't even think that GraphQL will solve your problems. You were on the right track staying away from it till now.
I also can't advise strongly enough to stay away from a typed language (Go in this case) serving data in a different typed language (GraphQL). You will eventually be pulling your hair out jumping through hoops matching types.
After my last web project that required GraphQL and Go, I did some digging around, thinking there has to be a better alternative to this. I have worked with jQuery, React, and GraphQL.
My conclusion was that next time I will stick to turbolinks (https://github.com/turbolinks/turbolinks) and try stimulus (https://stimulusjs.org/).
And if, in fact, you are storing a graph in a graph database, the QL makes a bit of sense.
But nothing in the post makes any sense out of any of that. It's just Python bad; REST bad; I read too much hacker news, and I feel like it's time for a change.
Like, when I complain about other people's REST APIs, that's out of my control. This guy is saying that his API is garbage, and instead of fixing it to make it better, he's just going to redo everything with a worse result. I don't get it.
Typed languages are great for systems development, and I think, not so good for writing web applications.
I also think Ruby, Python, and JS have dominated the web dev world largely because they don't force the developer to constantly convert HTML (untyped) and JS (untyped) into types for the backend programming language.
Remember how ActiveRecord (not sure about SQLAlchemy) simply took away the pain of managing types? You didn't have to explicitly parse or cast "0.001" into 0.001.
I'm not sure if it is the implementation - and it could very well be - but there has been more overhead and complexities than with traditionally accessed REST APIs. I can't see much value-add.
This becomes a lot more apparent when you start to include TS in the mix.
Perhaps it just wasn't a good use case.
1. Protobufs use integer IDs for fields. GraphQL uses string names. IMHO this is a clear win for protobufs. Changing the name of a GraphQL field is essentially impossible. Once a name is there, it's there forever (e.g. mobile client versions are out there forever), so you're going to have to return null from it and create a new one. In protobufs, the name you see in code is nothing more than the client's bindings. Get a copy of the .proto file, change a name (but not the ID number), recompile, and everything will work. The wire format is the same;
2. People who talk about auto-generating GraphQL wrappers for Postgres database schemas (not the author of this post, to be clear, but it's common enough) are missing the point entirely. The whole point of GraphQL is to span heterogeneous and independent data sources;
3. Protobuf's notion of required vs optional fields was a design mistake that's now impossible to rectify without breaking changes. Maybe protobuf v3/gRPC fixed this. I'm honestly not sure.
4. Protobuf is just a wire format plus a way of generating language bindings for it. There are RPC extensions for this (Stubby internally at Google; gRPC externally and no they're not the same thing). GraphQL is a query language. I do think it's better than protobufs in this regard;
5. GraphQL fragments are one of these things that are probably a net positive but they aren't as good as they might appear. You will find in any large codebase that there are key fragments that if you change in any way you'll generate a massive recompile across hundreds or thousands of callsites. And if just one caller uses one of the fields in that fragment, you can't remove it;
6. GraphQL does kind of support union types (e.g. foo as Bar1, foo as Bar2), but it's awkward, and my understanding is the generated mobile code is... less than ideal. Still, it's better than not having it. The protobuf equivalent is to have many optional submessages, and there's no way to express that only one of them will be populated;
7. Under the hood I believe the GraphQL query is stored on the server and identified by ID but the bindings for it are baked into the client. Perhaps this is just how FB uses it? It always struck me as somewhat awkward. Perhaps certain GraphQL queries are particularly large? I never bothered to look into the reason for this but given that the bindings are baked into the code it doesn't seem to gain you much;
8. GraphQL usage in Facebook is pervasive and it has first class support in iOS, Android and React. This is in stark contrast to protobufs where protobuf v2 in Google is probably there forever and protobuf v3/gRPC is largely for the outsiders. It's been several years now since I worked at Google but I would be shocked if this had changed or there was even an intention of changing it at this point;
9. The fact that you can do a GraphQL mutation and declare what fields are returned is, IMHO, very nice. It saves really awkward create/update then re-query hops.
10. This is probably a problem only for Google internally but another factor on top of protobuf version was the API version. Build artifacts were declared with this API version, which was actually a huge headache if you wanted to bring in dependencies, some of which were Java APIv1 and others Java APIv2. I don't really understand why you had to make this kind of decision in creating build artifacts. Again, maybe this has improved. I would be surprised however.
Lastly, as for Sourcehut, I had a look at their home page. I'm honestly still not exactly sure what they are or what value they create. There are 3 pricing plans that provide access to all features so I'd have to dig in to find the difference (hint: I didn't). So it's hard for me to say if GraphQL is an appropriate choice for them. At least their pages loaded fast. That's a good sign.
I think the biggest problem with GraphQL is the JavaScript ecosystem around it, and all of its implicit context. It seems to be built entirely on specific servers and clients, instead of on the general concepts.
Relay[1], a popular client-side library, adds all kinds of requirements in addition to the use of GraphQL. One of those is that until version 8, it required all mutation inputs and outputs to contain a "clientMutationId", which had to be round-tripped. It was an obvious hack for some client-side problem which added requirements to the backend. Somehow it had a specification written for it instead of being fixed before release. This hack is now in public APIs, like every single mutation in the GitHub API v4.
GraphQL also includes "subscriptions", which are described incredibly vaguely and frankly underspecified. There are all kinds of libraries and frameworks that "support subscriptions", but in practice they mean they just support the websocket transport[2] created by Apollo GraphQL.
If you just use it as a way to implement a well-structured API, and use the simplest tools possible to get you there, it's a pleasure to work with.
[1]: https://relay.dev/
[2]: https://github.com/apollographql/subscriptions-transport-ws
Query chaining/batching and specifying a sub-selection of response data seem like solid features.
The graph schema seems to make good on some of the HATEOAS promises.
I like the idea of GraphQL but the downsides have me worried.
> But even then there's still some value in letting the client specify the shape of the data it needs and having client SDKs
It may not exactly "shine" in those cases, but it reduces round trips and makes it easy for frontend engineers to build views that revolve around use cases instead of the resources in the database.
It’s been a 10x+ improvement on Flask, in my experience.
Maybe GraphQL adopters aren't sharing their experiences with it in production because they're realizing its faults? People are quick to announce successes and very reluctant to own, let alone share, costly mistakes. Also, people change jobs so often that those who influence a roll-out won't even be around long enough for the post-mortem. GraphQL publicity is consequently positively biased. If the HN community were to follow up with posters who announced their use of GraphQL the last two years, maybe we can find out how things are going?
And here I thought Basecamp was still 100% rails. Interesting to see that they're also developing backend JS frameworks.
I don't think they are missing a point; rather, they have a completely different point: eschew a backend and use GraphQL on a DB + a frontend that gets all that data. If you're developing rapidly and don't have complex backend logic, I can see why you'd want to do that.
Looking through some of the code for Sourcehut, there’s an insane amount of boilerplate or otherwise redundant code[1]. The shared code library is a mini-framework, with custom email and validation components[2][3]. In the ‘main’ project we can see the views that power mailing lists and projects[4][5].
I’m totally biased, but I can’t help but think “why Flask, and why not Django” after seeing all of this. Most of the repeated view boilerplate would have gone ([1] could be like 20 lines), the author could have used Django rest framework to get a quality API with not much work (rather than building it yourself[6]) and the pluggable apps at the core of Django seem a perfect fit.
I see this all the time with Flask projects. They start off small and light, and as long as they stay that way, Flask is a great choice. But they often don't, and as they grow in complexity you end up re-inventing a framework like Django, but worse, whilst getting fatigued by "Python" being bad.
1. https://git.sr.ht/~sircmpwn/paste.sr.ht/tree/master/pastesrh...
2. https://git.sr.ht/~sircmpwn/core.sr.ht/tree/master/srht/emai...
3. https://git.sr.ht/~sircmpwn/core.sr.ht/tree/master/srht/vali...
4. https://git.sr.ht/~sircmpwn/hub.sr.ht/tree/master/hubsrht/bl...
5. https://git.sr.ht/~sircmpwn/hub.sr.ht/tree/master/hubsrht/bl...
6. https://git.sr.ht/~sircmpwn/paste.sr.ht/tree/master/pastesrh...
Other users of my API [1] just use straight HTTP with JSON as well. GraphQL clients seem to solve something we are not encountering. If urql [2] or gqless [3] work well when I try them I'd be up for changing my mind though.
[1]: https://github.com/kitspace/partinfo
[2]: https://github.com/FormidableLabs/urql
[3]: https://gqless.dev/
My current favorite way of building APIs is this Frankenstein's monster of Django/FastAPI, which actually works quite well so far:
https://www.stavros.io/posts/fastapi-with-django/
FastAPI is a much better way of writing APIs than DRF, I wish it were a Django library, but hopefully compatibility will improve as Django adds async support.
Can you show me a comparable codebase in django and how it looks? I'm genuinely curious how people deal with edge cases.
I quite like Python's type system, it's not very mature yet but it's definitely good enough to already catch a lot of bugs.
And when you are explicit about how you want to implement joins etc, you pretty much have to hand code the join anyway, so I don't see the point.
In almost all use cases that I've come across, a standard HTTP endpoint with properly selected parameters works just as well as a GraphQL endpoint, without the overhead of parsing/dealing with GraphQL.
Without it or a similar system frontend developers have to ask backend developers to create or modify an API endpoint every time the website is redesigned.
Also, it allows you to combine data fetching for components and subcomponents automatically, without having to do that manually in backend code, and it automatically supports fine-grained caching of items.
I'm working on a REST code generator (it generates a Go backend and a TypeScript/React frontend) that reads your Postgres/MySQL schema and some additional metadata you provide (should auth be enabled? which table is the users table, and which columns hold the username and the bcrypt-hashed password?). I'm still working on the authorization part, but basically: an optional per-endpoint logic DSL for simple stuff, and optional Go hooks for more complex stuff.
This reminds me of Kubernetes' design. You have an API server which is practically Kubernetes itself from the user's perspective. `kubectl` is just one of possibly many clients that talk to this API.
Edit: typos.
Python's single-threaded design makes it difficult to be responsive to small queries quickly while simultaneously serving large, time-consuming queries (i.e. git operations). You can get around this using worker queues to separate interpreter processes and an async design, or otherwise splitting your workload up... or you can use a language where "have a threadpool" is actually a properly supported concept, and an architecture where sharding git/email/etc backends is feasible.
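The workaround looks something like this (a sketch; clone_repo stands in for any slow git operation):

from concurrent.futures import ProcessPoolExecutor

# Separate interpreter processes sidestep the GIL: heavy git work runs here
# while the web workers keep answering small queries promptly.
executor = ProcessPoolExecutor(max_workers=4)

def handle_clone_request(url):
    future = executor.submit(clone_repo, url)  # clone_repo is a hypothetical worker fn
    return future  # check completion elsewhere rather than blocking the request

It works, but compared to a language where "have a threadpool" is built in, it's extra moving parts you have to own.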
It clearly looks like questionable adoption for a single organization.
I agree with you that writing optimal queries or data modeling should not be shifted over to the frontend. With that said, there are basic aspects of this equation that can be factored out. There should be a layer where frontend can at the very least pick or mix the data they need. That doesn’t necessarily mean Graphql, but could also mean a simple middle stack layer where one can do this basic thing without being able to shoot themselves in the foot.
There’s a responsible way to do this, and most likely will be an ongoing discussion.
Is that really a big deal with HTTP2 and pipelining?
I can also imagine situations where it results in a better UX to make multiple small calls, rather than one big one, as you'll have something to render faster.
A major issue with pushing it to the frontend is that malicious clients can issue unexpected requests, putting strain on the database.
If the graphql query implementation doesn't allow that level of querying on the database, then it's not offering much more before you need to speak to the backend devs than a filterable rest endpoint.
This all came up years ago with OData.
I'm not even sure about this part.
I worked on a project recently where the mobile front end team hated working with GraphQL. They were far more used to working with REST/HTTP APIs, and in this particular project they only communicated with a single backend.
The team saw it as extra layers of complexity for no benefit.
The GraphQL backend was responsible for pulling together data from several upstream systems and providing it to clients. But the architect never was able to convince me of a single benefit here compared to REST.
But I think it could be even better, and this work will help. It will make it easier to write performant code without explicitly hand-optimizing everything.
There are more important reasons to consider GraphQL than performance, which I cover in detail in TFA.
Performance is also just one of many reasons why this approach is being considered.
I evaluated GraphQL twice before, and discarded it for many of the reasons brought up here. Even this time around, I give a rather lackluster review of it and mention that there are still many caveats. It's not magic, and I'm not entirely in love with it - but it's better than REST.
Query optimization, scalability, authentication, and many other issues raised here were part of the research effort and I would not have moved forward with it if I did not feel that they were adequately addressed.
Before assuming some limitation you have had with it in the past applies to sr.ht, I would recommend reading through the code:
https://git.sr.ht/~sircmpwn/git.sr.ht/tree/master/api
https://git.sr.ht/~sircmpwn/gql.sr.ht
If you're curious for a more detailed run-down of my problems with REST, Python, Flask, and SQLAlchemy, I answered similar comments last night on the Lobsters thread:
https://lobste.rs/s/me5emr/how_why_graphql_will_influence_so...
I would also like to point out that the last time I thought a rewrite was in order, we got wlroots, which is now the most successful project in its class.
Cheers.
I still want to rip out the SQLAlchemy ORM and replace it with pure SQL via `asyncpg`, as the SQLAlchemy ORM is not async, and that causes a bunch of extra context switching in the backend that certainly doesn't help eke out more perf; but at the moment it's a bit too much effort, and users are happy.
Scaling is handled by just throwing more instances of the application at the problem, behind a load-balancer.
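For the curious, the asyncpg version of a typical read would look something like this (a sketch; table and column names invented):

import asyncpg

async def user_repos(pool, username):
    # Plain SQL, no ORM layer and no sync/async context switching:
    async with pool.acquire() as conn:
        return await conn.fetch(
            "SELECT r.id, r.name, r.updated_at"
            " FROM repos r JOIN users u ON u.id = r.owner_id"
            " WHERE u.username = $1"
            " ORDER BY r.updated_at DESC",
            username,
        )

# pool = await asyncpg.create_pool(dsn="postgresql://...")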
Really I just look at GraphQL as a nice RPC framework. The graph theory operations like field level resolvers are mostly useless. But if you treat each relationship as a node rather than each field, you can get it to work very nicely with a normalized data set. I haven’t found it hard to preserve join efficiency in the backend either, and it so far hasn’t forced me into redundant query operations.
Just as long as you don’t use appsync. Really, don’t even bother.
A REST endpoint, on the other hand, is fairly simple and well understood; there's (mostly) a static set of SQL queries behind it, and as long as those are not returning any unwanted data, you are pretty much guaranteed not to expose something you didn't want to.
> Graphql solves for this by being able to fetch the book detail and the joined author name with one round trip
I don't see why we need GraphQL to solve this though - a REST backend could have an endpoint that returns the exact same data.
I can see how GraphQL might be somewhat nice for front end developers when the data to be displayed hasn't been nailed down yet - maybe we decide to show the book's ISBN number, and we can do that without changing the backend. Maybe this justifies the extra complexity for some teams, but I'd personally much prefer the simplicity of a REST API, to which you can always add OData on top if you really want.
Their justification for needing it is that the API team takes too long to implement changes, and endpoints never give them the data shape they need.
The silent reason is that server-side code, databases, and security are a big scary unknown they are too lazy to learn.
A big project cannot afford to ask for high standards from frontenders. You need a horde of cheap labor to crank out semi-disposable UIs.
If anyone else can share experiences of this sort of problems and solution, I'd be really interested to hear it. I've written non-GQL APIs before that back onto other internal and external services; what am I missing?
As someone who is building a public facing GraphQL API, I would disagree with this. Directives make it easy to add policies to types and fields in the schema itself, making it amenable to easy review.
A restful API also has the problem that if you want fine grained auth, you'll need to remember to add the policy to each controller or endpoint, so not that different.
The typed nature of GraphQL offers a way of extending and enriching behavior of your API in a very neat, cross cutting way.
For example we recently built a filtering system that introspected over collection types at startup to generate filter input types. We then built middleware that converted filter inputs into query plans for evaluation.
I previously worked at another company that offers a public REST API for public transport. Public transport info is quite a rich interconnected data set. Despite efforts to ensure that filtering was fairly generic, there was a lot of adhoc code that needed to be written to handle filtering. The code grew exponentially more complex as more filters were added. Maybe this system could have been architected in a better way, but the nature of REST doesn't make that exactly easy to do.
Bottom line is that I feel for public APIs, that there is a lot of demand for flexibility, and eventually a public facing RESTful API will grow to match or even exceed that of a GraphQL API in complexity.
So yeah, still seeing good speedups in our own benchmarks even though most of our endpoints are sync.
What was arguably more important though was how much switching to ASGI helped with handling WebSockets. We're using SocketIO, and trying to get a fundamentally async protocol working within sync (Flask) land was a massive pain. We had repeated reliability and deployment issues that were very hard to debug. Switching to FastAPI made that much easier.
Carefully designed endpoints require a lot of back and forth between teams with very different skill sets, so to build them you need to plan in advance; that does not work here, where you literally have to move fast (and fail fast!).
You only have two options left: (1) you either ask frontenders (or UX-wise devs) to do the whole thing, or (2) you build an abstraction layer that lets frontenders query arbitrarily complex data structures with near-perfect performance (YMMV).
In case (1) you’re looking for REAL full-stacks, and it’s not that easy to find such talented developers. In case (2) well… that’s GraphQL.
https://hasura.io/docs/1.0/graphql/manual/remote-schemas/ind...
https://hasura.io/blog/remote-joins-a-graphql-api-to-join-da...
If you need to convert your APIs into GraphQL first, you can wrap the endpoints with resolvers yourself, or use automated tooling:
Actually stitching together multiple services or DBs with it manually seems like it'd be a hellish experience that'd end in a massive data breach or repeated accidental data loss + restore-from-backup. Or else valid GraphQL going to such a system would be so restricted that the benefit over just using REST (or whatever) is zero.
XgeneCloud makes it really simple to add business logic for generated APIs (REST and GraphQL both) over any SQL databases.
We just launched this week [2]
[1] : https://github.com/xgenecloud/xgenecloud
[2] : https://news.ycombinator.com/item?id=23466782
Website : https://xgenecloud.com
(disclaimer: founder here)
For WebSockets, all of the code is async, so I'm already using `asyncpg` for any database stuff that is happening there.
With regards to why the sync endpoints are faster, I think it is a number of things, some of which are userland changes that could've been made under Flask, but all of which are somewhat related to the switch. As for things FastAPI itself has changed, I think using a (de)serialization lib like Pydantic, and serializing to JSON by default (which is what we were doing under Flask anyway, though with Marshmallow), makes a lot of the code paths in the underlying lib a bit faster, because with Flask there was more "magic" going on behind the scenes. For the userland stuff, partly because there is less magic going on in the background (I really like FastAPI's dependency injection system), it's been easier to identify the bottlenecks and optimize hot code paths.
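For context, the declaration style in question looks roughly like this (a toy endpoint; route and model invented):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Paste(BaseModel):
    id: int
    title: str
    public: bool = True

@app.get("/pastes/{paste_id}", response_model=Paste)
async def get_paste(paste_id: int):
    # Pydantic validates and serializes the response against the declared
    # model, replacing hand-rolled Marshmallow schemas and Flask-era magic.
    return Paste(id=paste_id, title="example")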
GraphQL isn't particularly "graphy". Its name sucks. But don't worry, plenty of half-techy middle managers are out there making the same mistake and going "we do graph things, why don't you guys look into this GraphQL thing that's getting so much buzz?" It's not a great fit for graph operations, in fact. Not more than SQL, certainly.
As for N4J in particular, don't count on that to improve performance even if you're doing lots of graph stuff. Depends heavily on your usage patterns and it's very easy to modify a query in a way that seems like it'd be fine but in fact makes performance fall off a cliff. OTOH Cypher, unlike GraphQL, is a very nice language for querying graphs.
...and other lies we tell ourselves to sleep soundly at night.
But just like ORMs, they do work for the simple cases which tend to abound and you can hand-optimize the rest.
It gives frontend teams the power to create, in effect, their own REST endpoint out of a schema.
When you combine it with TypeScript/JVM/Swift, you get AUTO typing for the GraphQL queries: you know exactly the data model you get back based on your query. It's quite lovely.
The other aspect is that on the apollo/graphql server you can utilize dataloader and streamline a nested request into as few calls to each service as possible.
And the last benefit over a REST service: if you had to make multiple calls, you're doing round trips from the CLIENT to the backend services. The GraphQL server is _already_ in your backend service network, so all the data combining is on the order of <10ms versus <100ms (or much worse for mobile).
GraphQL has a major advantage over rest in that you can't just change the schema without the clients breaking, so you know that your API isn't going to magically just screw you. (Most places use versioning for this, but not always). You can get some of this with RPCs but it's not as robust as the graphql schema.
REST endpoints are usually way more blackbox.
You can't claim that REST is better cuz you can look at the server... when you could do the same thing to the graphql server.
Graphql will -never- return you unwanted data. Because you wrote in the query exactly what you want.
If you want to examine an endpoint and JUST what it returns, you can do so really easily with graphiql.
https://developer.github.com/v4/explorer/
Just enter the api and you get an auto complete list of all the data fields you have access to. Or just use the schema explorer and click through. 100x easier than going through a sql query and analyzing a table.
I honestly think the backend benefits are relatively marginal, but on the client being able to 1) mock your entire backend before it even exists, 2) have built in automatic caching on every query and entity for free, 3) use a fully declarative, easily testable and mockable React hook API built with out-of-the-box exposed support for loading and error states, is so valuable. Components written against Apollo feel so clean and simple. That's not to say anything about the benefits you get from being able to introspect the entire backend API or test queries against it with Apollo's extensions or how you can easily retain the entire entity cache across reloads with simple plugins to Apollo.
Can you do all that with REST? Sure. But writing queries against REST APIs is a pain in the butt compared to what you get for free by using Apollo.
In REST (and seemingly in gRPC), you define these siloed endpoints to return different types of data. If we're imagining a Twitter REST API, you might imagine Tweet and User endpoints. In the beginning it's simple: the Tweet endpoint returns the message, the user ID, and some metadata. You can query the User endpoint separately for the author's details.
Then Twitter continues to develop. Tweets get media attachments. They get retweets. They get accessibility captions. The Tweet endpoint expands. The amount of information required to display a User correctly expands. Do you inline it in the Tweet? Do you always require a separate request to the User, which is also growing?
As the service grows, you have this tension between reusability and concision. Most clients need only some of the data, but they all need different data. If my understanding of gRPC is correct, it would have this similar kind of tension: business objects that gain responsibilities will likely gain overhead with every new object that is added, since the clients have no way of signaling which ones they need or don't need.
In GraphQL, you define your object model separately from how it's queried. So you can define all of these things as separate business objects: a Tweet has a User, Users have Tweets, Tweets can link to Media which has accessibility caption fields, etc. Starting from any entry point to the graph, you can query the full transitive closure of the graph that you can reach from that entry point, but you don't pay for the full thing.
This relaxes the tension between reusability and concision. Each querier can request just the data that it needs, and assuming the backend supports caching and batching when appropriate, it can have some assurances that the backend implementation is only paying for what you use.
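A toy version of that, runnable with graphql-core (schema and data invented): the client selects only the fields its view needs, and User.tweets is never touched:

from graphql import build_schema, graphql_sync

schema = build_schema("""
type User {
  id: ID!
  name: String
  tweets: [Tweet]
}
type Tweet {
  id: ID!
  text: String
  author: User
}
type Query {
  tweet(id: ID!): Tweet
}
""")

# graphql-core's default resolvers call callables and read dict keys,
# so a plain dict works as the root value for a demo:
root = {
    "tweet": lambda info, id: {
        "id": id,
        "text": "hello",
        "author": {"id": "1", "name": "drew"},
    }
}

result = graphql_sync(schema, '{ tweet(id: "42") { text author { name } } }', root_value=root)
print(result.data)  # {'tweet': {'text': 'hello', 'author': {'name': 'drew'}}}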
Can you elaborate on this?
I've barely used protobufs, but in thrift I've found optional fields to be very useful.
A typical product would require integrations with several existing APIs, and potentially some new ones. These would be aggregated (and normalised) into a single schema built on top of GraphQL. Then the team would build different client UIs and iterate on them.
By having a single queryable schema, it's very easy to build and rebuild interfaces as needed. Tools like Apollo and React are particularly well suited for this, as you can directly inject data into components. The team can also reason on the whole domain, rather than a collection of data sources (easier for trying out new things).
Of course, it would lead to performance issues, but why would you optimise something without validating it first with the user? Queries might be inefficient, but with just a bit of caching you can ensure acceptable user experience.
Maybe it's just that the backend devs in my last project weren't very good, but the backend GraphQL code was ridiculously complex and impossible to reason about.
1. The biggest mistake GraphQL made was putting 'QL' in the name so people think it's a query language comparable to SQL. It's not: https://news.ycombinator.com/item?id=23120997
2. Some benefits of GraphQL over REST: https://news.ycombinator.com/item?id=23124862
A decent system will provide the hooks you need to hand-optimize certain cases somehow, but there are always limitations and hoops to jump through and additional complexity to manage. The extra layers that are meant to make your life easier are getting in the way instead. (May or may not still be worth it, but the point is, it's not a foregone conclusion.)
But couldn't you intentionally or unintentionally write a query such that it returns too much data and borks the system? Un-intentionally is the worrisome aspect.
Then they build their own abstractions. And then, congratulations, they’ve spent longer than they should have to end up with a worse version of Django, that nobody but them finds “clear” or “unambiguous”.
If only we could capture these common, repetitive and important patterns and put them in some kind of library. A “framework”, if you will. That way you don’t need to copy-paste this stuff over and over again, and anyone who knows the library will find it clear and unambiguous!
In fact this is such a good idea that I’m going to do it myself. I’ll call the library Franz, after a famous pianist.
This is dependent on the framework, just as it is with GraphQL - for example, with ASP.NET Core you can apply an auth policy as a default, or by convention.
> Despite efforts to ensure that filtering was fairly generic, there was a lot of adhoc code that needed to be written to handle filtering.
I've never seen this problem with REST backends myself, but I work with a typed language, C#. Again though, this is more of a framework thing than a REST/GraphQL paradigm thing.
You'd want code gen to easily wrap REST services.
You could get some of the pipeline query/subquery stuff back (and lose caching) by setting up a proxy running this service or fallback to client side aggregation to span services not backed by the graph system (and maybe keep caching).
Maybe we're back to SOAP and WSDLs, though.
It's inspired by Hasura, the schema is almost the same. It's not optimized at all, but it's a nice way to quickly get started with GraphQL and expose your existing models.
I've certainly seen timing issues between frontend and backend teams; actually, I don't think I've ever been on a project where that wasn't an issue!
But on my last project, which had a GraphQL backend, this was still a problem. The backend integrated with several upstream databases and REST APIs, so the backend team had to build the queries and implementations before the frontend could do anything. At least with REST they would have been able to mock out some JSON more easily.
To be fair, the same devs that built it using GraphQL would likely have made many of the same mistakes with a REST API, but I do feel it would at least have been easier to reason about the code.
query {
user {
name
email {
primary
secondary
}
posts {
body
karma
}
}
}
It would create an entire database schema with users, emails, and posts, and the correct indexes and FK relations to support the GraphQL query. It would also generate mutations for updating the entities in a relational, cascading manner, so deleting a user would also delete their email and posts.

How much client state you maintain seems to me to be orthogonal to GraphQL/REST.
Take your example of a multiple-REST workflow. I presume your point was that the workflow could be implemented by a single GraphQL query/mutation/whatever, but just the same, you can put as much code and logic as you like behind a REST call?
Regarding managing state I don't see GraphQL helping with that at all.
GraphQL formalizes the contract between front and back end in a very readable and maintainable way, so they can evolve in parallel and reconcile changes in a predictable, structured place (the GraphQL schema and resolvers). And it allows the frontend, with Relay, to deal with data dependencies in a very elegant and performant way.
The gist is that while you'd hope to have a stable interface, in reality things change. By marking fields required, they basically have to stay that way forever so you can support older clients. In doing so, you create a brittle API that may not be able to evolve with changing requirements.
For this reason, I believe Google got rid of required fields in proto3 (everything's either implicitly optional, or a repeated field which may have 0 elements).
I actually think that unless your company is massive or has a lot of expertise in GraphQL already, using it for private APIs may not be the best idea, as it could be a sign of certain internal dysfunctions or communication problems within or between engineering teams.
----
An example, however of the kind of filtering I was referring to, and why I still think it would be non trivial to do, even in something like ASP.NET, is the following: https://www.gatsbyjs.org/docs/graphql-reference/#filter. This of course isn't something you get out the box in GraphQL either, but the structure of the system made this (relatively) easy to do.
Of course you could add something like OData to your REST API which would definitely be a valid alternative, but that also would have its own warts, and is subject to similar criticisms as GQL.
GraphQL has been around for years and people keep making this argument, but where are all the horror stories of unbounded queries being made and systems being hobbled? The argument is beginning to sound anemic.
What do you consider the downsides?
Most people want #1; Graphene is a bad choice because you still have to write a lot of boilerplate code. It has the added benefit that the current process is responsible for parsing the GraphQL query and directly calling the database, vs. using something like Prisma/Hasura, which (may) require a separate process that in turn calls your database (so 2 network hops).
GraphQL was never intended to be an ORM replacement, but many have steered it in that direction. It's not a bad thing, but it's still the same level of abstraction and confusion that people have wrestled with when using traditional ORMs, except now you're introducing a different query API vs. native code/fluent interfaces/SQL.
The first was when they removed tracebacks. A singularly useless thing to do, IMO. But there's a --show-tracebacks option (or something like that, it was a long time ago) to show tracebacks, and it didn't work. I dug into the code for this one. IIRC, the guy who added the code to suppress tracebacks didn't take into account the CLI option. I patched it to not suppress tracebacks, but there turned out to be another place where tracebacks were suppressed, and I eventually gave up.
The second incident (although, thinking about it, they happened in chronologically reversed order) was when a junior dev came to me with a totally wacky traceback that he couldn't understand.
All he was trying to do was subclass the HTML Form widget, like a good OOP programmer, but it turned out that Django cowboys had used metaclasses to implement HTML Forms, and utterly defeated this poor kid.
I was so mad: Who uses metaclasses to make HTML forms? Overkill much?
(In the event the solution was simple: make a factory function to create the widget then patch it to have the desired behaviour and return it. But you shouldn't have to do that: OOP works as advertised, why fuck with a good thing?)
So, yeah, Django seems to me to be run by cowboys. I can't take it seriously.
FWIW, I'm learning Erlang/OTP and I feel foolish for thinking Python was good for web apps, etc. Don't get me wrong, I love Python (2) but it's not the right solution for every problem.
Here’s the ~20 lines of cowboy code you’re referring to[1] - collecting the declared fields and setting an attribute containing them.
Not exactly the kind of thing that should make you mad, and rather than overkill it’s exactly the use case for metaclasses.
And to top it off, that metaclass is completely optional, if you want to create a list of fields and pass it into BaseForm then go for it. Most don’t.
1. https://github.com/django/django/blob/5776a1660e54a951591644...
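For readers following along, a stripped-down sketch of the declarative-fields pattern in question (not Django's actual code):

class Field:
    """Stand-in for a form field with label, validation, rendering, etc."""
    def __init__(self, label):
        self.label = label

class DeclarativeFieldsMeta(type):
    def __new__(mcs, name, bases, attrs):
        # Collect Field attributes declared on the class body; this is what
        # makes `username = Field(...)` in a subclass register itself.
        fields = {k: v for k, v in attrs.items() if isinstance(v, Field)}
        cls = super().__new__(mcs, name, bases, attrs)
        cls.declared_fields = fields
        return cls

class Form(metaclass=DeclarativeFieldsMeta):
    pass

class LoginForm(Form):
    username = Field("Username")
    password = Field("Password")

print(LoginForm.declared_fields)  # {'username': ..., 'password': ...}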
While Hasura + GraphQL can be used as an ORM (especially for serverless functions!), Hasura is designed to be used directly by clients as well.
Hasura has a metadata configuration system that works on top of an existing database that allows configuring mapping, relationships and most importantly permissions that make the GraphQL feasible to be used by even frontend apps. [1]
Further, Hasura has remote joins that can "join" across Postgres, GraphQL, REST sources and secure them. [2]
[1]: https://hasura.io/blog/hasura-authorization-system-through-e...
[2]: https://hasura.io/blog/remote-joins-a-graphql-api-to-join-da...
I've seen GraphQL schemas being implemented on the client, it's certainly doable, but the performance is terrible compared to doing it on a server close to the source of truth.
Ah, then I misunderstood; I was thinking along the lines of dotnet's authorisation filters.
Filtering might require some reflection, expressions or funcs, which aren't necessarily "everyday" things for some devs, but they shouldn't pose any real trouble for seasoned dotnet devs. If you really want a standard that works OOTB for Entity Framework (and I assume EF Core), you have the option of OData too.
Cheers!
> Django never “removed tracebacks”
I don't want to get into a juvenile back and forth, but I must insist that Django did so suppress tracebacks. I don't know what it does today but I remember clearly patching the server code to re-enable them and that the '--traceback' CLI switch didn't do it.
> I would love to see a declarative form that didn’t use metaclasses in some way.
Here you go: https://pypi.org/project/html/
Clever design, elegant code, under 20k (including docs and tests), no metaclasses.
> Not exactly the kind of thing that should make you mad, and rather than overkill it’s exactly the use case for metaclasses.
There is no use case for metaclasses. GvR called that paper "The Killing Joke" for a reason. ( https://www.python.org/doc/essays/metaclasses/ https://www.youtube.com/watch?v=FBWr1KtnRcI ) I read it for kicks, because I'm that kind of freak, but it's not the kind of thing that you should ever use in production code.
What made me mad is that Django's gratuitous use of metaclasses broke my junior dev. The kid was doing the right thing and it was exploding in his face with an inscrutable error message: that's on Django.
Keep in mind that I’m only really advocating for it as a query language for HTTP APIs (which as a side benefit has some nice existing tooling which you may or may not find useful).
That’s not even the same thing - it’s a (horribly old) library for generating HTML.
We’re talking about forms: sets of typed fields including validations that can be optionally rendered to HTML. Think wtforms[1]
> There is no use case for metaclasses.
There are; that essay (about Python 1.5, no less) does little to dissuade people from using them, going so far as to offer concrete code samples.
It's also hopelessly outdated: nobody uses metaclasses like that at all, especially not for tracing! It's hard to blame it, though; this document was written before even decorators were introduced.
And let’s not ignore the call to authority by pointing out that GvR uses metaclasses extensively while working on type hints, GaE libraries, and even in his early asyncio code.
In actual fact metaclasses have a few use cases, including the most common: syntactic sugar. Like anything, this can be heavily abused, and it is most useful when creating libraries rather than within traditional application code. In any case, shunning it wholesale is stupid.
Half-remembered issues with junior developers are not great arguments against a useful part of a language. Who's to say it was even related to metaclasses, and that your apparent allergy to them isn't colouring your memory?
1. https://wtforms.readthedocs.io/en/2.3.x/crash_course/#gettin...
You're not going to convince me that Django isn't an overblown toy. I'm not going to convince you that it is.
Same with metaclasses, you're not going to convince me that they're a good idea (despite what GvR does with them) and I'm not going to convince you that using them is irresponsible.
So what are we left with?
> That’s not even the same thing
But html.py (would have) solved the problem we had, without the nasty surprise.
> In any case, shunning it wholesale is stupid.
No, it's conservative. That's a different thing.
I want to be able to hire someone who can modify a form. The more complex and obscure the code is (even if it's only twenty lines long), the smaller the pool of folks who can use it with mastery.
Think about it.
Anyway, I'm off learning Erlang/OTP now and it really makes Python's runtime look like a joke in comparison. Web-app backends are Erlang-shaped, not Python-shaped. Not using it sooner makes me feel stupid.
You might get away with it in a GraphQL implementation because you can possibly slap it in top a centralized endpoint, but I really question its efficiency in this case.
- Have the client specify which fields to return, and return only those fields
- Use the above to allow for expanding nested objects when needed
- Specify an API schema somehow.
All GraphQL does is formalize these things into a specification. In my experience the conditional field inclusion is one of the most powerful features. I can simply create a query which contains all of the fields without paying for a performance penalty unless the client actually fetches all those fields simultaneously.
GraphQL queries tend to map rather neatly onto ORM queries. Of course you run into the same sort of nonsense you get with ORMs, such as the n+1 problem and whatnot. The same sort of tools for fixing those issues are available, since your GraphQL query is just going to call the ORM in any case, with one large addition: introspecting GraphQL queries is much easier than introspecting ORM or SQL queries. I can avoid n+1 problems by seeing whether the query is going to look up a nested object and prefetching it. I've yet to see an ORM that allows you to do that.
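As a sketch of that introspection in a Graphene-style resolver (attribute names vary between Graphene versions; the Book model is invented):

def resolve_books(root, info):
    # The parsed query for this field lives on info; peek at the
    # sub-selections to see if the client asked for the nested author.
    selections = info.field_nodes[0].selection_set.selections
    wants_author = any(
        getattr(sel, "name", None) is not None and sel.name.value == "author"
        for sel in selections
    )
    qs = Book.objects.all()  # Book: hypothetical Django model
    if wants_author:
        qs = qs.select_related("author")  # prefetch: avoids one query per book
    return qs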
Lastly GraphQL allows you to break up your API very smartly. Just because some object is nested in another doesn't mean they are nested in source code. One object type simply refers to another object type. If an object has some nested objects that needs query optimizing you can stick that optimization in a single place and stop worrying about it. All the objects referring to it will benefit from the optimization without knowing about it.
GraphQL combines all of the above rather smartly by having your entire API declared as (more or less) a single object. That only works because queries only run if you actually ask for the relevant fields to be returned. It's very elegant if you ask me!
Long story short: yes, you run into the same sort of optimization issues you get with an ORM, but importantly they don't stack on top of the problems your ORM is already causing.
Hand-written SQL gives the author a chance to be more precise in its needs, not only in the SQL but also by pre-generating obvious indexes to be performant from the get-go.
And the whole point of GraphQL is that you specify exactly what data you need, so overfetching is avoided. This is in contrast to traditional REST APIs, where you get a fixed resource.
And, of course, an optimizer will handle different SQL queries differently.
If present. And for that they need to be created, and which ones are the right ones is less obvious when working higher up the abstraction staircase.
> It doesn't make a difference. How could it?
Because the SQL generated by ORMs can be wildly stupid in many cases. Here's one blogpost with an example, where regular SQL and query builder that maps to SQL almost directly generate a decent query on a simple relation, while a full ORM does something stupid: https://blog.logrocket.com/why-you-should-avoid-orms-with-ex...
Obsolete garbage without code blocks, lambdas, if-expressions, pattern matching, or speed, with kids thinking they're metaprogramming/FP gurus.
vs
Garbage dumber than C, without generics or error handling, with a sadistic linter that makes an unused variable an error, and kids thinking they're programmers.
PS: of course Python's type system is better than Go's, because anything is better than nothing. Moreover, Python's type system has nothing in common with Python: sum types are useless without pattern matching and don't play well with classes, and scope-leaking variables aren't really static.
I believe there will be modern high-level programming language someday.
RIIR!
GraphQL allows you to specify what data you need. Obviously, if you use some middleware which throws this information away and just fetches everything from the database, then you have an inefficient system. But this is not a problem inherent to GraphQL or generated SQL in general.
While it might be orthogonal to the design decision, it might add to the amount of unanticipated work that will be required just because of the enormous flexibility.