It is difficult to beat raster tiles in that respect. vector tiles split up responsibility for what you get visually over multiple moving pieces with different provenance and operators.
They're probably missing a raqm/freebidi or something in the stack on the client side.
Having the data in tiles also complicates some things (where the renderer needs to consider merged tiles to get a similar result).
(Some countries use Roman numerals and some use Arabic numerals for numbers. And by Arabic numerals I don't mean 1-9.)
The difficulty is that the stack for rendering vector tiles in the browser is different than the legacy vector->png generation that's been done for ages.
Not really. You need to build geometries from raw OSM data (aka the stuff that you edit) then transform those geometries into MVT format adding appropriate attributes from the original data. In general you actually will want to normalize the data and throw out anything that is not included in the vector tile schema you are using. The net result is quite far from raw OSM data in any case.
PS: I maintain a project that stores actual OSM data in MBTiles format for offline editing, and yes proper editing apps have to do the above on the fly and it is the main reason they are not lightning fast.