Terminal emulators display grids of characters using all sorts of horrifying protocols.
Web browsers display html generated by other programs.
What sort of "horrifying protocols"? The entire VT220 state machine diagram can be printed on a single letter- or A4-sized sheet of paper. That's the complete "protocol" of that particular terminal (and of any emulator of it). Implementing the VT220 with a few small extensions (e.g., 256 colors or 24-bit colors) wouldn't be too onerous. I implemented such a parser myself in probably a few hundred lines of code, plus a bit more to do all of the rendering (drawing glyphs directly to a bitmapped display, with no libraries) and handling user input from a keyboard. You'd have a difficult time properly parsing and rendering a significant subset of HTML in less than a few _thousand_ lines of code.
Edit to add: terminal emulators often implement other terminals like VT420, but the VT220 is enough for the vast majority of terminal needs.
VT500 should include ReGIS