Element: setHTML() method

(developer.mozilla.org)
142 points by todsacerdoti | 70 comments
1. michalpleban ◴[] No.45674843[source]
So is this basically a safe version of innerHTML?
2. Octoth0rpe ◴[] No.45674953[source]
Yes, although a slightly more relevant way of putting it would be that it's a built-in DOMPurify (DOMPurify being an npm package commonly used to sanitize HTML before injecting it).
3. evilpie ◴[] No.45674985[source]
We enabled this by default in Firefox Nightly (only) this week.
4. redbell ◴[] No.45675122[source]
> This feature is not Baseline because it does not work in some of the most widely-used browsers.

This is interesting, but it appears to be in its early days, as none of the major browsers seem to support it yet.
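Until support is widespread, a page can feature-detect the method and fall back to something that stays safe without it. A minimal sketch, assuming a plain-text fallback via `textContent` is acceptable (it drops all markup instead of sanitizing it); `setUntrustedContent` is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: prefer the built-in sanitizer when the browser has it,
// otherwise render the string as inert text so no markup (or script) is parsed.
function setUntrustedContent(element, untrustedHtml) {
  if (typeof element.setHTML === "function") {
    // setHTML() parses the string and strips XSS-unsafe elements/attributes.
    element.setHTML(untrustedHtml);
    return "setHTML";
  }
  // Fallback: textContent never parses HTML, so "<b>hi</b>" shows literally.
  element.textContent = untrustedHtml;
  return "textContent";
}
```

In a browser you would pass a real element, e.g. `document.querySelector("#comment")`; the return value only reports which path ran.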

5. CaptainOfCoit ◴[] No.45675154[source]
Really happy to see it, after 25 years (https://www.bugcrowd.com/glossary/cross-site-scripting-xss/) of surviving without it. It always struck me as an obvious missing part of the DOM API, and I still don't know why it took so long.

But mostly I'm just happy that it's finally here, and I do appreciate all the hard work people have been doing to get this live.

6. AlienRobot ◴[] No.45675157[source]
Great functionality, terrible name.
7. dzogchen ◴[] No.45675208[source]
Neat. I think once this is adopted by HTMX (or similar libraries) you don't need to sanitize on the server side anymore?
8. varun_ch ◴[] No.45675224[source]
I sometimes wonder what the DOM APIs could look like in a hypothetical world where we could start over with everything.
9. ishouldbework ◴[] No.45675241[source]
> It then removes any HTML entities that aren't allowed by the sanitizer configuration, and further removes any XSS-unsafe elements or attributes — whether or not they are allowed by the sanitizer configuration.

Emphasis mine. I do not understand this design choice. If I explicitly allow `script` tag, why should it be stripped?

If the method were called setXSSSafeSubsetOfHTML, sure, I guess, but it feels weird for setHTML to have an impossible-to-override filter.

10. dylan604 ◴[] No.45675272[source]
Do you honestly feel that we will ever be in a place for the server to not need to sanitize data from the client? Really? I don't. Any suggestion to me of "not needing to sanitize data from client" will immediately have me thinking the person doing the suggesting is not very good at their job, really new, or trying to scam me.

There's no reason to not sanitize data from the client, yet every reason to sanitize it.

11. evilpie ◴[] No.45675325[source]
If you want to use an XSS-unsafe Sanitizer you have to use setHTMLUnsafe.
12. jmull ◴[] No.45675333[source]
I guess they are going for a safe default... the idea is people who don't carefully read the docs or carefully monitor the provenance of their dynamically generated HTML will probably reach for "setHTML()".

Meanwhile, there's "setHTMLUnsafe()" and, of course, good old .innerHTML.

13. strbean ◴[] No.45675336[source]
This is primarily an ergonomic addition, so it kinda makes sense to me to not make the dangerous footguns more ergonomic in the process. You can still assign `innerHTML` etc. to do the dangerous thing.
14. ◴[] No.45675342[source]
15. jsmith99 ◴[] No.45675347{3}[source]
It's arguably easier just to sanitise at display time; otherwise you have problems like double escaping.
16. strbean ◴[] No.45675432{3}[source]
It can be a complicated and error-prone process, mainly in scenarios where you have multiple mediums that require different sanitizers. Obviously you should do it. But in such scenarios, the best practice is to sanitize as close to the place it is used as possible. I've seen terrible codebases where they tried to apply multiple layers of sanitization on user input before storing to the DB, then reverse the unneeded layers before output. Obviously this didn't work.

Point being, if you can move sanitization even closer to where it is used, and that sanitization is actually provided by the standard library of the platform in question, that's a massive win.

17. hsbauauvhabzb ◴[] No.45675456{3}[source]
Ideally this should be called dangerouslySetInnerHTML but hindsight blah blah
18. meowface ◴[] No.45675472{3}[source]
I agree, though I also agree with the parent that the method name is a little bit confusing. "safeSetHTML" or "setUntrustedHTML" or something would be clearer.
19. immibis ◴[] No.45675650{4}[source]
By "sanitise" what's really meant is usually "escape". User typed their display name as <script>. You want the screen to say their display name, which is <script>. Therefore you send &lt;script&gt;. That's not their display name - that's just what you write in HTML to get their display name to appear on the screen. You shouldn't store it in the database in the display_name column.
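A minimal sketch of that distinction, assuming the value lands in HTML text or a quoted attribute (URLs, inline JS, and CSS need different encoding): the database keeps `<script>`, and only the HTML renderer turns it into `&lt;script&gt;`. `escapeHtml` is an illustrative helper, not a built-in.

```javascript
// Escape the five characters that are significant in HTML text and
// quoted attribute values. Store the raw value; call this only when emitting HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first, or it would re-escape the entities below
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const displayName = "<script>"; // what the user typed; this is what gets stored
console.log(escapeHtml(displayName)); // &lt;script&gt;
```

The same stored value can then be sent unmodified over channels that are not HTML (an SMS, a plain-text email) without any decoding step.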
20. bpt3 ◴[] No.45675653{4}[source]
Easier does not mean better, which seems to be true in this case given the many, many vulnerabilities that have been exploited over the years due to a lack of input sanitization.
21. padjo ◴[] No.45675693{3}[source]
Sanitizing as close as possible to where the data is used is usually best; then you don't have to keep track of what's sanitized and what's not for very long.

(Especially important if sanitization is not idempotent!)
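HTML escaping is a concrete example of a non-idempotent step: applying it twice corrupts the value, which is the double-escaping problem mentioned upthread. A minimal sketch with an illustrative `escapeHtml` helper (not a built-in):

```javascript
// Illustrative HTML escaper (text/quoted-attribute context only).
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const raw = "Tom & Jerry";
const once = escapeHtml(raw);   // "Tom &amp; Jerry"     -> browser shows: Tom & Jerry
const twice = escapeHtml(once); // "Tom &amp;amp; Jerry" -> browser shows: Tom &amp; Jerry
console.log(once === twice); // false: escaping is not idempotent
```

Escaping exactly once, at the point of output, avoids having to track whether a given string has already been through this step.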

22. jonathrg ◴[] No.45675738[source]
Why? Does it not set the HTML?
23. wewtyflakes ◴[] No.45675791[source]
Wouldn't that open the floodgates by allowing code that could itself call `setHTML` again but then further revise the args to escalate its privileges?
24. ibowankenobi ◴[] No.45675803[source]
The API design could be better. Document fragments are designed to be reused. It should accept an optional fragment key that takes a document fragment: if the value is not a fragment, throw; if it has children, empty its contents first.
25. padjo ◴[] No.45675823{5}[source]
In this case easier is actually better. Sanitize a string at the point where you are going to use it. The locality makes it easy to verify that sanitization has been done correctly for the context. The alternative means you have to maintain a chain of custody for the string and ensure it is safe.
26. spankalee ◴[] No.45675902[source]
In what way are document fragments meant to be reused?

They empty their contents into the new parent when they're appended, so they can't be meaningfully appended a second time without rebuilding them.

`<template>` is meant to be reused, since you're meant to clone it in order to use it, and then you can clone it again.

27. strbean ◴[] No.45675922{5}[source]
Agreed. The codebase I'm thinking of was html encoding stuff before storing it, then when they needed to e.g. send an SMS, trying to remember to decode. Terrible.
28. spankalee ◴[] No.45675933[source]
I'll be very excited to use this in Lit when it hits baseline.

While lit-html templates are already XSS-hardened because template strings aren't forgeable, we do have utilities like `unsafeHTML()` that let you treat untrusted strings as HTML, which are currently... unsafe.

With `Element.setHTML()` we can make a `safeHTML()` directive and let the developer specify sanitizer options too.

29. JadeNB ◴[] No.45675936[source]
A sibling comment by evilpie says that it is enabled in Firefox Nightly: https://news.ycombinator.com/item?id=45674985
30. padjo ◴[] No.45675938[source]
As someone who has dealt with more than my fair share of content injection vulnerabilities over the years, this is great to see at last. It’s kinda crazy that this is only coming now, while other, more cumbersome solutions like CSP have been around for years.
31. strbean ◴[] No.45676054{4}[source]
Idk about that, there's a good argument that the most obvious methods should be the safe ones. That's what juniors will probably jump to first. If you need the unsafe ones, you'll probably be able to figure that out and find them quickly.
32. StrauXX ◴[] No.45676099{3}[source]
Why don't you use DOMPurify right now? It's battle tested and supports configs just like this proposal.
33. modinfo ◴[] No.45676114[source]
Cursor built a pseudo-sethtml: https://github.com/skorotkiewicz/pseudo-sethtml
34. _the_inflator ◴[] No.45676140[source]
Maybe it's time for something beyond "use strict" at the beginning of a JavaScript document, as one way to opt in.

I think a config object in which you define script options like sanitization, along with other script configuration, might be helpful.

After all, backward compatibility almost always needs to be ensured, and this might work. I am no spec guy; it is just an idea. React makes use of "use client"/"use server", so this would be more central and explicit.

35. netsharc ◴[] No.45676209{3}[source]
It doesn't say "There's a lot of hidden sanitizing stuff inside this method" from the name...

Something like "setSafeHTML()" would be preferable. (Since it's Mozilla, there should be a few committee meetings to come up with the appropriate name)...

36. MarsIronPI ◴[] No.45676240{3}[source]
Actually, it has existed behind an about:config pref as far back as Firefox 138. So if you enable it, it even works in the current ESR.
37. auxiliarymoose ◴[] No.45676358{3}[source]
If you sanitize on the server, you are making assumptions about what is safe/unsafe for your clients. It's possible to make these assumptions correctly, but that requires keeping them in sync with all clients which is hard to do correctly.

Something that's sanitized from an HTML standpoint is not necessarily sanitized for native desktop & mobile applications, client UI frameworks, etc. For example, in Cloudflare's Cloudbleed security incident, malformed img tags sent by origin servers (which weren't by themselves unsafe in browsers) caused their edge servers to append garbage (including miscellaneous sensitive data) from heap memory to some responses, which then got indexed by search engines.

Sanitization is always the sole responsibility of the consumer of the content to make sure it presents any inbound data safely. Sometimes the "consumer" is colocated on the server (e.g. for server rendered HTML + no native/API users) but many times it's not.

38. dylan604 ◴[] No.45676451{6}[source]
If you are using it at the client, sure, but then why is the server involved? If you are sending it to the server, you need to treat it like it is always coming from a hacker with very bad intentions. I don't care where the data comes from; my server will sanitize it for its own protection. After all, just because it left "clean" from your browser does not mean it was not interfered with elsewhere upstream, TLS be damned. If we've double encoded something, that's fine; it won't blow up the server. At the end of the day, that's what is most important. If some double decoding doesn't happen correctly on the client, then <shrugEmoji>
39. dylan604 ◴[] No.45676498{4}[source]
> If you sanitize on the server, you are making assumptions about what is safe/unsafe for your clients.

No. I'm making decisions on what is safe for my server. I'm a back end guy, I don't really care about your front end code. I will never deem your front end code's requests as trustworthy. If the front end code cannot properly handle encoding, the back end code will do what it needs to do to not allow stupid string injection attacks. I don't know where your request has been. Just because you think it came from your code in the browser does not mean that was the last place it was altered before hitting the back end.

40. ffsm8 ◴[] No.45676509{4}[source]
Why would the framework do that?

The app developers can still use that right now, but if the framework forced its usage, it'd unnecessarily increase package size for people who don't need it.

41. dylan604 ◴[] No.45676516{4}[source]
You're making a bad assumption that client side code was the last place the submitted string was altered in the path to the server. The man in the middle might have a different idea and should always be protected against on the server where it is the last place to sanitize it.
42. exdeejay_ ◴[] No.45676607[source]
This code only does the most basic and naive regex filtering that even a beginner XSS course's inputs would work against. With the Node example code and input string:

  <p>Hello <scr<script>ipt>alert(1)</scr<script>ipt> World</p>
The program outputs:

  $ node .
  <p>Hello <script>alert(1)</script> World</p>
  {
    sanitizedHTML: '<p>Hello <script>alert(1)</script> World</p>',
    wasModified: true,
    removedElements: [],
    removedAttributes: []
  }
Asking a chatbot to make a security function and then posting it for others to use without even reviewing it is not only disrespectful, but dangerous and grossly negligent. Please take this down.
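The failure mode is easy to reproduce in isolation: any filter that deletes matches in a single pass lets the surrounding fragments rejoin into the very tag it removed. A minimal illustrative sketch (`naiveStrip` is hypothetical and is not the linked project's code):

```javascript
// A naive single-pass filter: delete every literal "<script>" / "</script>".
function naiveStrip(html) {
  return html.replace(/<\/?script>/gi, "");
}

const payload = "<p>Hello <scr<script>ipt>alert(1)</scr<script>ipt> World</p>";
console.log(naiveStrip(payload));
// <p>Hello <script>alert(1)</script> World</p>
```

Looping until nothing changes would patch this particular payload but not the general problem (event-handler attributes like `onerror`, `javascript:` URLs, parser quirks). A real sanitizer parses the markup into a tree and filters nodes, which is the approach both setHTML() and DOMPurify take.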
43. cheeaun ◴[] No.45676617[source]
Found a polyfill here https://github.com/mozilla/sanitizer-polyfill
44. spankalee ◴[] No.45676629{4}[source]
One, lit-html doesn't have any dependencies.

Two, even if we did, DOMPurify is ~2.7x bigger than lit-html core (3.1 KB minzipped), and the unsafeHTML() directive is less than 400 bytes minzipped. It's just really big to take on a sanitizer, and which one to use is an opinion we'd have to have. And lit-html is extensible, so people can already write their own safeHTML() directive that uses DOMPurify.

For us it's a lot simpler to have safe templates, an unsafe directive, and not parse things too finely in between.

A built-in API is different for us, though. It's standard, stable, and should eventually be well known by all web developers. We can integrate it with no extra dependencies or code, and just adopt the standard platform options.

45. auxiliarymoose ◴[] No.45676709{5}[source]
How can user input be unsafe on the server? Are you evaluating it somehow?

User-generated content shouldn't be trusted in that way (inbound requests from client, data fields authored by users, etc.)

46. dylan604 ◴[] No.45676826{6}[source]
Is that a serious question?

INSERT INTO table (user_name) VALUES ...

Are you one of today's 10000 on server side sanitizing of user data?

47. ibowankenobi ◴[] No.45676831{3}[source]
You can absolutely reuse a document fragment

https://ibrahimtanyalcin.github.io/Cahir/

the whole rendering uses a single fragment.

48. krapp ◴[] No.45676888{7}[source]
Are you one of today's 10000 on using parameterized queries and prepared statements?

Unless you're doing something stupid like concatenating strings into SQL queries, there's no need to "sanitize" anything going into a database. SQL injection is a solved problem.

Coming from the database and sending to the client, sure. But unless you're doing something stupid like concatenating strings into SQL statements it hasn't been necessary to "sanitize" data going into a database in ages.

Edit: I didn't realize until I reread this comment that I repeated part of it twice, but I'm keeping it in because it bears repeating.

49. auxiliarymoose ◴[] No.45676944{7}[source]
Communicating with a SQL driver by concatenating strings containing user input and then evaluating it? wat?

I'm very interested in what tech stack you are using where this is a problem.

50. intrasight ◴[] No.45677088[source]
I'm confused as to why you need a "safe" version if you're the one generating and injecting the HTML.
51. hoppp ◴[] No.45677125{8}[source]
SQL injection is solved if you use dependencies that solve it of course.

Other than SQL injection, there is command or log injection; file names need to be sanitized, as does any user-uploaded content (for XSS), and that includes images. Any incoming JSON data should be sanitized, with extra fields removed, etc.
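Removing unexpected fields is usually done with an allowlist rather than a blocklist. A minimal sketch; the field names and the `pickAllowed` helper are hypothetical:

```javascript
// Keep only the fields this endpoint expects; anything else (e.g. an
// attacker-supplied "isAdmin") is dropped before the data is used further.
const ALLOWED_FIELDS = ["displayName", "bio"];

function pickAllowed(input, allowed) {
  const out = {};
  for (const key of allowed) {
    // Only copy own properties, so inherited keys like "constructor" are ignored.
    if (Object.prototype.hasOwnProperty.call(input, key)) {
      out[key] = input[key];
    }
  }
  return out;
}

const body = JSON.parse('{"displayName":"alice","bio":"hi","isAdmin":true}');
console.log(pickAllowed(body, ALLOWED_FIELDS)); // { displayName: 'alice', bio: 'hi' }
```

This only enforces which keys survive; type and length checks (or a schema validator) would still be needed on the values.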

Log injection is a pretty nasty sort of hack that, depending on how the logs are processed, can lead to XSS or command injection.
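For plain-text logs, the classic case is a newline smuggled into a logged value, which forges a second entry. A minimal sketch; the log format and helper names are hypothetical, and escaping control characters is one common mitigation (structured/JSON logging is another):

```javascript
// Log line built by concatenation: a newline in user input lets the
// caller forge what looks like a second, separate log entry.
function logLineUnsafe(user) {
  return `INFO login failed for user=${user}`;
}

// Neutralize CR/LF and other control characters before writing the line.
function logLineSafe(user) {
  const escaped = user.replace(/[\u0000-\u001f\u007f]/g, (c) =>
    "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0")
  );
  return `INFO login failed for user=${escaped}`;
}

const input = "bob\nINFO login succeeded for user=admin";
console.log(logLineUnsafe(input).split("\n").length); // 2: a forged entry
console.log(logLineSafe(input).split("\n").length);   // 1
```

If the logs are later shown in a web UI, they still need HTML escaping at that point; this step only protects the line structure.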

52. jfengel ◴[] No.45677128{4}[source]
I like React's dangerouslySetInnerHTML. The name so clearly conveys "you can do this but you really, really, really shouldn't".
53. jfengel ◴[] No.45677146{8}[source]
People do it all the time, on any tech stack that lets you execute command strings. A lot of early databases didn't even support things like parameterized inserts.
54. hoppp ◴[] No.45677148{4}[source]
Well, could it be safelySetHTML instead of setSafeHTML?

The second one could imply the HTML is already safe, while the first one is a safe way to set HTML.

If it's just setHTML, then it could imply it doesn't care whether the HTML is safe or not.

55. sergeykish ◴[] No.45677170[source]
So `.setHTML("<script>...</script>")` does not set HTML?
56. evbogue ◴[] No.45677311{3}[source]
Why should a web page only have a single person generating and injecting HTML into it?
57. xp84 ◴[] No.45677372[source]
Sounds reasonable enough to me. 99.99% of the times you’re in an actual script, if you mean to execute code, you’d just execute it yourself, rather than making a script tag full of code and sticking that tag into a random DOM element. That’s why the default wouldn’t honor the script tag and there’d be an “unsafe” method explicitly named as such to hint you that you’re doing something weird.
58. matmo ◴[] No.45677377{3}[source]
Isn't this kinda like asking "why does my gun need a safety if I'm the only one consciously pulling the trigger"?
59. codedokode ◴[] No.45677417[source]
I don't like this. This could be implemented as a JS library. I believe browsers should provide the minimal API so that they are smaller and easier to create. As for safe alternative to innerHTML, it is called innerText.
60. csmantle ◴[] No.45677474[source]
I think innerText and setHTML() have different purposes. The former inserts the whole string as a text leaf, while the latter tries to preserve structures that are meaningful in context.

---

Libraries can surely do the same job, but then the exact behavior would vary among a sea of those libs. Having specs defined [0] for such an interface would hopefully iron out much of these variations, as well as enabling some performance gains.

[0]: https://wicg.github.io/sanitizer-api/#dom-element-sethtml

61. petralithic ◴[] No.45677680[source]
> I believe browsers should provide the minimal API so that they are smaller and easier to create.

That ship has long since sailed. Browsers are so complex that it takes quite some effort to support the various levels of 9s of the percentage of compatibility with standards, not to mention the browser makers themselves define many of the standards.

62. domenicd ◴[] No.45677701{5}[source]
Indeed, the web platform now has setHTML() and setHTMLUnsafe() to replace the innerHTML setter.

There's also getHTML() (which has extra capabilities over the innerHTML getter).

63. SoftTalker ◴[] No.45677840{4}[source]
Why not name it what it does: sanitizeAndSetHTML
64. xp84 ◴[] No.45677920{4}[source]
Naming things in that manner hasn’t proven to be a good idea over the years.

When you have two of something, and one is safe/better while the other is known to be problematic, you give the awkward name to the problematic one and the obvious name to the safe/better one. Noobs oughtn’t to be attempting the other one, and anyone who is mature enough to have a reason to do it is mature enough to appreciate the reason behind that complexity.

65. codedokode ◴[] No.45677927{3}[source]
And if you need something that is not in a spec, you have to use a library anyway. Also the point was that browser should be as simple as possible and not like a whole new OS.
66. ◴[] No.45677947{3}[source]
67. codedokode ◴[] No.45677975{3}[source]
I wonder why Cursor chose regex approach when it is widely known that it is a wrong method. Is it a result of training on low-quality forums for beginners?
68. systoll ◴[] No.45677986[source]
A script tag would be able to call setHTMLUnsafe, bypassing whatever sanitization you configured.

I’d’ve made it a runtime error to call setHTML with an unsafe config, but Javascript tends toward implicit reinterpretation rather than erroring-out.

69. CGamesPlay ◴[] No.45678000[source]
Is “XSS-unsafe” precisely defined anywhere? I assume it means “any access to the JS interpreter”, but assuming in this context seems decidedly unsafe.