HTML Sanitizer API
Publikováno: 16.12.2021
Three cheers for (draft stage) progress on a Sanitizer API! It’s gospel that you can’t trust user input. And indeed, any app I’ve ever worked on has dealt with bad actors trying to slip in and execute nefarious code …
Three cheers for (draft stage) progress on a Sanitizer API! It’s gospel that you can’t trust user input. And indeed, any app I’ve ever worked on has dealt with bad actors trying to slip in and execute nefarious code somewhere it shouldn’t.
It’s the web developer’s job to clean user input before it is used again on the page (or stored, or used server-side). This is typically done with our own code or libraries that are pulled down to help. We might write a RegEx to strip anything that looks like HTML (or the like), which has the risk of bugs and those bad actors finding a way around what our code is doing.
Instead of user-land libraries or our dancing with it ourselves, we could let the browser do it:
// some function that turns a string into real nodes
const untrusted_input = to_node("<em onclick='alert(1);'>Hello!</em>");
const sanitizer = new Sanitizer();
sanitizer.sanitize(untrusted_input); // <em>Hello!</em>
Then let it continue to be a browser responsibility over time. As the draft report says:
The browser has a fairly good idea of when it is going to execute code. We can improve upon the user-space libraries by teaching the browser how to render HTML from an arbitrary string in a safe manner, and do so in a way that is much more likely to be maintained and updated along with the browser’s own changing parser implementation.
This kind of thing is web standards at its best. Spot something annoying (and/or dangerous) that tons of people have to do, and step in to make it safer, faster, and better.