The Web Platform Incubator Community Group recently published the Draft Community Group Report for the HTML Sanitizer API. The HTML Sanitizer API lets developers take untrusted strings of HTML and sanitize those strings for safe insertion into a document’s DOM. The most common use case of HTML string sanitization is to prevent cross-site scripting (XSS) attacks.
The proposal explains its rationale as follows:
Web applications often need to work with strings of HTML on the client side, perhaps as part of a client-side templating solution, perhaps as part of rendering user-generated content, etc. It is difficult to do so in a safe way, however; the naive approach of joining strings together and stuffing them into an Element's innerHTML is fraught with risk, as that can and will cause JavaScript execution in a number of unexpected ways […]
[User space libraries] has proven to be a fragile approach […] We can improve upon the user-space libraries by teaching the browser how to render HTML from an arbitrary string in a safe manner, and do so in a way that is much more likely to be maintained and updated along with the browser’s own changing parser implementation.
The API goals are three-fold: mitigate the risk of DOM-based cross-site scripting attacks; make HTML output safe for use within the current user agent, taking into account its current understanding of HTML; allow developers to override the default set of elements and attributes to prevent script gadget attacks.
The HTML Sanitizer API covers three use cases. In the first use case, developers have input data that is available as a tree of DOM nodes. They can then sanitize the tree contents and incorporate the sanitized DOM as follows:
let s = new Sanitizer();
// Case: The input data is available as a tree of DOM nodes.
let userControlledTree = ...;
element.replaceChildren(s.sanitize(userControlledTree));
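To give a feel for what tree-based sanitization involves, here is a minimal sketch of an allow-list filter. It is a toy model under stated assumptions: nodes are plain objects rather than real DOM nodes, the allow-lists are arbitrary, and sanitizeTree is a hypothetical helper that is not part of the Sanitizer API or of the proposal.

```javascript
// Toy model only: nodes are plain objects, not real DOM nodes, and
// sanitizeTree is a hypothetical helper, not part of the Sanitizer API.
const ALLOWED_ELEMENTS = new Set(["div", "span", "em", "p"]);
const ALLOWED_ATTRIBUTES = new Set(["class", "title"]);

function sanitizeTree(node) {
  // Text nodes carry no markup, so they pass through unchanged.
  if (typeof node === "string") return node;

  // Drop disallowed elements together with their whole subtree.
  if (!ALLOWED_ELEMENTS.has(node.tag)) return null;

  // Keep only allow-listed attributes (this removes onclick, onerror, ...).
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs ?? {})) {
    if (ALLOWED_ATTRIBUTES.has(name)) attrs[name] = value;
  }

  // Recurse into the children and discard the ones that were dropped.
  const children = (node.children ?? [])
    .map(sanitizeTree)
    .filter((child) => child !== null);

  return { tag: node.tag, attrs, children };
}

// Hypothetical untrusted input with a script element and an event handler.
const untrusted = {
  tag: "div",
  attrs: { class: "comment", onclick: "steal()" },
  children: [
    "Hello ",
    { tag: "script", children: ["steal()"] },
    { tag: "em", children: ["world"] },
  ],
};

const clean = sanitizeTree(untrusted);
console.log(JSON.stringify(clean, null, 2));
```

A browser-native sanitizer operates on real DOM nodes and tracks the browser's own parser, which is the maintenance advantage the proposal argues for; the sketch only shows the general shape of the filtering step.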
In the second use case, developers have HTML content in the form of a string and know the target DOM element. The Sanitizer API can be used as follows:
const user_string = "..."; // The user string.
const sanitizer = new Sanitizer( ... ); // Our Sanitizer;
// We want to insert the HTML in user_string into a target element with id
// target. That is, we want the equivalent of target.innerHTML = value, except
// without the XSS risks.
document.getElementById("target").setHTML(user_string, sanitizer);
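Because the API is experimental, code that calls setHTML may want a fallback for browsers that do not expose it. The sketch below is one possible approach, not something the proposal prescribes: insertUntrustedHTML is a hypothetical helper, the call shape setHTML(html, sanitizer) follows the example above, and falling back to textContent is an assumed policy that shows the markup as literal text instead of rendering it.

```javascript
// Hypothetical helper: prefers the experimental setHTML() when the target
// exposes it, and otherwise falls back to textContent (an assumed policy).
function insertUntrustedHTML(target, html, sanitizer) {
  if (typeof target.setHTML === "function") {
    // Same call shape as the example above: setHTML(string, sanitizer).
    target.setHTML(html, sanitizer);
    return "sanitized";
  }
  // No Sanitizer API available: insert as plain text so that nothing in
  // the string is parsed as markup.
  target.textContent = html;
  return "text";
}

// In a browser, usage could look like this:
//   const sanitizer =
//     typeof Sanitizer === "function" ? new Sanitizer() : undefined;
//   insertUntrustedHTML(
//     document.getElementById("target"), user_string, sanitizer);
```

The return value only reports which path was taken, so callers can log or measure how often the fallback is used.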
In the third use case, developers have HTML content in the form of a string, know the type of the target DOM element (e.g., div, span), but can’t or don’t want to perform the insertion immediately after sanitization. The proposed API can be used as follows:
let s = new Sanitizer();
let forDiv = s.sanitizeFor("div", userControlledInput);
// Later:
// replaceChildren() expects individual nodes, so the child list is spread.
document.querySelector(`${forDiv.localName}#target`).replaceChildren(...forDiv.childNodes);
Sanitizing HTML content refers to removing semantically harmful parts (such as script execution) from HTML strings. To convert a string into DOM nodes, it needs to be parsed as specified by the HTML parsing algorithm. String-to-DOM parsing is context-dependent, i.e. the same HTML string will be parsed into different DOM nodes, according to the surrounding context.
The string <em>bla will be parsed as follows in a <div> and <textarea> context:
<div><em>bla</div> ⇨ <div><em>bla</em></div>
<textarea><em>bla</textarea> ⇨ <textarea><em>bla</textarea>
The string <td>text will be parsed as follows in a <table> and non-table (<div>) context:
<table><td>text</table> ⇨ <table><tbody><tr><td>text</td></tr></tbody></table>
<div><td>text</div> ⇨ <div>text</div>
The context-dependent parsing shown above makes user-space string-to-string HTML sanitizer libraries inherently prone to a class of XSS-style attacks called mutation XSS (mXSS). mXSS attacks exploit the fact that string-to-string HTML sanitizer APIs do not have access to the necessary context information at sanitizing time. mXSS-style exploits have been found in real-world libraries.
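The context dependence described above can be observed directly in a browser console. The following is a small sketch for experimentation, assuming a hypothetical helper named parseInContext; it relies on innerHTML, so it should only be fed harmless test strings such as the ones from this article.

```javascript
// parseInContext is a hypothetical helper; `doc` is any Document-like object.
function parseInContext(doc, contextTag, html) {
  const container = doc.createElement(contextTag);
  // The string is parsed with `contextTag` as the context element.
  container.innerHTML = html;
  // Reading innerHTML back serializes the nodes produced by that parse.
  return container.innerHTML;
}

// Only runs where a real DOM is available (e.g., a browser console).
if (typeof document !== "undefined") {
  console.log(parseInContext(document, "div", "<em>bla"));
  console.log(parseInContext(document, "textarea", "<em>bla"));
  console.log(parseInContext(document, "table", "<td>text"));
  console.log(parseInContext(document, "div", "<td>text"));
}
```

Comparing the four logged strings shows that the same input yields different trees depending on the container, which is precisely the information a string-to-string sanitizer lacks.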
The Sanitizer API is incubated in the Sanitizer API WICG. It is not a W3C standard (i.e., web standard) nor is it on the W3C Standards Track. The API is currently not available by default in any browser. Developers may however preview an early implementation in Chrome 93 or later by enabling the about://flags/#enable-experimental-web-platform-features flag.