You have a fragment of body HTML (e.g. a div containing a couple of p tags, as opposed to a full HTML document) that you want to parse. Perhaps it was provided by a user submitting a comment, or editing the body of a page in a CMS.
Use the Jsoup.parseBodyFragment(String html) method.
String html = "<div><p>Lorem ipsum.</p>";
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();
The parseBodyFragment method creates an empty shell document and inserts the parsed HTML into the body element. If you used the normal Jsoup.parse(String html) method, you would generally get the same result, but explicitly treating the input as a body fragment ensures that any bozo (malformed or unexpected) HTML provided by the user is parsed into the body element rather than elsewhere in the document.
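To make the difference concrete, here is a minimal sketch that feeds the same input to both methods and prints where the content ends up. It assumes jsoup is on the classpath; the class name FragmentVsFullParse, the helper method names, and the sample string are illustrative, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FragmentVsFullParse {
    // Hypothetical sample input: a stray <title> tag ahead of ordinary body content.
    static final String INPUT = "<title>Oops</title><p>Hi</p>";

    // Parse as a complete document: the parser is free to build a head from the input.
    static Document asFullDocument(String html) {
        return Jsoup.parse(html);
    }

    // Parse as a body fragment: the shell document's head is left untouched.
    static Document asBodyFragment(String html) {
        return Jsoup.parseBodyFragment(html);
    }

    public static void main(String[] args) {
        Document full = asFullDocument(INPUT);
        Document fragment = asBodyFragment(INPUT);

        System.out.println("parse, head: " + full.head().html());
        System.out.println("parse, body: " + full.body().html());
        System.out.println("parseBodyFragment, head: " + fragment.head().html());
        System.out.println("parseBodyFragment, body: " + fragment.body().html());
    }
}
```

With Jsoup.parse, the title element is placed in the head; with parseBodyFragment, the head of the shell document stays empty and the user's markup is confined to the body.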
The Document.body() method returns the document's body element, through which you can reach its element children; it is roughly equivalent to doc.getElementsByTag("body").first().
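As a rough illustration of working with the returned body element, the sketch below parses the fragment from the example above and walks its children. It assumes jsoup is on the classpath; the class name BodyAccessExample and the helper parseToBody are hypothetical names chosen for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BodyAccessExample {
    // Hypothetical helper: parse a fragment and hand back the <body> element.
    static Element parseToBody(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        return doc.body();
    }

    public static void main(String[] args) {
        // The <div> is never closed in the input; the parser closes it for us.
        Element body = parseToBody("<div><p>Lorem ipsum.</p>");

        // children() returns only the element children of <body>.
        for (Element child : body.children()) {
            System.out.println(child.tagName() + " -> " + child.text());
        }

        // html() serializes the body's contents, including the </div> the parser supplied.
        System.out.println(body.html());
    }
}
```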
If you are going to accept HTML input from a user, you need to be careful to avoid cross-site scripting attacks. See the documentation for the Whitelist-based cleaner (the class is named Safelist in recent jsoup releases), and clean the input with Jsoup.clean(String bodyHtml, Whitelist whitelist).
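A minimal cleaning sketch follows. It assumes a recent jsoup release in which the allow-list class is org.jsoup.safety.Safelist; on older releases, substitute Whitelist. The class name CleanUserHtml, the helper sanitize, and the sample input are hypothetical, and the basic() preset is only one possible choice of allowed tags.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class CleanUserHtml {
    // Hypothetical helper: keep only the tags and attributes permitted by the basic preset.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        // Sample input with an inline event handler and a script element.
        String unsafe = "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>"
                + "<script>alert(1)</script>";

        // The onclick attribute and the script element are not on the allow-list,
        // so neither appears in the cleaned output.
        System.out.println(sanitize(unsafe));
    }
}
```

Clean the input before storing or rendering it, and pick the narrowest preset that still covers the markup you want users to be able to submit.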