You have HTML in a Java String, and you want to parse that HTML to get at its contents, or to make sure it's well formed, or to modify it. The String may have come from user input, a file, or from the web.
Use the static Jsoup.parse(String html) method, or Jsoup.parse(String html, String baseUri) if the page came from the web and you want to get at absolute URLs (see [working-with-urls]).
String html = "<html><head><title>First parse</title></head>"
    + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
The parse(String html, String baseUri) method parses the input HTML into a new Document. The base URI argument is used to resolve relative URLs into absolute URLs, and should be set to the URL the document was fetched from. If that's not applicable, or if you know the HTML has a base element, you can use the parse(String html) method.
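To make the effect of the base URI concrete, here is a minimal sketch. The class name BaseUriExample, the helper firstAbsoluteLink, and the example.com URLs are illustrative assumptions, not part of jsoup; the calls Jsoup.parse, select, first, and absUrl are the library's own.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class BaseUriExample {
    // Hypothetical helper: parses the HTML against baseUri and returns the
    // absolute URL of the first link, or "" if there is no link or the
    // href cannot be made absolute.
    static String firstAbsoluteLink(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a[href]").first();
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/docs/intro.html\">Intro</a></p>";

        // With a base URI, the relative href resolves against it.
        System.out.println(firstAbsoluteLink(html, "https://example.com/guide/"));

        // With an empty base URI there is nothing to resolve against,
        // so absUrl returns an empty string.
        System.out.println(firstAbsoluteLink(html, "").isEmpty());
    }
}
```

The raw attribute value is still available through attr("href") in either case; only absUrl depends on the base URI.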
As long as you pass in a non-null string, you're guaranteed to have a successful, sensible parse, with a Document containing (at least) a head and a body element. (BETA: if you do get an exception raised, or a bad parse-tree, please file a bug.)
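The guarantee above can be checked with a short sketch. The class name LenientParseExample and the helper paragraphCount are hypothetical names chosen for illustration; the input is deliberately sloppy (no html, head, or body tags, and unclosed paragraphs).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class LenientParseExample {
    // Hypothetical helper: counts the <p> elements that end up inside <body>.
    static int paragraphCount(String html) {
        Document doc = Jsoup.parse(html);
        return doc.body().select("p").size();
    }

    public static void main(String[] args) {
        // A fragment with no wrapper elements and two unclosed <p> tags.
        Document doc = Jsoup.parse("<p>Unclosed paragraph<p>Second");

        // The parser still builds a full document skeleton.
        System.out.println(doc.head() != null);
        System.out.println(doc.body() != null);

        // Each <p> becomes its own element inside the body.
        System.out.println(paragraphCount("<p>Unclosed paragraph<p>Second"));

        // Even an empty string yields a Document with head and body.
        Document empty = Jsoup.parse("");
        System.out.println(empty.head() != null && empty.body() != null);
    }
}
```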
Once you have a Document, you can get at the data using the appropriate methods in Document and its superclasses Element and Node.
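As a rough illustration of reading data back out, the sketch below calls one method from each level of that hierarchy on the sample HTML used earlier. The class name ReadDocumentExample and its helper methods are assumptions made for this example only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ReadDocumentExample {
    static final String HTML =
        "<html><head><title>First parse</title></head>"
        + "<body><p>Parsed HTML into a doc.</p></body></html>";

    // Document method: the text of the <title> element.
    static String title() {
        return Jsoup.parse(HTML).title();
    }

    // Element method: the combined text of the body and its children.
    static String bodyText() {
        return Jsoup.parse(HTML).body().text();
    }

    // Node method: the number of direct child nodes of the body.
    static int bodyChildCount() {
        return Jsoup.parse(HTML).body().childNodeSize();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(doc.title());                 // defined on Document
        System.out.println(doc.body().text());           // defined on Element
        System.out.println(doc.body().childNodeSize());  // defined on Node
    }
}
```

Because Document extends Element, which extends Node, all three kinds of method are also available directly on the Document itself.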