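All Filters implement the NodeFilter interface. This interface has only one method, boolean accept(Node node), which determines whether a given node falls within the scope of that Filter. HtmlParser defines 16 different Filters in the org.htmlparser.filters package, and they can be grouped into several categories.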
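To make the single-method contract concrete, here is a minimal sketch. It does not use the HtmlParser jar: `SimpleNode` and `SimpleFilter` are hypothetical stand-ins that only mirror the shape of `org.htmlparser.Node` and `org.htmlparser.NodeFilter`. `tagName(...)` behaves roughly like TagNameFilter (a case-insensitive tag-name match), and `select(...)` is a flat-list stand-in for tree-wide extraction such as `Parser.extractAllNodesThatMatch(NodeFilter)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins: SimpleNode and SimpleFilter only mirror the shape of
// org.htmlparser.Node and org.htmlparser.NodeFilter; they are not HtmlParser classes.
class NodeFilterSketch {

    /** A minimal node: a tag (tagName set) or a text node (tagName == null). */
    static final class SimpleNode {
        final String tagName;
        final String text;

        SimpleNode(String tagName, String text) {
            this.tagName = tagName;
            this.text = text;
        }
    }

    /** Same contract as NodeFilter: one method deciding whether a node is accepted. */
    interface SimpleFilter {
        boolean accept(SimpleNode node);
    }

    /** Accepts tags with the given name, ignoring case (compare TagNameFilter). */
    static SimpleFilter tagName(String name) {
        final String wanted = name.toUpperCase(Locale.ENGLISH);
        return node -> node.tagName != null
                && node.tagName.toUpperCase(Locale.ENGLISH).equals(wanted);
    }

    /** Returns the nodes the filter accepts (a flat-list version of tree extraction). */
    static List<SimpleNode> select(List<SimpleNode> nodes, SimpleFilter filter) {
        List<SimpleNode> matches = new ArrayList<>();
        for (SimpleNode node : nodes) {
            if (filter.accept(node)) {
                matches.add(node);
            }
        }
        return matches;
    }
}
```

Because the filter is just a predicate object, the traversal code never needs to know what is being matched; swapping in a different filter changes the result without touching `select`.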
Predicate Filters:
TagNameFilter
HasAttributeFilter
HasChildFilter
HasParentFilter
HasSiblingFilter
IsEqualFilter
Logical-operation Filters:
AndFilter
NotFilter
OrFilter
XorFilter
Other Filters:
NodeClassFilter
StringFilter
LinkStringFilter
LinkRegexFilter
RegexFilter
CssSelectorNodeFilter
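The logical-operation Filters wrap other Filters and combine their answers for the same node. With the real library, a typical combination would be something like `new AndFilter(new TagNameFilter("A"), new HasAttributeFilter("href"))` passed to `parser.extractAllNodesThatMatch(...)`. The sketch below shows the same idea without the HtmlParser jar: `SimpleNode`, `SimpleFilter` and the helper methods are hypothetical stand-ins, and only the two-operand forms of and/or/xor are modelled.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-ins, not HtmlParser classes: they only illustrate how
// AndFilter, OrFilter, NotFilter and XorFilter combine other filters.
class LogicalFilterSketch {

    static final class SimpleNode {
        final String tagName;                 // null for text nodes
        final Map<String, String> attributes; // empty for text nodes
        final String text;                    // null for tags

        SimpleNode(String tagName, Map<String, String> attributes, String text) {
            this.tagName = tagName;
            this.attributes = attributes == null
                    ? Collections.<String, String>emptyMap() : attributes;
            this.text = text;
        }
    }

    interface SimpleFilter {
        boolean accept(SimpleNode node);
    }

    // Predicate filters (compare TagNameFilter, HasAttributeFilter, RegexFilter).
    static SimpleFilter tagName(String name) {
        return n -> n.tagName != null && n.tagName.equalsIgnoreCase(name);
    }

    static SimpleFilter hasAttribute(String attribute) {
        return n -> n.attributes.containsKey(attribute);
    }

    static SimpleFilter textMatches(String regex) {
        final Pattern pattern = Pattern.compile(regex);
        return n -> n.text != null && pattern.matcher(n.text).find();
    }

    // Logical filters: each asks its wrapped filters about the same node.
    static SimpleFilter and(SimpleFilter a, SimpleFilter b) {
        return n -> a.accept(n) && b.accept(n);
    }

    static SimpleFilter or(SimpleFilter a, SimpleFilter b) {
        return n -> a.accept(n) || b.accept(n);
    }

    static SimpleFilter not(SimpleFilter a) {
        return n -> !a.accept(n);
    }

    static SimpleFilter xor(SimpleFilter a, SimpleFilter b) {
        return n -> a.accept(n) ^ b.accept(n);
    }
}
```

Since every combinator returns another filter, combinations nest freely, e.g. `not(and(tagName("a"), hasAttribute("href")))` selects everything except links that carry an href.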
In addition, you can define your own Filters to handle special filtering requirements.
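A custom Filter is simply a class that implements the one-method interface; with the real library it would be declared as `class MyFilter implements NodeFilter { public boolean accept(Node node) { ... } }`. The sketch below uses hypothetical stand-in types instead of the HtmlParser jar, and `HrefSuffixFilter` (accept `<a>` tags whose href ends with a given suffix, such as links to PDF files) is an invented example requirement, not a class from the library.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins; with the real library, HrefSuffixFilter would instead
// implement org.htmlparser.NodeFilter and inspect an org.htmlparser.Node.
class CustomFilterSketch {

    static final class SimpleNode {
        final String tagName;                 // null for text nodes
        final Map<String, String> attributes; // empty for text nodes

        SimpleNode(String tagName, Map<String, String> attributes) {
            this.tagName = tagName;
            this.attributes = attributes == null
                    ? Collections.<String, String>emptyMap() : attributes;
        }
    }

    interface SimpleFilter {
        boolean accept(SimpleNode node);
    }

    /** Hypothetical custom filter: accepts <a> tags whose href ends with a suffix. */
    static final class HrefSuffixFilter implements SimpleFilter {
        private final String suffix;

        HrefSuffixFilter(String suffix) {
            // Compare in lower case so ".pdf" also matches ".PDF".
            this.suffix = suffix.toLowerCase(Locale.ENGLISH);
        }

        @Override
        public boolean accept(SimpleNode node) {
            if (node.tagName == null || !node.tagName.equalsIgnoreCase("A")) {
                return false; // text nodes and non-link tags are rejected
            }
            String href = node.attributes.get("href");
            return href != null && href.toLowerCase(Locale.ENGLISH).endsWith(suffix);
        }
    }
}
```

Keeping the filter's configuration (here the suffix) in a constructor field lets one class serve many similar requirements.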
Tag classes
These are mainly used together with NodeClassFilter.
Remark: comment
AppletTag:
BaseHrefTag:
BodyTag:"BODY"; // getBody() internally calls toPlainTextString()
Bullet:"LI"
BulletList:"UL","OL"
CompositeTag:
DefinitionList:"DL"
DefinitionListBullet:"DD","DT"
Div:"DIV"
DoctypeTag:"!DOCTYPE"
FormTag:
FrameSetTag:
FrameTag:
HeadingTag:"H1","H2","H3","H4","H5","H6"
HeadTag:"HEAD"
Html:"HTML"
ImageTag:
InputTag:"INPUT"
JspTag:"%","%=","%@"
LabelTag:"LABEL"
LinkTag:
MetaTag:
ObjectTag:
OptionTag:
ParagraphTag:"P"
ProcessingInstructionTag:"?"
ScriptTag:
SelectTag:"SELECT"
Span:"SPAN"
StyleTag:"STYLE"
TableColumn:"TD"
TableHeader:"TH"
TableRow:"TR"
TableTag:"TABLE"
TagNode:
TextareaTag:"TEXTAREA"
TitleTag:"TITLE"
TextNode:
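The Tag classes above are what NodeClassFilter is usually given, for example `new NodeClassFilter(LinkTag.class)` to pick out links. As far as we understand, NodeClassFilter accepts a node when its class is the given class or a subclass of it, so passing a base class such as TagNode would match every tag. The sketch below illustrates that class-based matching with hypothetical stand-in classes (`SimpleTagNode`, `SimpleLinkTag`, ...) rather than the real HtmlParser types.

```java
import java.util.List;

// Hypothetical stand-in classes loosely named after HtmlParser's node types;
// they only illustrate filtering by Java class, the idea behind NodeClassFilter.
class NodeClassFilterSketch {

    static class SimpleNode {}
    static class SimpleTextNode extends SimpleNode {}
    static class SimpleTagNode extends SimpleNode {}
    static class SimpleLinkTag extends SimpleTagNode {}
    static class SimpleImageTag extends SimpleTagNode {}

    interface SimpleFilter {
        boolean accept(SimpleNode node);
    }

    /** Accepts nodes that are instances of cls or of any subclass of it. */
    static SimpleFilter nodeClass(Class<? extends SimpleNode> cls) {
        return node -> node != null && cls.isInstance(node);
    }

    /** Counts how many nodes the filter accepts. */
    static int count(List<SimpleNode> nodes, SimpleFilter filter) {
        int matches = 0;
        for (SimpleNode node : nodes) {
            if (filter.accept(node)) {
                matches++;
            }
        }
        return matches;
    }
}
```

Filtering on the base class (`SimpleTagNode.class`) picks up both link and image tags, while a concrete class (`SimpleLinkTag.class`) narrows the result to one tag type.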