Dominator

Parse, hierarchize, and analyse XHTML.

Constructors

this
this()

Instantiates an empty Dominator.

this
this(string haystack)

Instantiates a Dominator and loads the given document.

Members

Functions

getElement
string getElement(Node node)

Gets the part of the loaded document that spans from the node's beginning to its end.

getInner
string getInner(Node node)

Gets the inner HTML of the given node.

getNodes
Node[] getNodes()

Returns all Nodes that were found. Note that this includes Nodes that were found inside comments. Use isComment() to check whether a Node is inside a comment, or use libdominator.Filter.filterComments() to drop such Nodes.
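
The two ways of dealing with commented-out Nodes could be sketched as follows. This is a minimal illustration, not the library's reference usage: the `import libdominator;` line is an assumption about how the package is exposed, and the markup and variable names are made up for the example.

```d
import std.stdio : writeln;
import libdominator; // assumed package import; adjust to your build setup

void main()
{
    // Illustrative markup: the second <li> sits inside a comment.
    const string html = `<ul>
    <li>one</li>
    <!-- <li>two</li> -->
    <li>three</li>
</ul>`;

    auto dom = new Dominator(html);

    // Option 1: walk all Nodes and skip the commented-out ones by hand.
    size_t visible;
    foreach (node; dom.getNodes())
    {
        if (node.isComment())
            continue;
        ++visible;
    }
    writeln("Nodes outside comments: ", visible);

    // Option 2: let filterComments() drop them in one step.
    auto withoutComments = dom.getNodes().filterComments;
    writeln("Nodes after filterComments: ", withoutComments.length);
}
```

Both counts are printed rather than asserted, because how the parser counts the comment itself is not stated here.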

getStartElement
string getStartElement(Node node)

Gets the tag name of the node.

load
Dominator load(string haystack)

Loads a document.
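
A short sketch of creating an empty Dominator first and loading the document afterwards. The `import libdominator;` line and the sample markup are assumptions made for illustration. The return value of load() is ignored here, since this page does not say which instance it returns.

```d
import libdominator; // assumed package import; adjust to your build setup

void main()
{
    // Illustrative markup, not taken from the library's tests.
    const string html = `<ul>
    <li>one</li>
    <li>two</li>
</ul>`;

    auto dom = new Dominator(); // empty, nothing loaded yet
    dom.load(html);             // load the document afterwards

    // After loading, the usual filtering works as in the examples below.
    auto items = dom.filterDom("li");
    assert(items.length == 2);
    assert(dom.getInner(items[0]) == "one");
}
```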

stripTags
string stripTags(Node node)

Removes the tags and returns the plain inner content of the given node.

stripTags
string stripTags()

Removes the tags and returns the plain inner content.
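
To compare getElement, getInner, and stripTags, the following sketch prints their results for the same Node. It is only an illustration under assumptions: the `import libdominator;` line and the markup are hypothetical, and the exact output (for example, whitespace handling) is deliberately not asserted.

```d
import std.stdio : writeln;
import libdominator; // assumed package import; adjust to your build setup

void main()
{
    // Illustrative markup with a nested tag inside the paragraph.
    const string html = `<div>
    <p>Hello <b>world</b></p>
</div>`;

    auto dom = new Dominator(html);
    auto p = dom.filterDom("p")[0]; // the only <p> in the document

    writeln(dom.getElement(p)); // the node's markup from its beginning to its end
    writeln(dom.getInner(p));   // the inner HTML of the <p> node
    writeln(dom.stripTags(p));  // the plain inner content of the <p> node
    writeln(dom.stripTags());   // the overload without a Node argument
}
```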

Examples

Get the descendants of a specific Node and apply further filtering to the result.

const string content = `<div data-function="<some weird> stuff">
    <span>
        <span>
            <span>bäm!</span>
        </span>
        <span>boing!</span>
    </span>
    <ol id="ol-1">
      <li id="li-1-ol-1">li-1-ol-1 Inner</li>
      <li id="li-2-ol-1">li-2-ol-1 Inner</li>
      <li id="li-3-ol-1">li-3-ol-1 Inner</li>
    </ol>
  </div>`;

Dominator dom = new Dominator(content);
// take the first matching <div> and collect all of its descendants
Node[] descendants = (*dom.filterDom("div").ptr).getDescendants();
assert(descendants.filterDom("span").length == 4);
assert(descendants.filterDom("li").length == 3);
assert(descendants.filterDom("ol").length == 1);

Basic example

const string html =
`<div>
    <p>Here comes a list!</p>
    <ul>
        <li class="wanted">one</li>
        <!-- <li>two</li> -->
        <li class="wanted hard">three</li>
        <li id="item-4">four</li>
        <li checked>five</li>
        <li id="item-6">six</li>
    </ul>
    <p>another list</p>
    <ol>
        <li>eins</li>
        <li>zwei</li>
        <li>drei</li>
    </ol>
    <p>have a nice day</p>
</div>`;
import std.conv : to; // needed for the to!(string) call in an assert message below

Dominator dom = new Dominator(html);

foreach(node ; dom.filterDom("ul.li")) {
    // do something more useful with the node than:
    assert(node.getParent.getTag() == "ul");
}

Node[] nodes = dom.filterDom("ul.li");
assert(dom.getInner( nodes[0] ) == "one" );
assert(nodes[0].getAttributes() == [ Attribute("class","wanted") ] , to!(string)(nodes[0].getAttributes()) );
assert(Attribute("class","wanted").matches(nodes[0]));
assert(Attribute("class","wanted").matches(nodes[2]));
assert(Attribute("class",["wanted","hard"]).matches(nodes[2]));
assert(nodes[1].isComment());

assert(dom.filterDom("ul.li").length == 6);
assert(dom.filterDom("ul.li").filterComments.length == 5);
assert(dom.filterDom("li").length == 9);
assert(dom.filterDom("li[1]").length == 1); //the first li in the dom
assert(dom.filterDom("*.li[1]").length == 2); //the first li in ul and first li in ol
assert(dom.getInner( (*dom.filterDom("*{checked:}").ptr) ) == "five");

Find nodes with a special href. In HTML5, attribute values may be written without quotation marks.

import std.file : readText; // readText loads the test document from disk

Dominator dom = new Dominator(readText("tests/dummy.html"));
foreach(node ; dom.filterDom("scpdurl"))
{
    assert( dom.getInner(node) == "/timeSCPD.xml" );
}

Meta