<p>Htmlparser is a listed <a href="http://luarocks.org/repositories/rocks/">LuaRock</a>. Install using <a href="http://www.luarocks.org/">LuaRocks</a>: <code>luarocks install htmlparser</code></p>
<p>Htmlparser depends on <a href="http://www.lua.org/download.html">Lua 5.2</a> and on the <a href="http://wscherphof.github.com/lua-set/">"set"</a> LuaRock, which is installed automatically alongside it. So that the tests can be run, <a href="https://github.com/dcurrie/lunit">lunitx</a> is also installed as a LuaRock.</p>
<p>The root element is a container for the top-level elements in the parsed text; e.g. the <code>&lt;html&gt;</code> element in a parsed HTML document would be a child of the returned root element.</p>
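<p>As a minimal sketch of that structure, the snippet below lists the tag names of the root's direct children. It assumes the rock is installed and relies only on <code>htmlparser.parse</code>, <code>.nodes</code> and <code>.name</code> as they appear elsewhere in this document. The helper <code>top_level_names</code> is hypothetical, and the <code>pcall</code> guard is only there so the snippet still runs where the rock is missing.</p>

```lua
-- Hypothetical helper (not part of htmlparser): collect the tag names of
-- the direct children of a parsed root, assuming root.nodes is an array.
local function top_level_names(root)
  local names = {}
  for i, node in ipairs(root.nodes) do
    names[i] = node.name
  end
  return names
end

-- Guarded require: skip the demo when the htmlparser rock is not installed.
local ok, htmlparser = pcall(require, "htmlparser")
if ok then
  local root = htmlparser.parse("<html><body><p>hello</p></body></html>")
  -- The root itself is only a container; its children are the top-level tags.
  print(table.concat(top_level_names(root), ", "))
end
```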
<code>"[attribute|='value']"</code> prefix: attribute's value is the given value, or starts with the given value followed by a hyphen (<code>-</code>)</li>
<li>
<code>"[attribute*='value']"</code> contains: attribute's value contains given value</li>
<li>
<code>"[attribute~='value']"</code> word: attribute's value is a space-separated list of tokens, one of which is the given value</li>
<li>
<code>"[attribute^='value']"</code> starts with: attribute's value starts with given value</li>
<li>
<code>"[attribute$='value']"</code> ends with: attribute's value ends with given value</li>
<li>
<code>":not(selectorstring)"</code> elements not selected by given selector string</li>
<li>
<code>"ancestor descendant"</code> elements selected by the <code>descendant</code> selector string, that are a descendant of any element selected by the <code>ancestor</code> selector string</li>
<li>
<code>"parent > child"</code> elements selected by the <code>child</code> selector string, that are a child element of any element selected by the <code>parent</code> selector string</li>
</ul><p>Selectors can be combined; e.g. <code>".class:not([attribute]) element.class"</code></p>
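<p>To make the attribute operators listed above concrete, here is a plain-Lua sketch of what each comparison means. It is illustrative only and is not htmlparser's own implementation; the <code>matchers</code> table and the <code>starts_with</code> and <code>ends_with</code> helpers are hypothetical names introduced for this example.</p>

```lua
-- Illustrative predicates mirroring the attribute operators described above.
-- Not htmlparser's internals; all names here are hypothetical.
local function starts_with(s, prefix)
  return s:sub(1, #prefix) == prefix
end

local function ends_with(s, suffix)
  -- s:sub(-0) would return the whole string, so handle "" explicitly.
  return suffix == "" or s:sub(-#suffix) == suffix
end

local matchers = {
  -- prefix: equal to the value, or the value followed by a hyphen
  ["|="] = function(actual, value)
    return actual == value or starts_with(actual, value .. "-")
  end,
  -- contains: plain substring search (no Lua pattern interpretation)
  ["*="] = function(actual, value)
    return actual:find(value, 1, true) ~= nil
  end,
  -- word: one of the whitespace-separated tokens equals the value
  ["~="] = function(actual, value)
    for token in actual:gmatch("%S+") do
      if token == value then return true end
    end
    return false
  end,
  ["^="] = starts_with, -- starts with
  ["$="] = ends_with,   -- ends with
}

print(matchers["|="]("en-US", "en"))            --> true
print(matchers["~="]("btn btn-primary", "btn")) --> true
print(matchers["$="]("thumb-large", "small"))   --> false
```

<p>Keeping the operators in a table keyed by their symbol is just one convenient layout for a sketch like this; it says nothing about how the library dispatches them.</p>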
<code>:gettext()</code> the complete element text, starting with <code>"&lt;tagname"</code> and ending with <code>"/&gt;"</code> or <code>"&lt;/tagname&gt;"</code>
<code>.deepernodes</code> a <a href="http://wscherphof.github.com/lua-set/">Set</a> containing all elements in the tree beneath this element, including this element's <code>.nodes</code>; <code>{}</code> if none</li>
<code>.deeperelements</code> a table with a key for each distinct tagname in <code>.deepernodes</code>, containing a <a href="http://wscherphof.github.com/lua-set/">Set</a> of all deeper element nodes with that name; <code>{}</code> if none</li>
<li>Attribute values in selector strings cannot contain any spaces, nor any of <code>#</code>, <code>.</code>, <code>[</code>, <code>]</code>, <code>:</code>, <code>(</code>, or <code>)</code>
</li>
<li>The spaces before and after the <code>></code> in a <code>parent > child</code> relation are mandatory </li>
<li>
<code>&lt;!</code> elements (including doctype, comments, and CDATA) are not parsed; markup within CDATA is <em>not</em> escaped</li>
<li>Text nodes are not separate tree elements; in <code>local root = htmlparser.parse("&lt;p&gt;line1&lt;br /&gt;line2&lt;/p&gt;")</code>, <code>root.nodes[1]:getcontent()</code> is <code>"line1&lt;br /&gt;line2"</code>, while <code>root.nodes[1].nodes[1].name</code> is <code>"br"</code></li>
<li>No start or end tags are implied when <a href="http://www.w3.org/TR/html5/syntax.html#optional-tags">omitted</a>. Only the <a href="http://www.w3.org/TR/html5/syntax.html#void-elements">void elements</a> should not have an end tag</li>
<li>No validation is done for tag or attribute names or nesting of element types. The list of void elements is in fact the only part specific to HTML</li>