Mirror of https://github.com/msva/lua-htmlparser.git, synced 2024-11-27 12:44:22 +00:00
Create gh-pages branch via GitHub
parent 4478397df9
commit 82a478cfb1
61
index.html
@ -21,53 +21,54 @@
<h1 id="project_title">LuaRock "htmlparser"</h1>
<h2 id="project_tagline">Parse HTML text into a tree of elements with selectors</h2>
<!--
<section id="downloads">
<a class="zip_download_link" href="https://github.com/wscherphof/lua-htmlparser/zipball/master">Download this project as a .zip file</a>
<a class="tar_download_link" href="https://github.com/wscherphof/lua-htmlparser/tarball/master">Download this project as a tar.gz file</a>
</section>
-->
</header>
</div>
<!-- MAIN CONTENT -->
<div id="main_content_wrap" class="outer">
<section id="main_content" class="inner">
<h2>Install</h2>
<h2>
<a name="install" class="anchor" href="#install"><span class="octicon octicon-link"></span></a>Install</h2>
<p>Htmlparser is a listed <a href="http://luarocks.org/repositories/rocks/">LuaRock</a>. Install using <a href="http://www.luarocks.org/">LuaRocks</a>: <code>luarocks install htmlparser</code></p>
<h3>Dependencies</h3>
<h3>
<a name="dependencies" class="anchor" href="#dependencies"><span class="octicon octicon-link"></span></a>Dependencies</h3>
<p>Htmlparser depends on <a href="http://www.lua.org/download.html">Lua 5.2</a>, and on the <a href="http://wscherphof.github.com/lua-set/">"set"</a> LuaRock, which is installed along automatically. To be able to run the tests, <a href="https://github.com/dcurrie/lunit">lunitx</a> also comes along as a LuaRock</p>
<p>Htmlparser depends on <a href="http://www.lua.org/download.html">Lua 5.2</a>, and on the ["set"][1] LuaRock, which is installed along automatically. To be able to run the tests, <a href="https://github.com/dcurrie/lunit">lunitx</a> also comes along as a LuaRock</p>
<h2>Usage</h2>
<h2>
<a name="usage" class="anchor" href="#usage"><span class="octicon octicon-link"></span></a>Usage</h2>
<p>Start off with</p>
<div class="highlight"><pre><span class="nb">require</span><span class="p">(</span><span class="s2">"</span><span class="s">luarocks.loader"</span><span class="p">)</span>
<div class="highlight highlight-lua"><pre><span class="nb">require</span><span class="p">(</span><span class="s2">"</span><span class="s">luarocks.loader"</span><span class="p">)</span>
<span class="kd">local</span> <span class="n">htmlparser</span> <span class="o">=</span> <span class="nb">require</span><span class="p">(</span><span class="s2">"</span><span class="s">htmlparser"</span><span class="p">)</span>
</pre></div>
<p>Then, parse some html:</p>
<div class="highlight"><pre><span class="kd">local</span> <span class="n">root</span> <span class="o">=</span> <span class="n">htmlparser</span><span class="p">.</span><span class="n">parse</span><span class="p">(</span><span class="n">htmlstring</span><span class="p">)</span>
<div class="highlight highlight-lua"><pre><span class="kd">local</span> <span class="n">root</span> <span class="o">=</span> <span class="n">htmlparser</span><span class="p">.</span><span class="n">parse</span><span class="p">(</span><span class="n">htmlstring</span><span class="p">)</span>
</pre></div>
<p>The input to parse may be the contents of a complete html document, or any valid html snippet, as long as all tags are correctly opened and closed.
Now, find sepcific contained elements by selecting:</p>
Now, find specific contained elements by selecting:</p>
<div class="highlight"><pre><span class="kd">local</span> <span class="n">elements</span> <span class="o">=</span> <span class="n">root</span><span class="p">:</span><span class="nb">select</span><span class="p">(</span><span class="n">selectorstring</span><span class="p">)</span>
<div class="highlight highlight-lua"><pre><span class="kd">local</span> <span class="n">elements</span> <span class="o">=</span> <span class="n">root</span><span class="p">:</span><span class="nb">select</span><span class="p">(</span><span class="n">selectorstring</span><span class="p">)</span>
</pre></div>
<p>Or in shorthand:</p>
<div class="highlight"><pre><span class="kd">local</span> <span class="n">elements</span> <span class="o">=</span> <span class="n">root</span><span class="p">(</span><span class="n">selectorstring</span><span class="p">)</span>
<div class="highlight highlight-lua"><pre><span class="kd">local</span> <span class="n">elements</span> <span class="o">=</span> <span class="n">root</span><span class="p">(</span><span class="n">selectorstring</span><span class="p">)</span>
</pre></div>
<p>This wil return a <a href="http://wscherphof.github.com/lua-set/">Set</a> of elements, all of which are of the same type as the root element, and thus support selecting as well, if ever needed:</p>
<p>This wil return a [Set][1] of elements, all of which are of the same type as the root element, and thus support selecting as well, if ever needed:</p>
<div class="highlight"><pre><span class="k">for</span> <span class="n">e</span> <span class="k">in</span> <span class="nb">pairs</span><span class="p">(</span><span class="n">elements</span><span class="p">)</span> <span class="k">do</span>
<div class="highlight highlight-lua"><pre><span class="k">for</span> <span class="n">e</span> <span class="k">in</span> <span class="nb">pairs</span><span class="p">(</span><span class="n">elements</span><span class="p">)</span> <span class="k">do</span>
<span class="nb">print</span><span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>
<span class="kd">local</span> <span class="n">subs</span> <span class="o">=</span> <span class="n">e</span><span class="p">(</span><span class="n">subselectorstring</span><span class="p">)</span>
<span class="k">for</span> <span class="n">sub</span> <span class="k">in</span> <span class="nb">pairs</span><span class="p">(</span><span class="n">subs</span><span class="p">)</span> <span class="k">do</span>
@ -78,9 +79,10 @@ Now, find sepcific contained elements by selecting:</p>
<p>The root element is a container for the top level elements in the parsed text, i.e. the <code>&lt;html&gt;</code> element in a parsed html document would be a child of the returned root element.</p>
<h2>Selectors</h2>
<h2>
<a name="selectors" class="anchor" href="#selectors"><span class="octicon octicon-link"></span></a>Selectors</h2>
<p>Supported selectors are a subset of <a href="http://api.jquery.com/category/selectors/">jQuery's selectors</a>:</p>
<p>Supported selectors are a subset of [jQuery's selectors][2]:</p>
<ul>
<li>
@ -115,11 +117,13 @@ Now, find sepcific contained elements by selecting:</p>
<code>"parent > child"</code> elements selected by the <code>child</code> selector string, that are a child element of any element selected by the <code>parent</code> selector string</li>
</ul><p>Selectors can be combined; e.g. <code>".class:not([attribute]) element.class"</code></p>
<h2>Element type</h2>
<h2>
<a name="element-type" class="anchor" href="#element-type"><span class="octicon octicon-link"></span></a>Element type</h2>
<p>All tree elements provide, apart from <code>:select</code> and <code>()</code>, the following accessors:</p>
<h3>Basic</h3>
<h3>
<a name="basic" class="anchor" href="#basic"><span class="octicon octicon-link"></span></a>Basic</h3>
<ul>
<li>
@ -137,7 +141,8 @@ Now, find sepcific contained elements by selecting:</p>
<li>
<code>.parent</code> the elements that contains this element; <code>root.parent</code> is <code>nil</code>
</li>
</ul><h3>Other</h3>
</ul><h3>
<a name="other" class="anchor" href="#other"><span class="octicon octicon-link"></span></a>Other</h3>
<ul>
<li>
@ -150,16 +155,17 @@ Now, find sepcific contained elements by selecting:</p>
<code>.root</code> the root element of the tree; <code>root.root</code> is <code>root</code>
</li>
<li>
<code>.deepernodes</code> a <a href="http://wscherphof.github.com/lua-set/">Set</a> containing all elements in the tree beneath this element, including this element's <code>.nodes</code>; <code>{}</code> if none</li>
<code>.deepernodes</code> a [Set][1] containing all elements in the tree beneath this element, including this element's <code>.nodes</code>; <code>{}</code> if none</li>
<li>
<code>.deeperelements</code> a table with a key for each distinct tagname in <code>.deepernodes</code>, containing a <a href="http://wscherphof.github.com/lua-set/">Set</a> of all deeper element nodes with that name; <code>{}</code> in none</li>
<code>.deeperelements</code> a table with a key for each distinct tagname in <code>.deepernodes</code>, containing a [Set][1] of all deeper element nodes with that name; <code>{}</code> in none</li>
<li>
<code>.deeperattributes</code> as <code>.deeperelements</code>, but keyed on attribute name</li>
<li>
<code>.deeperids</code> as <code>.deeperelements</code>, but keyed on id value</li>
<li>
<code>.deeperclasses</code> as <code>.deeperelements</code>, but keyed on class name</li>
</ul><h2>Limitations</h2>
</ul><h2>
<a name="limitations" class="anchor" href="#limitations"><span class="octicon octicon-link"></span></a>Limitations</h2>
<ul>
<li>Attribute values in selector strings cannot contain any spaces, nor any of <code>#</code>, <code>.</code>, <code>[</code>, <code>]</code>, <code>:</code>, <code>(</code>, or <code>)</code>
@ -167,21 +173,24 @@ Now, find sepcific contained elements by selecting:</p>
<li>The spaces before and after the <code>></code> in a <code>parent > child</code> relation are mandatory </li>
<li>
<code>&lt;!</code> elements (including doctype, comments, and CDATA) are not parsed; markup within CDATA is <em>not</em> escaped</li>
<li>Textnodes are no seperate tree elements; in <code>local root = htmlparser.parse("<p>line1<br />line2</p>")</code>, <code>root.nodes[1]:getcontent()</code> is <code>"line1<br />line2"</code>, while <code>root.nodes[1].nodes[1].name</code> is <code>"br"</code>
<li>Textnodes are no separate tree elements; in <code>local root = htmlparser.parse("<p>line1<br />line2</p>")</code>, <code>root.nodes[1]:getcontent()</code> is <code>"line1<br />line2"</code>, while <code>root.nodes[1].nodes[1].name</code> is <code>"br"</code>
</li>
<li>No start or end tags are implied when <a href="http://www.w3.org/TR/html5/syntax.html#optional-tags">omitted</a>. Only the <a href="http://www.w3.org/TR/html5/syntax.html#void-elements">void elements</a> should not have an end tag</li>
<li>No validation is done for tag or attribute names or nesting of element types. The list of void elements is in fact the only part specific to HTML</li>
</ul><h2>Examples</h2>
</ul><h2>
<a name="examples" class="anchor" href="#examples"><span class="octicon octicon-link"></span></a>Examples</h2>
<p>See <code>./doc/sample.lua</code></p>
<h2>Tests</h2>
<h2>
<a name="tests" class="anchor" href="#tests"><span class="octicon octicon-link"></span></a>Tests</h2>
<p>See <code>./tst/init.lua</code></p>
<h2>License</h2>
<h2>
<a name="license" class="anchor" href="#license"><span class="octicon octicon-link"></span></a>License</h2>
<p>MIT; see <code>./doc/LICENSE</code></p>
<p>LGPL+; see <code>./doc/LICENSE</code></p>
</section>
</div>
File diff suppressed because one or more lines are too long
@ -39,18 +39,11 @@ ol, ul {
list-style: none;
}
blockquote, q {
}
table {
border-collapse: collapse;
border-spacing: 0;
}
a:focus {
outline: none;
}
/*******************************************************************************
Theme Styles
*******************************************************************************/
@ -125,14 +118,11 @@ a {
-ms-transition: text-shadow 0.5s ease;
}
#main_content a:hover {
color: #0069ba;
text-shadow: #0090ff 0px 0px 2px;
}
a:hover, a:focus {text-decoration: underline;}
footer a:hover {
color: #43adff;
text-shadow: #0090ff 0px 0px 2px;
footer a {
color: #F2F2F2;
text-decoration: underline;
}
em {
@ -158,6 +148,15 @@ img {
-ms-box-shadow: 0 0 5px #ebebeb;
}
p img {
display: inline;
margin: 0;
padding: 0;
vertical-align: middle;
text-align: center;
border: none;
}
pre, code {
width: 100%;
color: #222;
@ -169,9 +168,6 @@ pre, code {
border-radius: 2px;
-moz-border-radius: 2px;
-webkit-border-radius: 2px;
}
pre {
@ -199,16 +195,17 @@ blockquote {
border-left: 3px solid #bbb;
}
ul, ol, dl {
margin-bottom: 15px
}
ul li {
ul {
list-style: inside;
padding-left: 20px;
}
ol li {
ol {
list-style: decimal inside;
padding-left: 20px;
}
@ -257,11 +254,6 @@ form {
padding: 20px;
}
img {
width: 100%;
max-width: 100%;
}
/*******************************************************************************
Full-Width Styles
*******************************************************************************/
@ -272,7 +264,7 @@ Full-Width Styles
.inner {
position: relative;
max-width: 800px;
max-width: 640px;
padding: 20px 10px;
margin: 0 auto;
}