From abbab47865e7b9f9ed78a46028e651e7fc58bb9c Mon Sep 17 00:00:00 2001
From: Wouter Scherphof
Date: Mon, 8 Apr 2013 05:57:33 -0700
Subject: [PATCH] Create gh-pages branch via GitHub
---
index.html | 57 +++++++++++++++++++++++++----------------------------
params.json | 2 +-
2 files changed, 28 insertions(+), 31 deletions(-)
diff --git a/index.html b/index.html
index defbbcc..f711728 100644
--- a/index.html
+++ b/index.html
@@ -31,11 +31,7 @@
- License
-
-MIT; see ./doc/LICENSE
-
-Install
+ Install
Htmlparser is a listed LuaRock. Install using LuaRocks: luarocks install htmlparser
@@ -96,9 +92,9 @@ Now, find sepcific contained elements by selecting:
"[attribute]"
elements with an attribute of the given name
-"[attribute='value']"
equals: elements with the given value for the attribute with the given name
+"[attribute='value']"
equals: elements with the given value for the given attribute
-"[attribute!='value']"
not equals: elements without an attribute of the given name, or with that attribute, but with a value that is different from the given value
+"[attribute!='value']"
not equals: elements without the given attribute, or having the attribute, but with a different value
"[attribute|='value']"
prefix: attribute's value is given value, or starts with given value, followed by a hyphen (-)
@@ -117,27 +113,6 @@ Now, find sepcific contained elements by selecting:
"parent > child"
elements selected by the child selector string, that are a child element of any element selected by the parent selector string
Selectors can be combined; e.g. ".class:not([attribute]) element.class"
-Limitations
-
-
-- Attribute values in selectors currently cannot contain any spaces, since space is interpreted as a delimiter between the ancestor and descendant, parent and >, or > and child parts of the selector
-- Consequently, for the parent > child relation, the spaces before and after the > are mandatory
-- Attribute values in selectors currently also cannot contain any of #, ., [, ], :, (, or )
-
--
-<! elements are not parsed, including doctype, comments, and CDATA
-- Textnodes are not seperate entries in the tree, so the content of <p>line1<br />line2</p> is plainly "line1<br />line2"
-
-- All start and end tags should be explicitly specified in the text to be parsed; omitted tags (as permitted by the the HTML spec) are NOT implied. Only the void elements naturally don't need (and mustn't have) an end tag
-- The HTML text is not validated in any way; tag and attribute names and the nesting of different tags is completely arbitrary. The only HTML-specific part of the parser is that it knows which tags are void elements
-
Examples
-
-See ./doc/sample.lua
-
-Tests
-
-See ./tst/init.lua
-
Element type
All tree elements provide, apart from :select and (), the following accessors:
@@ -164,7 +139,7 @@ Now, find sepcific contained elements by selecting:
-
-:gettext() the raw text of the complete element, starting with "<tagname" and ending with "/>"
+:gettext() the complete element text, starting with "<tagname" and ending with "/>" or "</tagname>"
-
.level how deep the element is in the tree; root level is 0
@@ -182,7 +157,29 @@ Now, find sepcific contained elements by selecting:
.deeperids as .deeperelements, but keyed on id value
-
.deeperclasses as .deeperelements, but keyed on class name
-
+Limitations
+
+
+- Attribute values in selector strings cannot contain any spaces, nor any of #, ., [, ], :, (, or )
+
+- The spaces before and after the > in a parent > child relation are mandatory
+-
+<! elements (including doctype, comments, and CDATA) are not parsed; markup within CDATA is not escaped
+- Textnodes are no seperate tree elements; in local root = htmlparser.parse("<p>line1<br />line2</p>"), root.nodes[1]:getcontent() is "line1<br />line2", while root.nodes[1].nodes[1].name is "br"
+
+- No start or end tags are implied when omitted. Only the void elements should not have an end tag
+- No validation is done for tag or attribute names or nesting of element types. The list of void elements is in fact the only part specific to HTML
+
Examples
+
+See ./doc/sample.lua
+
+Tests
+
+See ./tst/init.lua
+
+License
+
+MIT; see ./doc/LICENSE
diff --git a/params.json b/params.json
index cc0a10d..498a1f9 100644
--- a/params.json
+++ b/params.json
@@ -1 +1 @@
-{"name":"LuaRock \"htmlparser\"","tagline":"Parse HTML text into a tree of elements with selectors","body":"[1]: http://wscherphof.github.com/lua-set/\r\n[2]: http://api.jquery.com/category/selectors/\r\n\r\n##License\r\nMIT; see `./doc/LICENSE`\r\n\r\n##Install\r\nHtmlparser is a listed [LuaRock](http://luarocks.org/repositories/rocks/). Install using [LuaRocks](http://www.luarocks.org/): `luarocks install htmlparser`\r\n\r\n###Dependencies\r\nHtmlparser depends on [Lua 5.2](http://www.lua.org/download.html), and on the [\"set\"][1] LuaRock, which is installed along automatically. To be able to run the tests, [lunitx](https://github.com/dcurrie/lunit) also comes along as a LuaRock\r\n\r\n##Usage\r\nStart off with\r\n```lua\r\nrequire(\"luarocks.loader\")\r\nlocal htmlparser = require(\"htmlparser\")\r\n```\r\nThen, parse some html:\r\n```lua\r\nlocal root = htmlparser.parse(htmlstring)\r\n```\r\nThe input to parse may be the contents of a complete html document, or any valid html snippet, as long as all tags are correctly opened and closed.\r\nNow, find sepcific contained elements by selecting:\r\n```lua\r\nlocal elements = root:select(selectorstring)\r\n```\r\nOr in shorthand:\r\n```lua\r\nlocal elements = root(selectorstring)\r\n```\r\nThis wil return a [Set][1] of elements, all of which are of the same type as the root element, and thus support selecting as well, if ever needed:\r\n```lua\r\nfor e in pairs(elements) do\r\n\tprint(e.name)\r\n\tlocal subs = e(subselectorstring)\r\n\tfor sub in pairs(subs) do\r\n\t\tprint(\"\", sub.name)\r\n\tend\r\nend\r\n```\r\nThe root element is a container for the top level elements in the parsed text, i.e. 
the `` element in a parsed html document would be a child of the returned root element.\r\n\r\n##Selectors\r\nSupported selectors are a subset of [jQuery's selectors][2]:\r\n\r\n- `\"*\"` all contained elements\r\n- `\"element\"` elements with the given tagname\r\n- `\"#id\"` elements with the given id attribute value\r\n- `\".class\"` elements with the given classname in the class attribute\r\n- `\"[attribute]\"` elements with an attribute of the given name\r\n- `\"[attribute='value']\"` equals: elements with the given value for the attribute with the given name\r\n- `\"[attribute!='value']\"` not equals: elements without an attribute of the given name, or with that attribute, but with a value that is different from the given value\r\n- `\"[attribute|='value']\"` prefix: attribute's value is given value, or starts with given value, followed by a hyphen (`-`)\r\n- `\"[attribute*='value']\"` contains: attribute's value contains given value\r\n- `\"[attribute~='value']\"` word: attribute's value is a space-separated token, where one of the tokens is the given value\r\n- `\"[attribute^='value']\"` starts with: attribute's value starts with given value\r\n- `\"[attribute$='value']\"` ends with: attribute's value ends with given value\r\n- `\":not(selectorstring)\"` elements not selected by given selector string\r\n- `\"ancestor descendant\"` elements selected by the `descendant` selector string, that are a descendant of any element selected by the `ancestor` selector string\r\n- `\"parent > child\"` elements selected by the `child` selector string, that are a child element of any element selected by the `parent` selector string\r\n\r\nSelectors can be combined; e.g. 
`\".class:not([attribute]) element.class\"`\r\n\r\n###Limitations\r\n- Attribute values in selectors currently cannot contain any spaces, since space is interpreted as a delimiter between the `ancestor` and `descendant`, `parent` and `>`, or `>` and `child` parts of the selector\r\n- Consequently, for the `parent > child` relation, the spaces before and after the `>` are mandatory\r\n- Attribute values in selectors currently also cannot contain any of `#`, `.`, `[`, `]`, `:`, `(`, or `)`\r\n- `<!` elements are not parsed, including doctype, comments, and CDATA\r\n- Textnodes are not seperate entries in the tree, so the content of `<p>line1<br />line2</p>` is plainly `\"line1<br />line2\"`\r\n- All start and end tags should be explicitly specified in the text to be parsed; omitted tags (as [permitted](http://www.w3.org/TR/html5/syntax.html#optional-tags) by the the HTML spec) are NOT implied. Only the [void](http://www.w3.org/TR/html5/syntax.html#void-elements) elements naturally don't need (and mustn't have) an end tag\r\n- The HTML text is not validated in any way; tag and attribute names and the nesting of different tags is completely arbitrary. The only HTML-specific part of the parser is that it knows which tags are void elements\r\n\r\n##Examples\r\nSee `./doc/sample.lua`\r\n\r\n##Tests\r\nSee `./tst/init.lua`\r\n\r\n##Element type\r\nAll tree elements provide, apart from `:select` and `()`, the following accessors:\r\n\r\n###Basic\r\n- `.name` the element's tagname\r\n- `.attributes` a table with keys and values for the element's attributes; `{}` if none\r\n- `.id` the value of the element's id attribute; `nil` if not present\r\n- `.classes` an array with the classes listed in element's class attribute; `{}` if none\r\n- `:getcontent()` the raw text between the opening and closing tags of the element; `\"\"` if none\r\n- `.nodes` an array with the element's child elements, `{}` if none\r\n- `.parent` the elements that contains this element; `root.parent` is `nil`\r\n\r\n###Other\r\n- `:gettext()` the raw text of the complete element, starting with `\"<tagname\"` and ending with `\"/>\"`\r\n- `.level` how deep the element is in the tree; root level is `0`\r\n- `.root` the root element of the tree; `root.root` is `root`\r\n- `.deepernodes` a [Set][1] containing all elements in the tree beneath this element, including this element's `.nodes`; `{}` if none\r\n- `.deeperelements` a table with a key for each distinct tagname in `.deepernodes`, containing a [Set][1] of all deeper element nodes with that name; `{}` in none\r\n- `.deeperattributes` as `.deeperelements`, but keyed on attribute name\r\n- `.deeperids` as `.deeperelements`, but keyed on id value\r\n- 
`.deeperclasses` as `.deeperelements`, but keyed on class name\r\n","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."}
\ No newline at end of file
+{"name":"LuaRock \"htmlparser\"","tagline":"Parse HTML text into a tree of elements with selectors","body":"[1]: http://wscherphof.github.com/lua-set/\r\n[2]: http://api.jquery.com/category/selectors/\r\n\r\n##Install\r\nHtmlparser is a listed [LuaRock](http://luarocks.org/repositories/rocks/). Install using [LuaRocks](http://www.luarocks.org/): `luarocks install htmlparser`\r\n\r\n###Dependencies\r\nHtmlparser depends on [Lua 5.2](http://www.lua.org/download.html), and on the [\"set\"][1] LuaRock, which is installed along automatically. To be able to run the tests, [lunitx](https://github.com/dcurrie/lunit) also comes along as a LuaRock\r\n\r\n##Usage\r\nStart off with\r\n```lua\r\nrequire(\"luarocks.loader\")\r\nlocal htmlparser = require(\"htmlparser\")\r\n```\r\nThen, parse some html:\r\n```lua\r\nlocal root = htmlparser.parse(htmlstring)\r\n```\r\nThe input to parse may be the contents of a complete html document, or any valid html snippet, as long as all tags are correctly opened and closed.\r\nNow, find sepcific contained elements by selecting:\r\n```lua\r\nlocal elements = root:select(selectorstring)\r\n```\r\nOr in shorthand:\r\n```lua\r\nlocal elements = root(selectorstring)\r\n```\r\nThis wil return a [Set][1] of elements, all of which are of the same type as the root element, and thus support selecting as well, if ever needed:\r\n```lua\r\nfor e in pairs(elements) do\r\n\tprint(e.name)\r\n\tlocal subs = e(subselectorstring)\r\n\tfor sub in pairs(subs) do\r\n\t\tprint(\"\", sub.name)\r\n\tend\r\nend\r\n```\r\nThe root element is a container for the top level elements in the parsed text, i.e. 
the `` element in a parsed html document would be a child of the returned root element.\r\n\r\n##Selectors\r\nSupported selectors are a subset of [jQuery's selectors][2]:\r\n\r\n- `\"*\"` all contained elements\r\n- `\"element\"` elements with the given tagname\r\n- `\"#id\"` elements with the given id attribute value\r\n- `\".class\"` elements with the given classname in the class attribute\r\n- `\"[attribute]\"` elements with an attribute of the given name\r\n- `\"[attribute='value']\"` equals: elements with the given value for the given attribute\r\n- `\"[attribute!='value']\"` not equals: elements without the given attribute, or having the attribute, but with a different value\r\n- `\"[attribute|='value']\"` prefix: attribute's value is given value, or starts with given value, followed by a hyphen (`-`)\r\n- `\"[attribute*='value']\"` contains: attribute's value contains given value\r\n- `\"[attribute~='value']\"` word: attribute's value is a space-separated token, where one of the tokens is the given value\r\n- `\"[attribute^='value']\"` starts with: attribute's value starts with given value\r\n- `\"[attribute$='value']\"` ends with: attribute's value ends with given value\r\n- `\":not(selectorstring)\"` elements not selected by given selector string\r\n- `\"ancestor descendant\"` elements selected by the `descendant` selector string, that are a descendant of any element selected by the `ancestor` selector string\r\n- `\"parent > child\"` elements selected by the `child` selector string, that are a child element of any element selected by the `parent` selector string\r\n\r\nSelectors can be combined; e.g. 
`\".class:not([attribute]) element.class\"`\r\n\r\n##Element type\r\nAll tree elements provide, apart from `:select` and `()`, the following accessors:\r\n\r\n###Basic\r\n- `.name` the element's tagname\r\n- `.attributes` a table with keys and values for the element's attributes; `{}` if none\r\n- `.id` the value of the element's id attribute; `nil` if not present\r\n- `.classes` an array with the classes listed in element's class attribute; `{}` if none\r\n- `:getcontent()` the raw text between the opening and closing tags of the element; `\"\"` if none\r\n- `.nodes` an array with the element's child elements, `{}` if none\r\n- `.parent` the elements that contains this element; `root.parent` is `nil`\r\n\r\n###Other\r\n- `:gettext()` the complete element text, starting with `\"<tagname\"` and ending with `\"/>\"` or `\"</tagname>\"`\r\n- `.level` how deep the element is in the tree; root level is `0`\r\n- `.root` the root element of the tree; `root.root` is `root`\r\n- `.deepernodes` a [Set][1] containing all elements in the tree beneath this element, including this element's `.nodes`; `{}` if none\r\n- `.deeperelements` a table with a key for each distinct tagname in `.deepernodes`, containing a [Set][1] of all deeper element nodes with that name; `{}` in none\r\n- `.deeperattributes` as `.deeperelements`, but keyed on attribute name\r\n- `.deeperids` as `.deeperelements`, but keyed on id value\r\n- `.deeperclasses` as `.deeperelements`, but keyed on class name\r\n\r\n##Limitations\r\n- Attribute values in selector strings cannot contain any spaces, nor any of `#`, `.`, `[`, `]`, `:`, `(`, or `)`\r\n- The spaces before and after the `>` in a `parent > child` relation are mandatory \r\n- `<!` elements (including doctype, comments, and CDATA) are not parsed; markup within CDATA is not escaped\r\n- Textnodes are no seperate tree elements; in `local root = htmlparser.parse(\"<p>line1<br />line2</p>\")`, `root.nodes[1]:getcontent()` is `\"line1<br />line2\"`, while `root.nodes[1].nodes[1].name` is `\"br\"`\r\n- No start or end tags are implied when [omitted](http://www.w3.org/TR/html5/syntax.html#optional-tags). Only the [void elements](http://www.w3.org/TR/html5/syntax.html#void-elements) should not have an end tag\r\n- No validation is done for tag or attribute names or nesting of element types. The list of void elements is in fact the only part specific to HTML\r\n\r\n##Examples\r\nSee `./doc/sample.lua`\r\n\r\n##Tests\r\nSee `./tst/init.lua`\r\n\r\n##License\r\nMIT; see `./doc/LICENSE`\r\n","google":"","note":"Don't delete this file! It's used internally to help with page regeneration."}
\ No newline at end of file