Mirror of https://github.com/msva/lua-htmlparser.git (synced 2024-11-04 23:34:20 +00:00)
added some links and descriptions
Parent: eb1c0f4df8
Commit: 8483c2ca02

README.md: 63 lines changed (36 additions, 27 deletions)

    @@ -2,6 +2,9 @@
     
     Parse HTML text into a tree of elements with selectors
     
    +[1]: http://wscherphof.github.com/lua-set/
    +[2]: http://api.jquery.com/category/selectors/
    +
     ##License
     MIT; see `./doc/LICENSE`
     

    @@ -16,7 +19,7 @@ Then, parse some html:
     local root = htmlparser.parse(htmlstring)
     ```
     The input to parse may be the contents of a complete html document, or any valid html snippet, as long as all tags are correctly opened and closed.
    -Now, find specific elements by selecting:
    +Now, find sepcific contained elements by selecting:
     ```lua
     local elements = root:select(selectorstring)
     ```

    @@ -24,7 +27,7 @@ Or in shorthand:
     ```lua
     local elements = root(selectorstring)
     ```
    -This wil return a Set of elements, all of which are of the same type as the root element, and thus support selecting as well, if ever needed:
    +This wil return a [Set][1] of elements, all of which are of the same type as the root element, and thus support selecting as well, if ever needed:
     ```lua
     for e in pairs(elements) do
     print(e.name)
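
The hunk above keeps `root(selectorstring)` as shorthand for `root:select(selectorstring)` and links "Set" to lua-set; the `for e in pairs(elements)` loop only yields elements if each element is stored as a table key. The following is a minimal, self-contained Lua sketch of both ideas. Everything in it is hypothetical: `Element`, `new_element` and the tag-name-only `select` are toy stand-ins, and providing the shorthand through a `__call` metamethod is an assumption about how such a call can work in Lua, not a description of lua-htmlparser's or lua-set's code.

```lua
-- Hypothetical toy element type; not lua-htmlparser's implementation.
local Element = {}
Element.__index = Element

-- Assumption: the root(sel) shorthand comes from a __call metamethod
-- that simply forwards to root:select(sel).
Element.__call = function(self, selector)
  return self:select(selector)
end

local function new_element(name, children)
  return setmetatable({ name = name, nodes = children or {} }, Element)
end

-- Toy select: matches on tag name only, over all descendants.
-- The result is a set: each match is a key (value true), so
-- `for e in pairs(set)` iterates over the elements themselves.
function Element:select(tagname)
  local set = {}
  local function walk(node)
    for _, child in ipairs(node.nodes) do
      if child.name == tagname then
        set[child] = true
      end
      walk(child)
    end
  end
  walk(self)
  return set
end

local root = new_element("root", {
  new_element("p", { new_element("b"), new_element("p") }),
  new_element("div"),
})

local elements = root("p")       -- same as root:select("p")
local found = 0
for e in pairs(elements) do      -- e is an element, not an index
  print(e.name)                  -- "p", printed once per match
  found = found + 1
end
```

Keying the set on the elements gives constant-time membership checks (`elements[e]`) at the cost of an unspecified iteration order; and because every result is an `Element`, each one can be selected from again, as the README notes.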

    @@ -34,19 +37,23 @@ for e in pairs(elements) do
     end
     end
     ```
    +The root element is a container for the top level elements in the parsed text, i.e. the `<html>` element in a parsed html document would be a child of the returned root element.
     
     ##Selectors
    -- `"element"`
    -- `"#id"`
    -- `".class"`
    -- `"[attribute]"`
    -- `"[attribute=value]"`
    -- `"[attribute!=value]"`
    -- `"[attribute|=value]"`
    -- `"[attribute*=value]"`
    -- `"[attribute~=value]"`
    -- `"[attribute^=value]"`
    -- `"[attribute$=value]"`
    +Supported selectors are a subset of [jQuery's selectors][2]:
    +
    +- `"*"` all contained elements
    +- `"element"` elements with the given tagname
    +- `"#id"` elements with the given id attribute value
    +- `".class"` elements with the given classname in the class attribute
    +- `"[attribute]"` elements with an attribute of the given name
    +- `"[attribute='value']"` equals: elements with the given value for the attribute with the given name
    +- `"[attribute!='value']"` not equals: elements without an attribute of the given name, or with that attribute, but with a value that is different from the given value
    +- `"[attribute|='value']"` prefix: attribute's value is given value, or starts with given value, followed by a hyphen (`-`)
    +- `"[attribute*='value']"` contains: attribute's value contains given value
    +- `"[attribute~='value']"` word: attribute's value is a space-separated token, where one of the tokens is the given value
    +- `"[attribute^='value']"` starts with: attribute's value starts with given value
    +- `"[attribute$='value']"` ends with: attribute's value ends with given value
     - `":not(selector)"`
     - `"ancestor descendant"`
     - `"parent > child"`
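
The new selector list spells out what each attribute operator means. As a reading aid, the sketch below encodes those one-line descriptions as a standalone Lua predicate. It is not lua-htmlparser's matcher: `attr_matches` is a hypothetical name, the operator strings are the ones from the list, and `attrs` is assumed to be a plain table mapping attribute names to string values, like the `.attributes` accessor.

```lua
-- Hypothetical predicate mirroring the README's attribute-selector descriptions;
-- not lua-htmlparser's matching code.
-- attrs: attribute name -> string value; op: "", "=", "!=", "|=", "*=", "~=", "^=", "$="
local function attr_matches(attrs, name, op, value)
  local actual = attrs[name]
  if op == "" then                  -- [attribute]: the attribute is present
    return actual ~= nil
  elseif op == "!=" then            -- absent, or present with a different value
    return actual == nil or actual ~= value
  end
  if actual == nil then             -- all remaining operators need the attribute
    return false
  end
  if op == "=" then
    return actual == value
  elseif op == "|=" then            -- exactly value, or value followed by "-"
    return actual == value or actual:sub(1, #value + 1) == value .. "-"
  elseif op == "*=" then            -- plain substring search, no Lua patterns
    return actual:find(value, 1, true) ~= nil
  elseif op == "~=" then            -- some whitespace-separated token equals value
    for token in actual:gmatch("%S+") do
      if token == value then
        return true
      end
    end
    return false
  elseif op == "^=" then
    return actual:sub(1, #value) == value
  elseif op == "$=" then            -- sub(-0) would return the whole string, so guard ""
    return value == "" or actual:sub(-#value) == value
  end
  error("unsupported operator: " .. tostring(op))
end

local a = { lang = "en-US", class = "note warning", href = "https://example.com/x.pdf" }
print(attr_matches(a, "lang", "|=", "en"))     -- true
print(attr_matches(a, "class", "~=", "warn"))  -- false: "warn" is not a whole token
print(attr_matches(a, "class", "*=", "warn"))  -- true
print(attr_matches(a, "title", "!=", "x"))     -- true: attribute is absent
```

`*=` uses a plain `find` so that characters such as `.` or `-` in the value are not treated as Lua pattern syntax, and `~=` splits on any whitespace, a slight generalisation of the README's "space-separated".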

    @@ -56,6 +63,8 @@ Selectors can be combined; e.g. `".class:not([attribute]) element.class"`
     ###Limitations
     - Attribute values in selectors currently cannot contain any spaces, since space is interpreted as a delimiter between the `ancestor` and `descendant`, `parent` and `>`, or `>` and `child` parts of the selector
     - Likewise, for the `parent > child` relation, the spaces before and after the `>` are mandatory
    +- `<!` elements are not parsed, including doctype and comments
    +- Textnodes are not seperate entries in the tree, so the content of `<p>line1<br />line2</p>` is plainly `"line1<br />line2"`
     
     ##Examples
     See `./doc/samples.lua`
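
Both selector limitations in the context lines above follow from the selector string being split on spaces. The sketch below assumes exactly that, a plain whitespace split (`split_selector` is a hypothetical helper and a simplification of whatever parsing the library really does), to show why a quoted value containing a space falls apart and why `ul>li` without spaces is not seen as a `parent > child` relation.

```lua
-- Hypothetical tokenizer, assuming the selector is split on whitespace
-- (a simplification, not the library's actual parser).
local function split_selector(selector)
  local parts = {}
  for part in selector:gmatch("%S+") do
    parts[#parts + 1] = part
  end
  return parts
end

-- A space inside a quoted value becomes a part boundary:
-- broken[1] is "p[title='two" and broken[2] is "words']".
local broken = split_selector("p[title='two words']")

-- Without spaces, ">" never becomes its own part, so no parent > child step is seen.
local glued = split_selector("ul>li")     -- { "ul>li" }
local spaced = split_selector("ul > li")  -- { "ul", ">", "li" }

print(#broken, #glued, #spaced)           -- prints 2, 1 and 3, tab-separated
```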

    @@ -64,20 +73,20 @@ See `./doc/samples.lua`
     All tree elements provide, apart from `:select` and `()`, the following accessors:
     
     ###Basic
    -- `.name` = the element's tagname
    -- `.attributes` = a table with keys and values for the element's attributes; `{}` if none
    -- `.id` = the value of the element's id attribute; `nil` if not present
    -- `.classes` = an array with the classes listed in element's class attribute; `{}` if none
    -- `:getcontent()` = the raw text between the opening and closing tags of the element; `""` if none
    -- `.nodes` = an array with the element's child elements, `{}` if none
    -- `.parent` = the elements that contains this element; `root.parent` is `nil`
    +- `.name` the element's tagname
    +- `.attributes` a table with keys and values for the element's attributes; `{}` if none
    +- `.id` the value of the element's id attribute; `nil` if not present
    +- `.classes` an array with the classes listed in element's class attribute; `{}` if none
    +- `:getcontent()` the raw text between the opening and closing tags of the element; `""` if none
    +- `.nodes` an array with the element's child elements, `{}` if none
    +- `.parent` the elements that contains this element; `root.parent` is `nil`
     
     ###Other
    -- `:gettext()` = the raw text of the complete element, starting with `"<tagname"` and ending with `"/>"`
    -- `.level` = how deep the element is in the tree; root level is `0`
    +- `:gettext()` the raw text of the complete element, starting with `"<tagname"` and ending with `"/>"`
    +- `.level` how deep the element is in the tree; root level is `0`
     - `.root` the root element of the tree; `root.root` is `root`
    -- `.deepernodes` = a Set containing all elements in the tree beneath this element, including this element's `.nodes`; `{}` if none
    -- `.deeperelements` = a table with a key for each distinct tagname in `.deepernodes`, containing a Set of all deeper element nodes with that name; `{}` in none
    -- `.deeperattributes` = as `.deeperelements`, but keyed on attribute name
    -- `.deeperids` = as `.deeperelements`, but keyed on id value
    -- `.deeperclasses` = as `.deeperelements`, but keyed on class name
    +- `.deepernodes` a [Set][1] containing all elements in the tree beneath this element, including this element's `.nodes`; `{}` if none
    +- `.deeperelements` a table with a key for each distinct tagname in `.deepernodes`, containing a [Set][1] of all deeper element nodes with that name; `{}` in none
    +- `.deeperattributes` as `.deeperelements`, but keyed on attribute name
    +- `.deeperids` as `.deeperelements`, but keyed on id value
    +- `.deeperclasses` as `.deeperelements`, but keyed on class name
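
The accessor list above describes `.parent`, `.level`, `.deepernodes`, and the `.deeper*` indexes. To make their shape concrete, this sketch computes `.parent`, `.level`, `.deepernodes`, `.deeperelements`, and `.deeperattributes` for a small hand-built tree. All other names (`node`, `link`, `add`, `build_indexes`, `count`) are hypothetical, and the sets are assumed to be tables keyed by element, as the `for e in pairs(elements)` loop implies; `.deeperids` and `.deeperclasses` would follow the same pattern, keyed on id value and class name.

```lua
-- Hypothetical toy tree node; not lua-htmlparser's element type.
local function node(name, attributes, children)
  return { name = name, attributes = attributes or {}, nodes = children or {} }
end

-- Set .parent and .level top-down; the root is at level 0 with no parent.
local function link(n, parent)
  n.parent = parent
  n.level = parent and parent.level + 1 or 0
  for _, child in ipairs(n.nodes) do
    link(child, n)
  end
end

-- Add element to the set stored under index[key], creating the set if needed.
local function add(index, key, element)
  index[key] = index[key] or {}
  index[key][element] = true
end

-- Index every descendant of n: a set of all of them (.deepernodes), plus sets
-- grouped by tag name (.deeperelements) and by attribute name (.deeperattributes).
local function build_indexes(n)
  local deepernodes, deeperelements, deeperattributes = {}, {}, {}
  local function walk(current)
    for _, child in ipairs(current.nodes) do
      deepernodes[child] = true
      add(deeperelements, child.name, child)
      for attr in pairs(child.attributes) do
        add(deeperattributes, attr, child)
      end
      walk(child)
    end
  end
  walk(n)
  n.deepernodes = deepernodes
  n.deeperelements = deeperelements
  n.deeperattributes = deeperattributes
end

-- Count the members of a set.
local function count(set)
  local k = 0
  for _ in pairs(set) do
    k = k + 1
  end
  return k
end

local p1 = node("p", { class = "intro" })
local p2 = node("p")
local body = node("body", {}, { p1, p2 })
local root = node("root", {}, { node("html", { lang = "en" }, { body }) })

link(root, nil)
build_indexes(root)  -- only root here; run it per node to index every element

print(count(root.deepernodes))       -- 4: html, body and both p elements
print(count(root.deeperelements.p))  -- 2
print(p1.level, p1.parent == body)   -- level 3, and its parent is body
```

Grouping into sets at build time means a lookup such as "all `p` elements below here" is a single table access instead of a tree walk, which is the trade-off these precomputed indexes make against memory.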