diff --git a/README.md b/README.md
index 682cc3f..52ceca6 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,9 @@
 
 Parse HTML text into a tree of elements with selectors
 
+[1]: http://wscherphof.github.com/lua-set/
+[2]: http://api.jquery.com/category/selectors/
+
 ##License
 MIT; see `./doc/LICENSE`
 
@@ -16,7 +19,7 @@ Then, parse some html:
 local root = htmlparser.parse(htmlstring)
 ```
 The input to parse may be the contents of a complete html document, or any valid html snippet, as long as all tags are correctly opened and closed.
-Now, find specific elements by selecting:
+Now, find sepcific contained elements by selecting:
 ```lua
 local elements = root:select(selectorstring)
 ```
@@ -24,7 +27,7 @@ Or in shorthand:
 ```lua
 local elements = root(selectorstring)
 ```
-This wil return a Set of elements, all of which are of the same type as the root element, and thus support selecting as well, if ever needed:
+This wil return a [Set][1] of elements, all of which are of the same type as the root element, and thus support selecting as well, if ever needed:
 ```lua
 for e in pairs(elements) do
   print(e.name)
@@ -34,19 +37,23 @@ for e in pairs(elements) do
   end
 end
 ```
+The root element is a container for the top level elements in the parsed text, i.e. the `` element in a parsed html document would be a child of the returned root element.
 
 ##Selectors
-- `"element"`
-- `"#id"`
-- `".class"`
-- `"[attribute]"`
-- `"[attribute=value]"`
-- `"[attribute!=value]"`
-- `"[attribute|=value]"`
-- `"[attribute*=value]"`
-- `"[attribute~=value]"`
-- `"[attribute^=value]"`
-- `"[attribute$=value]"`
+Supported selectors are a subset of [jQuery's selectors][2]:
+
+- `"*"` all contained elements
+- `"element"` elements with the given tagname
+- `"#id"` elements with the given id attribute value
+- `".class"` elements with the given classname in the class attribute
+- `"[attribute]"` elements with an attribute of the given name
+- `"[attribute='value']"` equals: elements with the given value for the attribute with the given name
+- `"[attribute!='value']"` not equals: elements without an attribute of the given name, or with that attribute, but with a value that is different from the given value
+- `"[attribute|='value']"` prefix: attribute's value is given value, or starts with given value, followed by a hyphen (`-`)
+- `"[attribute*='value']"` contains: attribute's value contains given value
+- `"[attribute~='value']"` word: attribute's value is a space-separated token, where one of the tokens is the given value
+- `"[attribute^='value']"` starts with: attribute's value starts with given value
+- `"[attribute$='value']"` ends with: attribute's value ends with given value
 - `":not(selector)"`
 - `"ancestor descendant"`
 - `"parent > child"`
@@ -56,6 +63,8 @@ Selectors can be combined; e.g. `".class:not([attribute]) element.class"`
 ###Limitations
 - Attribute values in selectors currently cannot contain any spaces, since space is interpreted as a delimiter between the `ancestor` and `descendant`, `parent` and `>`, or `>` and `child` parts of the selector
 - Likewise, for the `parent > child` relation, the spaces before and after the `>` are mandatory
+- `line1
line2

` is plainly `"line1
line2"`
 
 ##Examples
 See `./doc/samples.lua`
@@ -64,20 +73,20 @@ See `./doc/samples.lua`
 All tree elements provide, apart from `:select` and `()`, the following accessors:
 
 ###Basic
-- `.name` = the element's tagname
-- `.attributes` = a table with keys and values for the element's attributes; `{}` if none
-- `.id` = the value of the element's id attribute; `nil` if not present
-- `.classes` = an array with the classes listed in element's class attribute; `{}` if none
-- `:getcontent()` = the raw text between the opening and closing tags of the element; `""` if none
-- `.nodes` = an array with the element's child elements, `{}` if none
-- `.parent` = the elements that contains this element; `root.parent` is `nil`
+- `.name` the element's tagname
+- `.attributes` a table with keys and values for the element's attributes; `{}` if none
+- `.id` the value of the element's id attribute; `nil` if not present
+- `.classes` an array with the classes listed in element's class attribute; `{}` if none
+- `:getcontent()` the raw text between the opening and closing tags of the element; `""` if none
+- `.nodes` an array with the element's child elements, `{}` if none
+- `.parent` the elements that contains this element; `root.parent` is `nil`
 
 ###Other
-- `:gettext()` = the raw text of the complete element, starting with `""`
-- `.level` = how deep the element is in the tree; root level is `0`
+- `:gettext()` the raw text of the complete element, starting with `""`
+- `.level` how deep the element is in the tree; root level is `0`
 - `.root` the root element of the tree; `root.root` is `root`
-- `.deepernodes` = a Set containing all elements in the tree beneath this element, including this element's `.nodes`; `{}` if none
-- `.deeperelements` = a table with a key for each distinct tagname in `.deepernodes`, containing a Set of all deeper element nodes with that name; `{}` in none
-- `.deeperattributes` = as `.deeperelements`, but keyed on attribute name
-- `.deeperids` = as `.deeperelements`, but keyed on id value
-- `.deeperclasses` = as `.deeperelements`, but keyed on class name
+- `.deepernodes` a [Set][1] containing all elements in the tree beneath this element, including this element's `.nodes`; `{}` if none
+- `.deeperelements` a table with a key for each distinct tagname in `.deepernodes`, containing a [Set][1] of all deeper element nodes with that name; `{}` in none
+- `.deeperattributes` as `.deeperelements`, but keyed on attribute name
+- `.deeperids` as `.deeperelements`, but keyed on id value
+- `.deeperclasses` as `.deeperelements`, but keyed on class name