# LuaRock "htmlparser"

Parse HTML text into a tree of elements with selectors

### License
MIT; see ./doc/LICENSE

### Usage
Start off with
```lua
require("luarocks.loader")
local htmlparser = require("htmlparser")
```
Then, parse some html:
```lua
local root = htmlparser.parse(htmlstring)
```
The input to parse may be the contents of a complete html document, or any valid html snippet, as long as all tags are correctly opened and closed.

Now, find specific elements by selecting:
```lua
local elements = root:select(selectorstring)
```
Or in shorthand:
```lua
local elements = root(selectorstring)
```
This will return a Set of elements, all of which are of the same type as the root element, and thus support selecting as well, if ever needed:
```lua
for e in pairs(elements) do
  print(e.name)
  local subs = e(subselectorstring)
  for sub in pairs(subs) do
    print("", sub.name)
  end
end
```

### Selectors
- `"element"`
- `"#id"`
- `".class"`
- `"[attribute]"`
- `"[attribute=value]"`
- `"[attribute!=value]"`
- `"[attribute|=value]"`
- `"[attribute*=value]"`
- `"[attribute~=value]"`
- `"[attribute^=value]"`
- `"[attribute$=value]"`
- `":not(selector)"`
- `"ancestor descendant"`
- `"parent > child"`

Selectors can be combined; e.g.
`".class:not([attribute]) element.class"`

#### Limitations
- Attribute values in selectors currently cannot contain any spaces, since a space is interpreted as a delimiter between the ancestor and descendant, parent and >, or > and child parts of the selector
- Likewise, for the parent > child relation, the spaces before and after the > are mandatory

### Element type
The tree elements provide, apart from :select and (), the following accessors:
- .name = the element's tag name
- .attributes = a table with keys and values for the element's attributes
- .id = the value of the element's id attribute, if present
- .classes = an array with the classes listed in the element's class attribute, if any
- :getcontent() = the raw text between the opening and closing tags of the element
- .nodes = an array with the element's child elements
- .parent = the element that contains this element; root.parent is nil
- :gettext() = the raw text of the complete element, starting with its opening tag
- .level = how deep the element is in the tree; root level is 0
- .root = the root element of the tree; root.root is root
- .deepernodes = a Set containing all elements in the tree beneath this element, including this element's .nodes
- .deeperelements = a table with a key for each distinct tag name in .deepernodes, containing a Set of all deeper element nodes with that name
- .deeperattributes = as .deeperelements, but keyed on attribute name
- .deeperids = as .deeperelements, but keyed on id value
- .deeperclasses = as .deeperelements, but keyed on class name
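
Since a selection is iterated with `pairs`, the elements act as keys of the Set, and Lua does not define the order in which `pairs` visits table keys. When a stable order matters (for printing or testing, say), one option is to collect the names into an array and sort it. The following is only a sketch: `sorted_names` is a hypothetical helper that is not part of htmlparser, and `fake_elements` is stand-in data shaped like a selection result so the snippet runs without the rock installed.

```lua
-- Hypothetical helper (not part of htmlparser): collect the .name of every
-- element in a Set-like table (elements stored as keys) into a sorted array.
local function sorted_names(set)
  local names = {}
  for element in pairs(set) do
    names[#names + 1] = element.name
  end
  table.sort(names)
  return names
end

-- Stand-in data (an assumption for illustration): three tables with a .name
-- field used as keys, mimicking what iterating a selection yields.
local fake_elements = {
  [{ name = "p" }] = true,
  [{ name = "div" }] = true,
  [{ name = "a" }] = true,
}

print(table.concat(sorted_names(fake_elements), ", "))  --> a, div, p
```

With the library loaded, the same helper could presumably be given the result of `root(selectorstring)` or an element's `.deepernodes`, since both are described above as Sets of elements.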