2013-12-04 09:16:11 +00:00

# Lua-CSV - delimited file reading

## 1. What?
Lua-CSV is a Lua module for reading delimited text files (most commonly CSV
and tab-separated files, but you can specify the separator).

Lua-CSV tries to auto-detect whether a file is delimited with commas or tabs,
copes with non-native newlines, survives newlines and quotes inside quoted
fields, and offers an iterator interface so it can handle large files without
reading them into memory all at once.
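
The auto-detection can be pictured with a small sketch. This is an assumption about the approach, not Lua-CSV's code. The hypothetical `guess_separator` picks whichever of comma or tab appears first in the first line, which is consistent with the caveat under `separator` in section 2 (a comma in the first field of a tab-delimited file fools it). The hypothetical `find_break` shows how a separator placed inside the Lua pattern set `"["..sep.."\n\r]"` locates the next field or record boundary.

```lua
-- Hypothetical sketch, not Lua-CSV's implementation.

-- Guess the separator from a sample of the file: return whichever of ","
-- or "\t" occurs first in the first line, or nil if neither occurs.
local function guess_separator(sample)
  local first_line = sample:match("^[^\r\n]*")
  local pos = first_line:find("[,\t]")
  if pos then
    return first_line:sub(pos, pos)
  end
  return nil
end

-- Find the next field or record boundary at or after `init`, using the
-- separator inside a Lua pattern set, as in "["..sep.."\n\r]".
-- Note: %, ], ^ and - can have special meaning inside a set, so a separator
-- containing them would need escaping with % before being used this way.
local function find_break(text, sep, init)
  return (text:find("[" .. sep .. "\n\r]", init))
end
```

With a multi-character separator such as `";|"`, `find_break` stops at either character, because a set matches any one of its members rather than the whole string.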

## 2. How?
```lua
local csv = require("csv")
local f = csv.open("file.csv")
for fields in f:lines() do
  for i, v in ipairs(fields) do print(i, v) end
end
```
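
`f:lines()` returns an iterator for Lua's generic `for`, which is what lets the loop above walk a file record by record. As a rough, hypothetical illustration of that interface and of the quoting rules from section 1, here is a tiny in-memory `records(text, sep)` iterator. It is not Lua-CSV's parser: it works on a whole string, takes a single-character separator, and assumes double-quote quoting with `""` as an escaped quote.

```lua
-- Hypothetical sketch, not Lua-CSV's parser: iterate over the records of an
-- in-memory string with a generic `for`, like `for fields in f:lines()`.
-- Assumes a single-character separator, double-quote quoting and "" as an
-- escaped quote; "\r\n", "\n" and "\r" all end a record outside quotes.
local function records(text, sep)
  sep = sep or ","
  local pos, len = 1, #text
  return function()
    if pos > len then return nil end   -- nil ends the `for` loop
    local fields, field = {}, {}
    local in_quotes = false
    while pos <= len do
      local c = text:sub(pos, pos)
      if in_quotes then
        if c == '"' then
          if text:sub(pos + 1, pos + 1) == '"' then
            field[#field + 1] = '"'    -- "" inside quotes is a literal quote
            pos = pos + 1
          else
            in_quotes = false          -- closing quote
          end
        else
          field[#field + 1] = c        -- separators and newlines kept as-is
        end
      elseif c == '"' then
        in_quotes = true
      elseif c == sep then
        fields[#fields + 1] = table.concat(field)
        field = {}
      elseif c == "\r" or c == "\n" then
        if c == "\r" and text:sub(pos + 1, pos + 1) == "\n" then
          pos = pos + 1                -- treat "\r\n" as one newline
        end
        pos = pos + 1
        fields[#fields + 1] = table.concat(field)
        return fields
      else
        field[#field + 1] = c
      end
      pos = pos + 1
    end
    fields[#fields + 1] = table.concat(field)  -- last record, no newline
    return fields
  end
end
```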

`csv.open` takes an optional second argument, `parameters`, a table of
options controlling how the file is read:

+ `separator` sets the separator. Lua-CSV will probably guess the separator
  correctly if it's a comma or a tab (unless, say, the first field in a
  tab-delimited file contains a comma), but if you want something else you'll
  have to set it. It can be more than one character, but it is used as part
  of a Lua pattern set, `"["..sep.."\n\r]"`, so it matches any one of its
  characters rather than the whole string.

+ Set `header` to true if the file contains a header row; each set of fields
  will then be keyed by the names in the header rather than by integer index.

+ `columns` provides a mechanism for column remapping.
  Suppose you have a CSV file as follows:

  ```
  Word,Number
  ONE,10
  ```

  and `columns` is set to one of the following:

  + `{ word = true }` would give `{ word = "ONE" }`, the only field returned
    for that row
  + `{ first = { name = "word" } }` would give `{ first = "ONE" }`
  + `{ word = { transform = string.lower } }` would give `{ word = "one" }`
  + ```lua
    { word = true,
      number = { transform = function(x) return tonumber(x) / 10 end } }
    ```
    would give `{ word = "ONE", number = 1 }`

  A column can have more than one name, which helps cope with badly
  specified file formats and spelling mistakes:
  `{ first = { names = {"word", "worm"} } }`

+ `buffer_size` controls the size of the blocks the file is read in. The
  default is 1MB. It used to be 4096 bytes, which is what `pagesize` says on
  my system, but that seems kind of small.
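
To make the `header` and `columns` rules concrete, here is a hypothetical `make_row(header, fields, columns)` that turns one row into a table. It is a sketch under assumptions, not Lua-CSV's code: header names are matched case-insensitively (inferred from the example above, where the key `word` picks up the header `Word`), the first name in `names` found in the header wins, and without `columns` the fields are keyed by the header names exactly as written.

```lua
-- Hypothetical sketch of header keying and `columns` remapping.
local function make_row(header, fields, columns)
  local row = {}
  if not columns then
    -- Like `header = true` alone: key each field by its header name.
    for i, name in ipairs(header) do
      row[name] = fields[i]
    end
    return row
  end

  -- Map lower-cased header names to column positions (assumed matching rule).
  local index = {}
  for i, name in ipairs(header) do
    index[name:lower()] = i
  end

  for key, spec in pairs(columns) do
    -- `true` means "use the column whose name is the key itself".
    local s = (spec == true) and {} or spec
    local names = s.names or { s.name or key }
    for _, name in ipairs(names) do
      local i = index[name:lower()]
      if i then
        local value = fields[i]
        if s.transform then value = s.transform(value) end
        row[key] = value
        break  -- the first name found in the header wins
      end
    end
  end
  return row
end
```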

`csv.openstring` works exactly like `csv.open` except that the first argument
is the contents of the CSV file rather than a file name. In this case
`buffer_size` is set to the length of the string.
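
Reading in blocks is what lets a large file be processed without holding it in memory all at once. A hypothetical sketch of the two kinds of block source: `file_blocks` reads an open file `buffer_size` bytes at a time with the standard `file:read(n)`, and `string_blocks` hands a string over as a single block, which is the effect of setting `buffer_size` to the string's length. Neither function is part of Lua-CSV, and a real reader must also handle records that straddle block boundaries, which this sketch ignores.

```lua
-- Hypothetical sketch: block sources for a delimited reader.

-- Return an iterator that reads an open file `buffer_size` bytes at a time.
-- file:read(n) returns at most n bytes, or nil at end of file, which also
-- ends a generic `for` loop.
local function file_blocks(file, buffer_size)
  return function()
    return file:read(buffer_size)
  end
end

-- Return an iterator that yields the whole string as a single block,
-- i.e. the result of using the string's length as the block size.
local function string_blocks(s)
  local done = false
  return function()
    if done or #s == 0 then return nil end
    done = true
    return s
  end
end
```

For example, a 10-byte file read with a block size of 4 arrives as blocks of 4, 4 and 2 bytes, while `string_blocks` always produces at most one block.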

## 3. Requirements

Lua 5.1, 5.2 or LuaJIT.

## 4. Issues
+ Some whitespace-delimited files might use more than one space between
  fields, for example if the columns are "manually" aligned:

  ```
  street          nr city
  "Oneway Street" 1  Toontown
  ```

  Lua-CSV won't cope with this: you'll get lots of extra empty fields.
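
A small illustration of where the empty fields come from: with `" "` as the separator, every single space ends a field, so a run of three spaces leaves two empty fields in between. The hypothetical `split` below splits on a plain separator and keeps empty fields, as a delimited reader has to; it is not Lua-CSV code and ignores quoting.

```lua
-- Illustration only, not Lua-CSV code: split a line on every occurrence of
-- a plain separator, keeping empty fields (quoting is ignored here).
local function split(line, sep)
  assert(#sep > 0, "separator must not be empty")
  local fields, start = {}, 1
  while true do
    local i = line:find(sep, start, true)   -- plain search, no patterns
    if not i then
      fields[#fields + 1] = line:sub(start)
      return fields
    end
    fields[#fields + 1] = line:sub(start, i - 1)
    start = i + #sep
  end
end
```

Here `split("street   nr city", " ")` gives `street`, two empty strings, `nr` and `city`.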

## 5. Wishlist
+ Tests would be nice.
+ So would better LDoc documentation.

## 6. Alternatives
+ [Penlight](http://github.com/stevedonovan/penlight) contains delimited
  file reading. It reads the whole file in one go.
+ The Lua Wiki contains two pages on CSV,
  [here](http://lua-users.org/wiki/LuaCsv) and
  [here](http://lua-users.org/wiki/CsvUtils).
+ There's an example using [LPeg](http://www.inf.puc-rio.br/~roberto/lpeg/)
  to parse CSV [here](http://www.inf.puc-rio.br/~roberto/lpeg/#CSV).