ftcsv
ftcsv is a fairly fast CSV library written in pure Lua. It has been tested with LuaJIT 2.0/2.1 and Lua 5.1, 5.2, and 5.3.
It works well for CSVs that fit comfortably in memory (up to about a hundred MB). There is currently no "large file" mode with proper readers and writers for ingesting CSVs in bulk with a fixed amount of memory. It correctly handles both `\n` (LF) and `\r\n` (CRLF) line endings (i.e. it should work with Windows and Mac/Linux line endings) and has UTF-8 support.
Installing
You can either grab `ftcsv.lua` from this repo or install via luarocks:
luarocks install ftcsv
Parsing
### ftcsv.parse(fileName, delimiter [, options])
ftcsv loads the entire CSV file into memory and parses it in one go, returning a Lua table with the parsed data. It has only two required parameters: a file name and a delimiter (limited to one character). A few optional parameters can be passed in via a table (examples below).
Just loading a csv file:

```lua
local ftcsv = require('ftcsv')
local zipcodes = ftcsv.parse("free-zipcode-database.csv", ",")
```
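
As a rough illustration of what can be done with the returned table, the sketch below assumes `ftcsv.parse` returns a list of rows, each row being a table keyed by header name, with every value left as a string. The `index_by` helper and the `Zip`/`City` column names are hypothetical and not part of ftcsv; the hand-built `rows` table stands in for a parse result so the snippet runs without the library.

```lua
-- Hypothetical helper (not part of ftcsv): build a lookup table from parsed rows.
-- Assumes `rows` is a list of key-value tables, one per CSV line.
local function index_by(rows, key)
  local index = {}
  for _, row in ipairs(rows) do
    local value = row[key]
    if value ~= nil then
      index[value] = row -- later rows overwrite earlier ones with the same key
    end
  end
  return index
end

-- Stand-in for what ftcsv.parse is assumed to return for a two-line CSV
-- with the made-up headers Zip and City.
local rows = {
  {Zip = "00501", City = "Holtsville"},
  {Zip = "90210", City = "Beverly Hills"},
}

local by_zip = index_by(rows, "Zip")
print(by_zip["90210"].City)
```

Building an index like this once is usually cheaper than scanning the whole list for every lookup.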
Options
The following are optional parameters passed in via the third argument as a table. For example, if you wanted to `loadFromString` and not use `headers`, you could use the following:

```lua
ftcsv.parse("apple,banana,carrot", ",", {loadFromString=true, headers=false})
```
- `loadFromString`

  If you want to load a csv from a string instead of a file, set `loadFromString` to `true` (default: `false`)

  ```lua
  ftcsv.parse("a,b,c\r\n1,2,3", ",", {loadFromString=true})
  ```
- `rename`

  If you want to rename a field, you can set `rename` to change the field names. The example below changes the headers from `a,b,c` to `d,e,f`.

  Note: You can rename two fields to the same value; ftcsv will keep the field that appears latest in the line.

  ```lua
  local options = {loadFromString=true, rename={["a"] = "d", ["b"] = "e", ["c"] = "f"}}
  local actual = ftcsv.parse("a,b,c\r\napple,banana,carrot", ",", options)
  ```
- `fieldsToKeep`

  If you only want to keep certain fields from the CSV, send them in as a table-list and it should parse a little faster and use less memory.

  Note: If you want to keep a renamed field, put the new name of the field in `fieldsToKeep`:

  ```lua
  local options = {loadFromString=true, fieldsToKeep={"a","f"}, rename={["c"] = "f"}}
  local actual = ftcsv.parse("a,b,c\r\napple,banana,carrot\r\n", ",", options)
  ```
- `headerFunc`

  Applies a function to every field in the header. If you are using `rename`, the function is applied after the rename.

  Ex: making all fields uppercase

  ```lua
  local options = {loadFromString=true, headerFunc=string.upper}
  local actual = ftcsv.parse("a,b,c\napple,banana,carrot", ",", options)
  ```
- `headers`

  Set `headers` to `false` if the file you are reading doesn't have any headers. This will cause ftcsv to create indexed tables rather than key-value tables for the output.

  ```lua
  local options = {loadFromString=true, headers=false}
  local actual = ftcsv.parse("apple>banana>carrot\ndiamond>emerald>pearl", ">", options)
  ```

  Note: Header-less files can still use the `rename` option, and after a field has been renamed, it can be specified as a field to keep. The `rename` syntax changes a little bit:

  ```lua
  local options = {loadFromString=true, headers=false, rename={"a","b","c"}, fieldsToKeep={"a","b"}}
  local actual = ftcsv.parse("apple>banana>carrot\ndiamond>emerald>pearl", ">", options)
  ```

  In the above example, the first field becomes 'a', the second field becomes 'b' and so on.
For all tested examples, take a look in /spec/feature_spec.lua
Encoding
### ftcsv.encode(inputTable, delimiter[, options])
ftcsv can also take a lua table and turn it into a text string to be written to a file. It has two required parameters, an inputTable and a delimiter. You can use it to write out a file like this:

```lua
local fileOutput = ftcsv.encode(users, ",")
local file = assert(io.open("ALLUSERS.csv", "w"))
file:write(fileOutput)
file:close()
```
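
The write-out step can also be wrapped in a small helper that reports failures instead of raising them. This is only a sketch using Lua's standard `io` library; `write_csv` is a hypothetical name that is not part of ftcsv, and `csvText` stands for whatever string `ftcsv.encode` returned.

```lua
-- Hypothetical helper (not part of ftcsv): write a string to a file.
-- Returns true on success, or nil plus an error message.
local function write_csv(path, csvText)
  -- "b" asks for binary mode so line endings in the string are written unchanged
  local file, openErr = io.open(path, "wb")
  if not file then
    return nil, openErr
  end
  local ok, writeErr = file:write(csvText)
  file:close() -- close the handle whether or not the write succeeded
  if not ok then
    return nil, writeErr
  end
  return true
end

-- Usage, assuming `fileOutput` came from ftcsv.encode(users, ","):
-- local ok, err = write_csv("ALLUSERS.csv", fileOutput)
-- if not ok then print("could not write file: " .. err) end
```

Returning `nil` plus a message follows the convention used by `io.open` itself, so callers can choose between `assert(write_csv(...))` and handling the error by hand.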
Options
- `fieldsToKeep`

  If `fieldsToKeep` is set in the encode process, only the fields specified will be written out to a file.

  ```lua
  local output = ftcsv.encode(everyUser, ",", {fieldsToKeep={"Name", "Phone", "City"}})
  ```
Performance
I did some basic testing and found that in Lua, if you want to iterate over a string character-by-character and look for single chars, `string.byte` performs better than `string.sub`. As such, ftcsv iterates over the whole file and does byte compares to find quotes and delimiters and then generates a table from it. If you have thoughts on how to improve performance (either big picture or specifically within the code), create a GitHub issue - I'd love to hear about it!
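
To illustrate the byte-compare approach described above, here is a minimal sketch. It is not ftcsv's actual parser: it splits a single line on a one-character delimiter using `string.byte` and deliberately ignores quoting. The `split_line` name is made up for this example.

```lua
-- Illustrative only: split one line on a single-character delimiter.
-- Quoted fields and escaped quotes are NOT handled here.
local function split_line(line, delimiter)
  local delimByte = string.byte(delimiter)
  local fields = {}
  local fieldStart = 1
  for i = 1, #line do
    -- compare numeric byte values instead of creating a substring per character
    if string.byte(line, i) == delimByte then
      fields[#fields + 1] = string.sub(line, fieldStart, i - 1)
      fieldStart = i + 1
    end
  end
  -- whatever follows the last delimiter is the final field
  fields[#fields + 1] = string.sub(line, fieldStart)
  return fields
end

local fields = split_line("apple,banana,carrot", ",")
print(#fields, fields[1], fields[3])
```

`string.sub` is still called once per field to extract the value; the saving comes from not calling it once per character during the scan.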
Error Handling
ftcsv reports a variety of errors when passed a bad csv file or incorrect parameters. You can find a more detailed explanation of the more cryptic errors in `ERRORS.md`.
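
If you would rather handle a bad file than let the script stop, one option is to wrap the call in `pcall`. The sketch below assumes ftcsv signals problems by raising a Lua error (via `error` or `assert`) rather than by returning an error value, so check the source of the version you use. `try_parse` is a hypothetical wrapper, and its `parser` argument stands in for `ftcsv.parse` so the snippet runs without the library installed.

```lua
-- Hypothetical wrapper (not part of ftcsv): call a parser in protected mode.
-- Returns the parser's result, or nil plus the error message.
local function try_parse(parser, ...)
  local ok, result = pcall(parser, ...)
  if ok then
    return result
  end
  return nil, tostring(result)
end

-- Stand-in parser that always fails, used only to demonstrate the wrapper.
local function failing_parser()
  error("example failure")
end

local data, err = try_parse(failing_parser, "missing.csv", ",")
print(data, err)

-- With ftcsv loaded, the call would look like:
-- local data, err = try_parse(ftcsv.parse, "free-zipcode-database.csv", ",")
```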
Contributing
Feel free to create a new issue for any bugs you've found or help you need. If you want to contribute back to the project please do the following:

0. If it's a major change (aka more than a quick little < 5 line bugfix), please create an issue so we can discuss it!
1. Fork the repo
2. Create a new branch
3. Push your changes to the branch
4. Run the test suite and make sure it still works
5. Submit a pull request
6. ???
7. Enjoy the changes made to the repo!
Licenses
- The main library is licensed under the MIT License. Feel free to use it!
- Some of the test CSVs are from csv-spectrum (BSD-2-Clause) which includes some from csvkit (MIT License)