ftcsv
ftcsv is a fairly fast CSV library written in pure Lua. It's been tested with LuaJIT 2.0/2.1 and Lua 5.1, 5.2, and 5.3.
It works well for CSVs that can be fully loaded into memory (easily up to a hundred MB). Currently, there isn't a "large" file mode with proper readers and writers for ingesting CSVs in bulk with a fixed amount of memory. It correctly handles \n (LF), \r (CR), and \r\n (CRLF) line endings (i.e. it should work with Unix, Mac OS 9, and Windows line endings) and has UTF-8 support.
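To make the three line-ending styles concrete, here is a minimal pure-Lua sketch of splitting text on LF, CR, or CRLF by comparing byte values. It is an illustration only, not ftcsv's parser: the function name splitLines is made up for this example, and it ignores quoted fields that contain newlines, which a real CSV parser has to track.

```lua
-- Split a string into lines, treating LF, CR, and CRLF as terminators.
-- Illustrative only: splitLines is a hypothetical helper, and quoted
-- fields containing newlines are deliberately not handled here.
local function splitLines(text)
  local LF, CR = 10, 13
  local lines = {}
  local start = 1
  local i = 1
  local len = #text
  while i <= len do
    local b = string.byte(text, i)
    if b == LF or b == CR then
      lines[#lines + 1] = string.sub(text, start, i - 1)
      -- A CR immediately followed by LF counts as one CRLF terminator.
      if b == CR and string.byte(text, i + 1) == LF then
        i = i + 1
      end
      start = i + 1
    end
    i = i + 1
  end
  -- Keep a final line that has no trailing terminator.
  if start <= len then
    lines[#lines + 1] = string.sub(text, start)
  end
  return lines
end

local lines = splitLines("a,b\n1,2\r3,4\r\n5,6")
print(#lines)
```

Scanning bytes this way is safe for UTF-8 input, because the LF and CR byte values never occur inside a multi-byte UTF-8 sequence.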
Installing
You can either grab ftcsv.lua from here or install via luarocks:
luarocks install ftcsv
Parsing
ftcsv.parse(fileName, delimiter [, options])
ftcsv will load the entire csv file into memory, then parse it in one go, returning a lua table with the parsed data. It has only two required parameters - a file name and delimiter (limited to one character). A few optional parameters can be passed in via a table (examples below).
Just loading a csv file:
local ftcsv = require('ftcsv')
local zipcodes = ftcsv.parse("free-zipcode-database.csv", ",")
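Once parsed, the result can be walked like any other Lua list. The sketch below uses a hand-written stand-in table instead of calling ftcsv, so it runs without the library installed. It assumes the result is a list of row tables keyed by header name with string values; the city and zip fields are hypothetical and are not taken from free-zipcode-database.csv.

```lua
-- Stand-in for the assumed result of:
--   ftcsv.parse("city,zip\nBoston,02101\nAustin,73301", ",", {loadFromString=true})
-- Hypothetical data; the row shape (header name -> string) is an assumption.
local zipcodes = {
  {city = "Boston", zip = "02101"},
  {city = "Austin", zip = "73301"},
}

-- Walk the rows in order and look fields up by header name.
local summary = {}
for i, row in ipairs(zipcodes) do
  summary[#summary + 1] = i .. ": " .. row.city .. " (" .. row.zip .. ")"
end
print(table.concat(summary, "\n"))
```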
Options
The following are optional parameters passed in via the third argument as a table. For example, if you wanted to loadFromString and not use headers, you could use the following:
ftcsv.parse("apple,banana,carrot", ",", {loadFromString=true, headers=false})
- loadFromString
  If you want to load a csv from a string instead of a file, set loadFromString to true (default: false)
  ftcsv.parse("a,b,c\r\n1,2,3", ",", {loadFromString=true})
- rename
  If you want to rename a field, you can set rename to change the field names. The example below changes the headers from a,b,c to d,e,f.
  Note: You can rename two fields to the same value; ftcsv will keep the field that appears latest in the line.
  local options = {loadFromString=true, rename={["a"] = "d", ["b"] = "e", ["c"] = "f"}}
  local actual = ftcsv.parse("a,b,c\r\napple,banana,carrot", ",", options)
- fieldsToKeep
  If you only want to keep certain fields from the CSV, send them in as a table-list and it should parse a little faster and use less memory.
  Note: If you want to keep a renamed field, put the new name of the field in fieldsToKeep:
  local options = {loadFromString=true, fieldsToKeep={"a","f"}, rename={["c"] = "f"}}
  local actual = ftcsv.parse("a,b,c\r\napple,banana,carrot\r\n", ",", options)
- headerFunc
  Applies a function to every field in the header. If you are using rename, the function is applied after the rename.
  Ex: making all fields uppercase
  local options = {loadFromString=true, headerFunc=string.upper}
  local actual = ftcsv.parse("a,b,c\napple,banana,carrot", ",", options)
- headers
  Set headers to false if the file you are reading doesn't have any headers. This will cause ftcsv to create indexed tables rather than key-value tables for the output.
  local options = {loadFromString=true, headers=false}
  local actual = ftcsv.parse("apple>banana>carrot\ndiamond>emerald>pearl", ">", options)
  Note: Header-less files can still use the rename option, and after a field has been renamed, it can be specified as a field to keep. The rename syntax changes a little bit:
  local options = {loadFromString=true, headers=false, rename={"a","b","c"}, fieldsToKeep={"a","b"}}
  local actual = ftcsv.parse("apple>banana>carrot\ndiamond>emerald>pearl", ">", options)
  In the above example, the first field becomes 'a', the second field becomes 'b' and so on.
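The combined effect of positional rename and fieldsToKeep on a header-less file can be pictured with a short pure-Lua sketch. This is not ftcsv's code: nameAndFilter is a hypothetical helper, and the input and output shapes (indexed rows in, key-value rows out) are assumptions based on the descriptions above.

```lua
-- Illustration only: mimics the documented effect of headers=false with
-- rename={"a","b","c"} and fieldsToKeep={"a","b"}. nameAndFilter is a
-- hypothetical helper, not part of ftcsv.
local function nameAndFilter(rows, names, keep)
  -- Build a set of the field names to keep for quick lookups.
  local wanted = {}
  for _, name in ipairs(keep) do
    wanted[name] = true
  end

  local out = {}
  for r, row in ipairs(rows) do
    local named = {}
    for i, value in ipairs(row) do
      -- The i-th name applies to the i-th field of every row.
      local name = names[i]
      if name ~= nil and wanted[name] then
        named[name] = value
      end
    end
    out[r] = named
  end
  return out
end

-- Assumed shape of the rows that headers=false produces on its own.
local indexed = {
  {"apple", "banana", "carrot"},
  {"diamond", "emerald", "pearl"},
}
local result = nameAndFilter(indexed, {"a", "b", "c"}, {"a", "b"})
print(result[1].a, result[1].b, result[2].a, result[2].b)
```

Under these assumptions, each row ends up with only the a and b keys, and the third field is dropped.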
For all tested examples, take a look in /spec/feature_spec.lua
Encoding
ftcsv.encode(inputTable, delimiter[, options])
ftcsv can also take a lua table and turn it into a text string to be written to a file. It has two required parameters, an inputTable and a delimiter. You can use it to write out a file like this:
local fileOutput = ftcsv.encode(users, ",")
local file = assert(io.open("ALLUSERS.csv", "w"))
file:write(fileOutput)
file:close()
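If you would rather report a failed write than raise an error, the file-writing step can be wrapped in a small helper. The sketch below is plain Lua and does not call ftcsv: writeFile is a hypothetical name, and the sample string is a hand-written stand-in for encoded output, since the exact text ftcsv.encode produces is not described here.

```lua
-- Write a string to a file. Returns true on success, or nil plus an
-- error message on failure. writeFile is a hypothetical helper.
local function writeFile(path, contents)
  -- "wb" avoids newline translation on platforms that distinguish text
  -- and binary modes, so line endings in contents are written as-is.
  local file, openErr = io.open(path, "wb")
  if not file then
    return nil, openErr
  end
  local ok, writeErr = file:write(contents)
  file:close()
  if not ok then
    return nil, writeErr
  end
  return true
end

-- Hand-written stand-in for an encoded CSV string (assumed format).
local fileOutput = "name,city\r\nAda,London\r\n"
local path = os.tmpname()
local ok, err = writeFile(path, fileOutput)
if not ok then
  print("could not write " .. path .. ": " .. tostring(err))
end
os.remove(path)
```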
Options
- fieldsToKeep
  If fieldsToKeep is set in the encode process, only the fields specified will be written out to a file.
  local output = ftcsv.encode(everyUser, ",", {fieldsToKeep={"Name", "Phone", "City"}})
Performance
I did some basic testing and found that in Lua, if you want to iterate over a string character-by-character and look for single chars, string.byte performs better than string.sub. As such, ftcsv iterates over the whole file and does byte compares to find quotes and delimiters and then generates a table from it. If you have thoughts on how to improve performance (either big picture or specifically within the code), create a GitHub issue - I'd love to hear about it!
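The two scanning styles compared above can be sketched as follows. This is a rough micro-benchmark, not ftcsv's code: countWithByte and countWithSub are hypothetical names, and the timings depend on the Lua implementation and machine, so no particular result is claimed.

```lua
-- Count delimiter characters by comparing byte values (numbers).
local function countWithByte(text, delimiter)
  local target = string.byte(delimiter)
  local count = 0
  for i = 1, #text do
    if string.byte(text, i) == target then
      count = count + 1
    end
  end
  return count
end

-- Same count, but each step produces a one-character string to compare.
local function countWithSub(text, delimiter)
  local count = 0
  for i = 1, #text do
    if string.sub(text, i, i) == delimiter then
      count = count + 1
    end
  end
  return count
end

-- Hypothetical sample data: 100000 identical three-field lines.
local sample = string.rep("apple,banana,carrot\n", 100000)

local t0 = os.clock()
local viaByte = countWithByte(sample, ",")
local byteSeconds = os.clock() - t0

t0 = os.clock()
local viaSub = countWithSub(sample, ",")
local subSeconds = os.clock() - t0

print(string.format("string.byte: %.3fs, string.sub: %.3fs", byteSeconds, subSeconds))
```

Both functions return the same count; only the per-character work differs, which is the difference the paragraph above describes.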
Error Handling
ftcsv returns a litany of errors when passed a bad csv file or incorrect parameters. You can find a more detailed explanation of the more cryptic errors in ERRORS.md
Contributing
Feel free to create a new issue for any bugs you've found or help you need. If you want to contribute back to the project please do the following:
- If it's a major change (aka more than a quick bugfix), please create an issue so we can discuss it!
- Fork the repo
- Create a new branch
- Push your changes to the branch
- Run the test suite and make sure it still works
- Submit a pull request
- Wait for review
- Enjoy the changes made!
Licenses
- The main library is licensed under the MIT License. Feel free to use it!
- Some of the test CSVs are from csv-spectrum (BSD-2-Clause) which includes some from csvkit (MIT License)