Shakil Thakur 54d0817376 Merge pull request #11 from FourierTransformer/os9-line-endings-fix
Mac OS9 line endings fix and BOM removal
2017-12-01 22:56:10 -06:00

ftcsv

ftcsv is a fairly fast CSV library written in pure Lua. It has been tested with LuaJIT 2.0/2.1 and Lua 5.1, 5.2, and 5.3.

It works well for CSVs that can be fully loaded into memory (easily up to a hundred MB). Currently, there isn't a "large" file mode with proper readers and writers for ingesting CSVs in bulk with a fixed amount of memory. It correctly handles \n (LF), \r (CR), and \r\n (CRLF) line endings (i.e. it should work with Unix, Mac OS 9, and Windows line endings), and has UTF-8 support (it will strip out the BOM if one exists).
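To make the line-ending and BOM handling concrete, here is a minimal sketch of the idea in plain Lua. The helper names `stripBOM` and `splitLines` are hypothetical and are not part of ftcsv's API or a copy of its internals; the sketch also ignores quoted fields that contain newlines, which a real parser has to handle.

```lua
-- Illustrative only: hypothetical helpers, not ftcsv's implementation.
local UTF8_BOM = "\239\187\191" -- the bytes EF BB BF

-- Remove a leading UTF-8 byte order mark, if present.
local function stripBOM(text)
  if text:sub(1, 3) == UTF8_BOM then
    return text:sub(4)
  end
  return text
end

-- Split text into lines on CRLF, lone CR, or lone LF.
local function splitLines(text)
  -- Normalize CRLF first so it is not counted as two line breaks.
  local normalized = text:gsub("\r\n", "\n"):gsub("\r", "\n")
  -- Make sure the last line is terminated so the pattern below sees it.
  if normalized:sub(-1) ~= "\n" then
    normalized = normalized .. "\n"
  end
  local lines = {}
  for line in normalized:gmatch("(.-)\n") do
    lines[#lines + 1] = line
  end
  return lines
end

local lines = splitLines(stripBOM("\239\187\191a,b\r\n1,2\r3,4\n5,6"))
print(#lines, lines[1], lines[4])
```

The same input mixes all three line endings on purpose, so each one is exercised once.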

Installing

You can either grab ftcsv.lua from this repository or install via luarocks:

luarocks install ftcsv

Parsing

ftcsv.parse(fileName, delimiter [, options])

ftcsv will load the entire CSV file into memory, then parse it in one go, returning a Lua table with the parsed data and a Lua table containing the column headers. It has only two required parameters - a file name and a delimiter (limited to one character). A few optional parameters can be passed in via a table (examples below).

Just loading a csv file:

local ftcsv = require('ftcsv')
local zipcodes, headers = ftcsv.parse("free-zipcode-database.csv", ",")

Options

The following are optional parameters passed in via the third argument as a table. For example, if you wanted to load from a string (loadFromString) and not use headers, you could use the following:

ftcsv.parse("apple,banana,carrot", ",", {loadFromString=true, headers=false})
  • loadFromString

    If you want to load a csv from a string instead of a file, set loadFromString to true (default: false)

    ftcsv.parse("a,b,c\r\n1,2,3", ",", {loadFromString=true})
    
  • rename

    If you want to rename a field, you can set rename to change the field names. The below example will change the headers from a,b,c to d,e,f

    Note: You can rename two fields to the same value; ftcsv will keep the field that appears last in the line.

    local options = {loadFromString=true, rename={["a"] = "d", ["b"] = "e", ["c"] = "f"}}
    local actual = ftcsv.parse("a,b,c\r\napple,banana,carrot", ",", options)
    
  • fieldsToKeep

    If you only want to keep certain fields from the CSV, send them in as a table-list and it should parse a little faster and use less memory.

    Note: If you want to keep a renamed field, put the new name of the field in fieldsToKeep:

    local options = {loadFromString=true, fieldsToKeep={"a","f"}, rename={["c"] = "f"}}
    local actual = ftcsv.parse("a,b,c\r\napple,banana,carrot\r\n", ",", options)
    
  • headerFunc

    Applies a function to every field in the header. If you are using rename, the function is applied after the rename.

    Ex: making all fields uppercase

    local options = {loadFromString=true, headerFunc=string.upper}
    local actual = ftcsv.parse("a,b,c\napple,banana,carrot", ",", options)
    
  • headers

    Set headers to false if the file you are reading doesn't have any headers. This will cause ftcsv to create indexed tables rather than key-value tables for the output.

    local options = {loadFromString=true, headers=false}
    local actual = ftcsv.parse("apple>banana>carrot\ndiamond>emerald>pearl", ">", options)
    

    Note: Header-less files can still use the rename option, and after a field has been renamed, it can be specified as a field to keep. The rename syntax changes a little bit:

    local options = {loadFromString=true, headers=false, rename={"a","b","c"}, fieldsToKeep={"a","b"}}
    local actual = ftcsv.parse("apple>banana>carrot\ndiamond>emerald>pearl", ">", options)
    

    In the above example, the first field becomes 'a', the second field becomes 'b' and so on.

For all tested examples, take a look in /spec/feature_spec.lua

Encoding

ftcsv.encode(inputTable, delimiter[, options])

ftcsv can also take a Lua table and turn it into a text string to be written to a file. It has two required parameters, an inputTable and a delimiter. You can use it to write out a file like this:

local fileOutput = ftcsv.encode(users, ",")
local file = assert(io.open("ALLUSERS.csv", "w"))
file:write(fileOutput)
file:close()
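
For intuition about what turning a table into CSV text involves, here is a small sketch of the common CSV quoting convention (wrap each field in double quotes and double any embedded quotes). The helpers `quoteField` and `encodeRow` are hypothetical names for illustration; this is not ftcsv.encode's implementation, and its exact output may differ in detail.

```lua
-- Illustrative only: hypothetical helpers, not ftcsv's implementation.

-- Wrap a value in double quotes, doubling any quotes it contains.
local function quoteField(value)
  if value == nil then
    value = ""
  end
  local escaped = tostring(value):gsub('"', '""')
  return '"' .. escaped .. '"'
end

-- Build one CSV line from a key-value row, in the order given by fieldNames.
local function encodeRow(row, fieldNames, delimiter)
  local out = {}
  for i, name in ipairs(fieldNames) do
    out[i] = quoteField(row[name])
  end
  return table.concat(out, delimiter)
end

local user = { Name = 'Ann "AJ" Lee', City = "Austin" }
print(encodeRow(user, { "Name", "City" }, ","))
```

Quoting every field keeps delimiters and quotes inside values from being misread when the file is parsed again.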

Options

  • fieldsToKeep

    If fieldsToKeep is set in the encode process, only the fields specified will be written out to a file.

    local output = ftcsv.encode(everyUser, ",", {fieldsToKeep={"Name", "Phone", "City"}})
    

Performance

I did some basic testing and found that in Lua, if you want to iterate over a string character by character and look for single chars, string.byte performs better than string.sub. As such, ftcsv iterates over the whole file and does byte comparisons to find quotes and delimiters, then generates a table from the result. If you have thoughts on how to improve performance (either big picture or specifically within the code), create a GitHub issue - I'd love to hear about it!
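
As a rough illustration of the byte-comparison approach, the sketch below splits a single unquoted line on a one-character delimiter using string.byte. The function name `splitOnDelimiter` is hypothetical, and the code is a simplified stand-in rather than ftcsv's actual parser (it does not handle quoted fields).

```lua
-- Illustrative only: a simplified stand-in, not ftcsv's parser.
local function splitOnDelimiter(line, delimiter)
  local delimByte = string.byte(delimiter)
  local fields = {}
  local fieldStart = 1
  for i = 1, #line do
    -- string.byte yields a number, so each step is a numeric compare
    -- rather than a compare against a one-character substring.
    if string.byte(line, i) == delimByte then
      fields[#fields + 1] = string.sub(line, fieldStart, i - 1)
      fieldStart = i + 1
    end
  end
  -- Whatever follows the last delimiter is the final field.
  fields[#fields + 1] = string.sub(line, fieldStart)
  return fields
end

local fields = splitOnDelimiter("apple,banana,,carrot", ",")
print(#fields, fields[1], fields[4])
```

Here string.sub is only called once per field, not once per character, which is where the saving would be expected to come from.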

Error Handling

ftcsv returns a litany of errors when passed a bad csv file or incorrect parameters. You can find a more detailed explanation of the more cryptic errors in ERRORS.md

Contributing

Feel free to create a new issue for any bugs you've found or help you need. If you want to contribute back to the project, please do the following:

  1. If it's a major change (aka more than a quick bugfix), please create an issue so we can discuss it!
  2. Fork the repo
  3. Create a new branch
  4. Push your changes to the branch
  5. Run the test suite and make sure it still works
  6. Submit a pull request
  7. Wait for review
  8. Enjoy the changes made!

Licenses

  • The main library is licensed under the MIT License. Feel free to use it!
  • Some of the test CSVs are from csv-spectrum (BSD-2-Clause), which includes some from csvkit (MIT License).