
utf8.lua

Pure-Lua 5.3 regex library for Lua 5.3, Lua 5.1 and LuaJIT.

This library provides a simple way to add UTF-8 support to your application.

Example:

```lua
local utf8 = require('.utf8'):init()
for k,v in pairs(utf8) do
  string[k] = v
end

local str = "пыщпыщ ололоо я водитель нло"
print(str:find("(.л.+)н"))
-- 8	26	ололоо я водитель

print(str:gsub("ло+", "보라"))
-- пыщпыщ о보라보라 я водитель н보라	3

print(str:match("^п[лопыщ ]*я"))
-- пыщпыщ ололоо я
```
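The loop in this example relies on a standard Lua mechanism: all strings share one metatable whose `__index` field is the global `string` table, so anything copied into `string` becomes callable with method syntax. The plain-Lua sketch below demonstrates only that mechanism; `extras` and `shout` are hypothetical stand-ins for the table returned by `init()`, not part of utf8.lua.

```lua
-- Every string shares one metatable whose __index is the global `string`
-- table, so a call like str:find(...) looks `find` up in that table.
assert(getmetatable("").__index == string)

-- Hypothetical stand-in for the table returned by require('.utf8'):init().
local extras = {
  shout = function(s) return s:upper() .. "!" end,
}

-- Same loop as in the example: copy every field into `string`.
for k, v in pairs(extras) do
  string[k] = v
end

print(("hello"):shout())  --> HELLO!
```

Because the loop modifies the shared `string` table, the patch affects every module running in the same Lua state, not only your own code.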

Usage:

This library can be used as a drop-in replacement for the vanilla string library. It also exports all vanilla functions under the `raw` sub-object.

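```lua
local utf8 = require('.utf8'):init()
local str = "пыщпыщ ололоо я водитель нло"
-- character-aware: "+" repeats the whole character "о"
print(utf8.gsub(str, "ло+", "보라"))
-- пыщпыщ о보라보라 я водитель н보라	3
-- byte-based: "+" repeats only the last byte of "о"
print(utf8.raw.gsub(str, "ло+", "보라"))
-- пыщпыщ о보라보라о я водитель н보라	3
```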
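The difference between `utf8.gsub` and `utf8.raw.gsub` comes from byte-level matching: each Cyrillic letter takes two bytes in UTF-8, so a vanilla quantifier applies to the last byte of a character rather than to the whole character. This plain-Lua sketch (no utf8.lua required) reproduces the effect with the standard `string.gsub`, and also shows a common idiom for counting characters instead of bytes.

```lua
-- Vanilla string.gsub matches bytes. In UTF-8, "л" is D0 BB and "о" is D0 BE,
-- so in the pattern "ло+" the "+" only repeats the final byte (BE) of "о".
local s = "ололоо"
local out, n = string.gsub(s, "ло+", "X")
print(out, n)  --> оXXо	2   (the last "о" of "лоо" is left behind)

-- Bytes versus characters: UTF-8 continuation bytes are 0x80-0xBF, so the
-- number of bytes outside that range equals the number of characters.
local _, chars = string.gsub(s, "[^\128-\191]", "")
print(#s, chars)  --> 12	6
```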

It also provides all functions from the Lua 5.3 UTF-8 module except `utf8.len (s [, i [, j]])`. If you need to validate your strings, use `utf8.validate(str, byte_pos)` or iterate over them with `utf8.validator`.
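For readers who want to see what UTF-8 validation involves, here is a self-contained sketch of a well-formedness check that follows the byte ranges of RFC 3629. `validate_utf8` is a hypothetical helper written for illustration; it is not utf8.lua's `utf8.validate`, and it does not try to reproduce that function's exact return values.

```lua
-- Hypothetical helper (not utf8.lua's utf8.validate): checks that s is
-- well-formed UTF-8 per the byte ranges of RFC 3629. Returns true, or
-- false plus the byte position where the first bad sequence starts.
local function validate_utf8(s)
  local i, n = 1, #s
  while i <= n do
    local c = s:byte(i)
    local len, lo, hi -- sequence length, allowed range of the 2nd byte
    if c < 0x80 then
      len = 1
    elseif c >= 0xC2 and c <= 0xDF then
      len, lo, hi = 2, 0x80, 0xBF
    elseif c == 0xE0 then
      len, lo, hi = 3, 0xA0, 0xBF -- rules out overlong 3-byte forms
    elseif c == 0xED then
      len, lo, hi = 3, 0x80, 0x9F -- rules out UTF-16 surrogates
    elseif c >= 0xE1 and c <= 0xEF then
      len, lo, hi = 3, 0x80, 0xBF
    elseif c == 0xF0 then
      len, lo, hi = 4, 0x90, 0xBF -- rules out overlong 4-byte forms
    elseif c >= 0xF1 and c <= 0xF3 then
      len, lo, hi = 4, 0x80, 0xBF
    elseif c == 0xF4 then
      len, lo, hi = 4, 0x80, 0x8F -- rules out code points above U+10FFFF
    else
      return false, i -- continuation byte, C0/C1, or F5..FF as a lead byte
    end
    if i + len - 1 > n then
      return false, i -- sequence truncated by the end of the string
    end
    if len > 1 then
      local b = s:byte(i + 1)
      if b < lo or b > hi then return false, i end
      for j = i + 2, i + len - 1 do
        b = s:byte(j)
        if b < 0x80 or b > 0xBF then return false, i end
      end
    end
    i = i + len
  end
  return true
end

print(validate_utf8("пыщпыщ ололоо"))  --> true
print(validate_utf8("abc\255"))         --> false	4
```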

Installation:

Download the repository into your project folder (there are no rockspecs yet).

As of Lua 5.3, the built-in `utf8` module takes precedence over a user-provided one. In that case, specify the full module path (`.utf8`).
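`require` first looks up `package.loaded[name]`. With the standard library setup of Lua 5.3, the built-in module is already registered there under the key `"utf8"`, so `require('utf8')` returns it without touching the disk. `.utf8` is a different key, so Lua goes on to its searchers, and the one that scans `package.path` turns every dot in the name into a directory separator. The sketch below mimics that name-to-path step. `candidate_paths` is a hypothetical helper (Lua 5.2+ offers `package.searchpath` for the real lookup), "/" is assumed as the separator, and the search path is given explicitly because the defaults differ between Lua versions.

```lua
-- Hypothetical sketch of how require() turns a module name into file names:
-- every "." becomes a directory separator ("/" assumed here), then the
-- result replaces each "?" in the ";"-separated templates of a search path.
local function candidate_paths(name, path)
  local file = name:gsub("%.", "/")
  local out = {}
  for template in path:gmatch("[^;]+") do
    out[#out + 1] = (template:gsub("%?", file))
  end
  return out
end

-- Explicit example path; real defaults differ between Lua versions.
local path = "./?.lua;./?/init.lua"
for _, p in ipairs(candidate_paths(".utf8", path)) do
  print(p)
end
--> .//utf8.lua
--> .//utf8/init.lua
```

On POSIX systems the doubled slash is harmless, so `.//utf8/init.lua` resolves to `./utf8/init.lua` relative to the working directory.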

Configuration:

The library is highly modular: you can provide your own implementation for almost any function it uses. It already ships with several back-ends.

Probably the most interesting customizations are `utf8.config.loadstring` and `utf8.config.cache`, if you want to precompile your regexes.

```lua
local utf8 = require('.utf8')
utf8.config = {
  cache = my_smart_cache, -- placeholder: your own cache object, defined elsewhere
}
utf8:init()
```

Customization must happen before initialization. Changing the configuration after `init` might work for most settings, but not for modules: those have to be reloaded.
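Precompiling regexes, as mentioned for `utf8.config.cache`, is a form of memoization: compile a pattern the first time it is seen and reuse the result afterwards. The sketch below shows only that general idea in plain Lua. `memoize` and the toy compiler are hypothetical; this section does not describe the shape utf8.lua expects from `utf8.config.cache`, so check the library source before writing your own cache.

```lua
-- Hypothetical memoizing wrapper: compile each pattern once, reuse it later.
-- `compile` stands in for any expensive pattern compiler; this does NOT show
-- the interface utf8.lua expects from utf8.config.cache.
local function memoize(compile)
  local cache = {}
  local misses = 0
  local function get(pattern)
    local compiled = cache[pattern]
    if compiled == nil then
      misses = misses + 1
      compiled = compile(pattern)
      cache[pattern] = compiled
    end
    return compiled
  end
  local function miss_count()
    return misses
  end
  return get, miss_count
end

-- Usage with a toy "compiler" that wraps the pattern in a matcher closure.
local get, miss_count = memoize(function(pattern)
  return function(s) return s:find(pattern) end
end)

local m1 = get("ло+")
local m2 = get("ло+")
print(m1 == m2, miss_count())  --> true	1
```

A plain table keeps every compiled pattern alive forever; a real cache might bound its size or use a weak table instead.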

Documentation: