crytok
A fast, configurable tokenizer for Indo-European languages, based on a double-array trie and Aho-Corasick automata.
Installation
Add this to your application's shard.yml:

dependencies:
  crytok:
    github: chenkovsky/crytok
Usage
require "crytok"
require "crytok/langs/en"
tokenizer = CryTok.build_en # a simple English tokenizer
# To change the tokenization rules, see the implementation of `build_en`.

# Read the text from the first argument and write the tokenized result to the second.
File.open(ARGV[0]) do |fi|
  File.open(ARGV[1], "w") do |fo|
    tokenizer.tokenized(fi, fo)
  end
end
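
For quick experiments it can be handy to tokenize a string in memory instead of going through files. The sketch below reuses only the calls shown above and assumes that `tokenized` accepts any `IO` (here `IO::Memory`), not just `File` handles; that assumption is not confirmed by this README, and the sample text is illustrative.

```crystal
require "crytok"
require "crytok/langs/en"

tokenizer = CryTok.build_en

# Assumption: `tokenized` works with any IO, not only File handles.
input = IO::Memory.new("Hello, world! It's a simple test.")
output = IO::Memory.new

tokenizer.tokenized(input, output)

# Print whatever the tokenizer wrote to the output buffer.
puts output.to_s
```

If `tokenized` turns out to require `File` arguments, writing the text to a temporary file and using the file-based form above is the fallback.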
Development
TODO: Write development instructions here
Contributing
- Fork it (https://github.com/chenkovsky/crytok/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Contributors
- chenkovsky - creator, maintainer
License
MIT License